
In the C programming language (and in many later languages that either interface directly with C's standard I/O functions or reimplement them), there is a function called ungetc: int ungetc(int c, FILE *stream);. It 'puts back' a character at the front of the stream being read. This putting back is only virtual: the original input stream is not altered. Subsequent 'getc' calls simply return the 'ungetc' values first, before continuing with the real next values in the stream.
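To make the 'virtual' part concrete, here is a minimal sketch. The helper name pushback_demo is hypothetical and exists only for this illustration. It reads one character, pushes back a different one, and records what the next two reads return.

```c
#include <stdio.h>

/* Hypothetical demo helper (not part of the C library): reads one character
   from `stream`, pushes back `replacement` in its place, and stores the next
   two characters read in out[0] and out[1].
   Returns 0 on success, -1 on EOF or if the push-back fails. */
int pushback_demo(FILE *stream, int replacement, int out[2])
{
    if (getc(stream) == EOF)
        return -1;
    if (ungetc(replacement, stream) == EOF)
        return -1;
    out[0] = getc(stream);  /* the pushed-back character */
    out[1] = getc(stream);  /* the real next character in the stream */
    return 0;
}
```

If the stream contains "ab" and the replacement is 'X', the two stored reads are 'X' and 'b', while the underlying file still begins with 'a'. Note that the C standard only guarantees one character of push-back between reads.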

Why does this function exist? What are examples of use cases that can only be handled by using 'ungetc'?

Dan Pichelman
Qqwy
  • I'm not sure if there's anything in programming that *can **only** be handled* in a certain way. There are often multiple solutions to problems. – null Sep 04 '16 at 21:56
  • Must there exist use cases that can only be handled by a particular function for that function to exist? Or is it sufficient that the function merely be useful? – Robert Harvey Sep 04 '16 at 22:24

1 Answer


The short answer is that ungetc allows you to peek at the next character without consuming it.
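A minimal sketch of that peek idiom follows. The function name peekc is an assumption made for illustration; it is not a standard library function.

```c
#include <stdio.h>

/* Hypothetical helper: returns the next character of `stream` without
   consuming it, or EOF if no character is available. */
int peekc(FILE *stream)
{
    int c = getc(stream);
    if (c != EOF)
        ungetc(c, stream);  /* put it back so the next read sees it again */
    return c;
}
```

Calling peekc repeatedly returns the same character until something actually reads it with getc. Because only one character of push-back is guaranteed, this sketch peeks one character ahead and no further.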

Let's say you're reading a packetized data format. It contains, among other things, a frame sync pattern. Frame sync patterns allow you to align data by marking the beginning of a data frame in an otherwise unsynchronized data stream.

To facilitate the discussion, here's a data definition:

[sync pattern] [packet length] [--------------data--------------] [checksum]

|---0xEB25---| |-- 16 bits --| |-- packet length minus 64 bits--| |32 bits |

The sync pattern EB25 is chosen for a number of reasons. Its bit pattern is resistant to false positives, and it's unique enough to serve as a file type "magic number."

The checksum is there to detect transmission errors and to validate the sync pattern (since EB25 has a small chance of actually being valid data). Together, the sync pattern, an accurate packet length and a matching checksum virtually guarantee that you have identified a valid data packet.

Now imagine going through this exercise without the ability to back up to a previous point in the data stream. To find the next packet, you must scan bytes until you identify a sync pattern of EB25, taking into account that the bytes are reversed because the spec is little-endian. Once you have identified the sync pattern, you must read the packet length, then the remainder of the packet, and compute a checksum. If the checksum check fails, you must start over again from the byte following the failed sync pattern. To do that, you must back up to the start of the sync pattern + 4 bytes, and begin scanning again.
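The scanning step alone can be sketched as follows, for the byte-aligned case only. The names find_sync, SYNC_FIRST and SYNC_SECOND are hypothetical, and the sketch assumes that little-endian here means the 0x25 byte arrives before the 0xEB byte. The single ungetc call handles input such as 25 25 EB, where the byte read after a candidate 0x25 is itself the start of the real sync pattern.

```c
#include <stdio.h>

/* Assumption: little-endian byte order, so the low byte of 0xEB25 comes first. */
#define SYNC_FIRST  0x25
#define SYNC_SECOND 0xEB

/* Hypothetical helper: advances `stream` to just past the next byte-aligned
   sync pattern. Returns 1 if a pattern was found, 0 if EOF was reached. */
int find_sync(FILE *stream)
{
    int c;
    while ((c = getc(stream)) != EOF) {
        if (c != SYNC_FIRST)
            continue;               /* not a candidate, keep scanning */
        int next = getc(stream);
        if (next == SYNC_SECOND)
            return 1;               /* full pattern matched */
        if (next == EOF)
            return 0;
        /* `next` may itself begin a sync pattern, so let the loop re-read it. */
        ungetc(next, stream);
    }
    return 0;
}
```

This covers only the one-byte look-ahead. Backing up over a whole packet after a failed checksum is more than the single character of push-back that ungetc guarantees, so that step would need a buffer or, on seekable files, fseek.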

So far, I haven't described anything that couldn't also be accomplished by buffering the input stream. But what if the sync pattern is not guaranteed to align on a byte boundary? In that situation, the first bit of the sync pattern could occur in the middle of a byte. So to get the first 8 bits, you would have to read two bytes, not just one. Under these conditions, wouldn't it be useful to scrub backwards one byte if no consecutive 8 bits were an E (without standing up a buffered reader)?

This isn't just an idle hypothetical. The IRIG 106 Chapter 10 specification works exactly this way, although I've simplified the story somewhat for this demonstration.

Robert Harvey
  • Nice example. Another example that comes to mind is writing a lexical analyzer in which, again, you need to peek at the next byte of input without consuming it. – Giorgio Sep 05 '16 at 05:37
  • Interesting! For a file, one could of course read a byte and then rewind a step, but I presume that this is not possible for non-file streams? – Qqwy Sep 05 '16 at 09:33
  • Minor detail, but there appear to only be 64 bits of metadata: is "packet length minus **80** bits" correct for the data length? – jscs Sep 05 '16 at 18:23
  • @JoshCaswell: Right you are. The Chapter 10 specification is a bit more complicated than I've described here; it's actually channels within a packet stream. Each channel can contain an independent PCM packet stream, ARINC data, video, etc. Some channels actually have their own sync patterns that are longer than the primary. – Robert Harvey Sep 05 '16 at 19:38