131

Yes, yes, I am aware that '\n' writes a newline on UNIX, while on Windows there is the two-character sequence '\r\n'. All this is very nice in theory, but my question is: why? Why is the extra carriage return character needed on Windows? If UNIX can do it with '\n', why does it take Windows two characters?

I am reading David Beazley's Python book and he says:

For example, on Windows, writing the character '\n' actually outputs the two-character sequence '\r\n' (and when reading the file back, '\r\n' is translated back into a single '\n' character).
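For concreteness, here is roughly what that looks like in Python 3 (a minimal sketch; the file name demo.txt is made up, and the byte values in the comments assume Windows):

# Text mode: "\n" is translated to the platform line ending on write,
# and "\r\n" is translated back to "\n" on read.
with open("demo.txt", "w") as f:
    f.write("hello\n")             # on Windows this writes b"hello\r\n" to disk

with open("demo.txt", "rb") as f:  # binary mode: no translation
    print(f.read())                # b'hello\r\n' on Windows, b'hello\n' on Unix

with open("demo.txt", "r") as f:   # text mode read: universal newlines
    print(repr(f.read()))          # 'hello\n' on both platforms

Passing newline='' to open() turns this translation off if you ever need the characters verbatim.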

Why the extra effort?

I will be honest. I have known the difference for a long time but have never bothered to ask WHY. I hope that is answered today.

Thanks for your time.

sukhbir
  • 1,479
  • 2
  • 11
  • 9
  • 9
    It should also be noted that Windows isn't the only one that uses `\r\n`. It's also used by most text-based internet protocols (e.g. SMTP, HTTP, etc.) for largely the same reason as Windows (i.e. history). – Dean Harding Dec 22 '10 at 13:19
  • 3
    Also, when in Java and using format strings (e.g. `System.out.printf()` or `String.format()`), use `%n` rather than a literal `\n` so that the platform's line separator is emitted. – Gary Dec 22 '10 at 13:43
  • I've seen `\n\r` several times. (I think it was something from NetWare.) – user1686 Dec 22 '10 at 21:05
  • 1
    Related question at SO: [Historical reason behind different line ending at different platforms](http://stackoverflow.com/questions/419291/historical-reason-behind-different-line-ending-at-different-platforms) – Imran Jul 02 '11 at 08:37
  • For information, on Linux, telnet sends `\r\n` and netcat sends `\n`. – baptx Feb 19 '15 at 20:58
  • 2
    There are very few Windows programs that actually require CRLF. CRLF may be the default, but nearly everything will autodetect and use LF just fine. I have all my text editors on Windows configured to use LFs for all new files, and it's really not an issue. – Kevin Feb 15 '17 at 15:26

8 Answers

166

Backward compatibility.

Windows is backward compatible with MS-DOS (aggressively so, even), and MS-DOS used the CR-LF convention because it was compatible with CP/M-80 (somewhat by accident), which used the CR-LF convention because that was how you drove a printer (because printers were originally computer-controlled typewriters).

Printers have a separate command to advance the paper one line, and a separate command to return the carriage (where the paper was mounted) to the left margin.

That's why. And, yes, it is an annoyance, but it is part of the package deal that allowed MS-DOS to win over CP/M, Windows 95 to win over all the other GUIs on top of DOS, and Windows XP to take over from Windows 98.

(Note: modern laser printers still have these commands because they, too, are backward compatible with earlier printers; HP in particular does this well.)

For those unfamiliar with typewriters, here is a video showing how typing was done: http://www.youtube.com/watch?v=LJvGiU_UyEQ. Notice that the paper is first moved up and then the carriage is returned, even though it happens in a single movement. The ding notified the typist that the end of the line was near, and to prepare for it.

  • If Windows does write the newline character eventually, why do I need to explicitly take care of it? – sukhbir Dec 22 '10 at 12:17
  • +1 for answer. I suspected it had something to do with typewriters, but hadn't heard it said before. – WernerCD Dec 22 '10 at 18:25
  • 4
    How did Unix, with its \n only, work with those old-day printers? I assume they did have Unix consoles connected to typewriter-type printers? – Senthil Kumaran Dec 23 '10 at 05:51
  • 7
    @Senthil, in Unix the newline character is converted by the terminal driver. It is just a different design decision. –  Dec 23 '10 at 16:59
  • 5
    @Senthil, to be precise, in Unix printers and terminals are abstracted in the operating system, and their description determines which byte sequences are generated for the device. CP/M had no such abstraction leaving it all to the program running - this is most likely because this was not needed by all programs so having it in the resident operating system would take away precious memory from programs not needing it. Remember that CP/M was designed for a 16 _Kilobyte_ system. –  Feb 26 '11 at 12:12
  • 3
    "So a major design feature of what is arguably the world's most advanced transportation system was originally determined by the width of a horse's ass." And so it is with software as well. http://www.astrodigital.org/space/stshorse.html – Ryan Michela Jul 01 '11 at 19:28
  • 2
    @Ryan, urban legend. Debunked at http://www.snopes.com/history/american/gauge.htm –  Jul 01 '11 at 19:40
  • @thorbjorn - The point still holds :) – Ryan Michela Jul 01 '11 at 19:43
  • @Ryan, not quite. It is an interesting story, but still wrong. –  Jul 01 '11 at 19:57
  • Printers were so much better back then; https://www.youtube.com/watch?v=lTxqQ3ALVcU (Not at actually _printing_ of course) – Basic Mar 30 '15 at 01:45
  • @user1249, actually, the point he made is *confirmed* by the article you linked. The only part that it 'debunks' is that there was somehow something "inevitable" about the chain of events in question taking place, and reaching the results which they ended up reaching. (e.g.: Had the South won the Civil War, their multi-gauge railroad system may have eventually adopted a *different* gauge as the standard, and that would have become the US standard. But they didn't, so the single-gauge northern rail system replaced the southern one. And it shared the gauge decided by the width of a horse.) – Theo Brinkman Jul 24 '19 at 14:37
31

As far as I'm aware this harks back to the days of typewriters.

\r is carriage return, which moves the position where you are typing back to the left of the page (or the right, if that is your culture).

\n is new line, which moves your paper up a line.

Doing only one of these on a typewriter would put you in the wrong place to start writing a new line of text.

When computers came about I guess some people kept the old model, but others realised that it wasn't necessary and encapsulated a full newline as one character.

Matt Ellen
  • 3,368
  • 4
  • 30
  • 37
  • 8
    So why does Windows *still* stick to it? – sukhbir Dec 22 '10 at 11:45
  • 10
    Backward compatibility. Imagine how many text documents would break if they changed now – Matt Ellen Dec 22 '10 at 11:47
  • 1
    @Matt - think teletype, automatic electronic typewriters. You needed both commands (for that is what they are) - so it's not so much a matter of computers "keeping" the old model so much as *requiring* the old model (albeit some decades ago). – Murph Dec 22 '10 at 11:50
  • 4
    Strictly speaking, the "oddball" here is the unixoid 'use newline only', initially done (I believe) to keep the number of stored characters down (the translation to CR LF is done in the terminal driver; it's the 'onlcr' flag that controls it for output). – Vatine Dec 22 '10 at 11:50
  • 1
    UNIX is older than Windows, right? Why did Windows choose \r\n then, when Unix did not? – sukhbir Dec 22 '10 at 11:53
  • 3
    Windows had a predecessor named DOS, which had the same line ending. Windows kept compatibility. DOS itself had a predecessor, namely CP/M, which also used CRLF. DOS kept compatibility. The development of CP/M was influenced by DEC's TOPS. And you can guess which line ending they used. :-) Compatibility explains much. – Mnementh Dec 22 '10 at 12:03
  • Well that pretty much answers it. – sukhbir Dec 22 '10 at 12:09
  • @fredrick: Most monitors these days are LCD, but everything still works on the vertical and horizontal retrace intervals that CRTs needed. It's just not worth it to change: the amount of stuff based on and around it is staggering. – Satanicpuppy Dec 22 '10 at 16:10
  • 7
    OK, but why does Notepad *still* not recognize "\n" line endings? – dan04 Feb 20 '11 at 10:06
  • @dan04 Opening/displaying a file using \n is easy, but what happens when you save it? Either you change all \n to \r\n and break the Principle of Least Astonishment, or you attempt to remember which line endings this particular file uses (what if it's mixed?). There isn't a particularly pretty answer without making Notepad more complex than it is. That's what WordPad is for (and it does correctly handle \n). It's just unfortunate that WordPad has other usability issues... – Basic Mar 30 '15 at 01:55
  • 3
    @dan04 it now recognizes them, see https://blogs.msdn.microsoft.com/commandline/2018/05/08/extended-eol-in-notepad/ – Jason May 18 '18 at 23:48
15

I don't know if this is common knowledge, but it should be noted that CR is still understood by modern terminal emulators:

$ printf "hey world\rsup\n"
sup world

It's handy for progress indicators, e.g.

# \r returns the cursor to column 0, so each iteration overwrites the previous output
for i in {1..100}
do
    printf "\rLoading... %d%%" $i
    sleep 0.01
done
echo    # end with a newline so the shell prompt starts on a fresh line
Daniel Lubarov
  • 1,226
  • 8
  • 12
  • 1
    On the old IBM line printers (e.g., the 1403), the convention was to treat the first character of the line buffer as a carriage control character. Blank meant to advance one line and print. Plus meant to omit spacing and was used, e.g., to underline. A zero meant to double-space and a minus to triple-space. A '1' spaced to the top of the next page, and other digits advanced to user-defined vertical positions (used to fill in pre-printed forms). – George Jan 23 '19 at 21:10
8

History of the Newline Character (Wikipedia):

ASCII was developed simultaneously by the ISO and the ASA, the predecessor organization to ANSI. During the period of 1963–1968, the ISO draft standards supported the use of either CR+LF or LF alone as a newline, while the ASA drafts supported only CR+LF.

The sequence CR+LF was in common use on many early computer systems that had adopted teletype machines, typically an ASR33, as a console device, because this sequence was required to position those printers at the start of a new line. On these systems, text was often routinely composed to be compatible with these printers, since the concept of device drivers hiding such hardware details from the application was not yet well developed; applications had to talk directly to the teletype machine and follow its conventions.

The separation of the two functions concealed the fact that the print head could not return from the far right to the beginning of the next line in one-character time. That is why the sequence was always sent with the CR first. In fact, it was often necessary to send extra characters (extraneous CRs or NULs, which are ignored) to give the print head time to move to the left margin.

Even after teletypes were replaced by computer terminals with higher baud rates, many operating systems still supported automatic sending of these fill characters, for compatibility with cheaper terminals that required multiple character times to scroll the display.

MS-DOS (1981) adopted CP/M's CR+LF; CP/M's use of CR+LF made sense for using computer terminals via serial lines. This convention was inherited by Microsoft's later Windows operating system.

The Multics operating system began development in 1964 and used LF alone as its newline. Unix followed the Multics practice, and later systems followed Unix.

Craige
  • 3,791
  • 21
  • 30
  • On the old IBM 2741 printer-keyboard terminal, the printer component was an IBM Selectric bouncing type ball typewriter. Changing to uppercase caused the ball to rotate, taking extra time. In the EBCDIC character code, uppercase characters had a 1-bit in position 6. So, an EBCDIC blank (0x40) was uppercase! If you were printing a long doc (e.g., a thesis), you could materially speed up output by translating blanks between lowercase words to NULs, or lowercase blanks (they used a different character, IL if memory serves, to introduce necessary delays, e.g., when returning or tabbing). – George Jan 23 '19 at 22:14
7

Historically, line feed meant that the platen - the roller on which you type - rotated one line, causing text to appear on the next line... but in the same column.

Carriage return meant "return the bit with which you type to the beginning of the line".

Windows uses CR+LF because MS-DOS did, because CP/M did, because it made sense for serial lines.

Unix took its \n convention from Multics.

I suspect if you dig far enough back, you'll find a political disagreement between implementors!

(You left out the extra fun bit, where Mac convention is (or used to be) to just use CR to separate lines. And now Unicode also has its own line separator, U+2028!)
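As a rough illustration of how many of these conventions a robust reader ends up coping with, here is a small Python sketch (just one library's behaviour, shown as an example): str.splitlines() accepts LF, CR+LF, a bare CR, and even U+2028.

text = "unix\nwindows\r\nold mac\rseparator\u2028end"
print(text.splitlines())
# ['unix', 'windows', 'old mac', 'separator', 'end']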

Frank Shearar
  • 16,643
  • 7
  • 48
  • 84
7

What is it with people asking "why can Unix do \n and not Windows"? It's such a strange question.

  1. The OS has almost nothing to do with it. It's more a matter of how apps, libraries, protocols and file formats deal with things. Other than where the OS reads/writes text-based configuration or command line commands, it makes no sense to fault the OS.
  2. Most Windows apps can read both \n and \r\n just fine. They also output \r\n so that everyone's happy. A program doesn't simply "do" either \n or \r\n -- it accepts one, the other, or both, and outputs one, the other, or both.
  3. As a programmer this should really almost never bother you. Practically every language/platform has facilities to write the correct end-line and read most robustly. The only time I've had to deal with the problem was when I wrote an HTTP server -- and it was because a certain browser (hint: the next most popular browser after IE) was doing \n instead of the correct \r\n (a minimal sketch of the CR+LF requirement follows this list).
  4. A much more pertinent question is, why do so many modern Unix apps output only \n fully knowing that there are some protocols and programs that don't like it?
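To make the protocol point concrete, here is a minimal Python sketch of a raw HTTP request; the host example.com is only a placeholder. HTTP requires CR+LF after the request line and after every header, which is why emitting a bare \n there is incorrect.

import socket

host = "example.com"                      # placeholder host
request = (
    "GET / HTTP/1.1\r\n"                  # the request line must end in CR+LF
    "Host: " + host + "\r\n"              # so must every header line
    "Connection: close\r\n"
    "\r\n"                                # a blank CR+LF line ends the headers
)

with socket.create_connection((host, 80)) as sock:
    sock.sendall(request.encode("ascii"))
    print(sock.recv(200).decode("ascii", errors="replace"))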
Rei Miyasaka
  • 4,541
  • 1
  • 32
  • 36
  • 4
    Another pertinent question: since many protocols were developed primarily on Unix systems, why didn't they use '\n'? – David Thornley Dec 22 '10 at 15:11
  • 2
    @DavidThornley Because \r\n is more likely to work cross-platform (\r for older macs, \r\n for windows and \n for *nix). – Basic Mar 30 '15 at 01:59
4

The reason the conventions hold on their various systems (\n on unix type systems, \r\n on Windows, etc) is that once you've picked a convention you CAN'T change it without breaking a bunch of people's files. And that's generally frowned upon.

Unix-type systems were developed (very early days) using various models of teletype, and at some point someone decided the equipment should carriage return when it did a line feed.

Windows came from DOS, so for Windows the question really is: why did DOS use this CR/LF sequence? I'm guessing it has something to do with CP/M, where DOS has some of its roots. Again, specific models of teletype may have played a role.

Michael Kohne
  • 10,038
  • 1
  • 36
  • 45
  • Hmm interesting. – sukhbir Dec 22 '10 at 12:00
  • 1
    Why can't Windows *handle* lines ending with `\n`, but continue to use `\r\n` for now? If they did that starting with Windows XP, they could now start saving files with `\n` instead of `\r\n`. – DisgruntledGoat Dec 22 '10 at 13:13
  • 3
    Windows has nothing to do with it. It's the apps' decision, and most apps will read both '\n' and '\r\n', and write '\r\n' -- so everyone's happy. – Rei Miyasaka Dec 22 '10 at 14:34
4

Here is an answer from the best source, Microsoft: Why is the line terminator CR+LF?

This protocol dates back to the days of teletypewriters. CR stands for "carriage return" - the CR control character returned the print head ("carriage") to column 0 without advancing the paper. LF stands for "linefeed" - the LF control character advanced the paper one line without moving the print head. So if you wanted to return the print head to column zero (ready to print the next line) and advance the paper (so it prints on fresh paper), you need both CR and LF.

If you go to the various internet protocol documents, such as RFC 0821 (SMTP), RFC 1939 (POP), RFC 2060 (IMAP), or RFC 2616 (HTTP), you'll see that they all specify CR+LF as the line termination sequence. So the real question is not "Why do CP/M, MS-DOS, and Win32 use CR+LF as the line terminator?" but rather "Why did other people choose to differ from these standards documents and use some other line terminator?"

Unix adopted plain LF as the line termination sequence. If you look at the stty options, you'll see that the onlcr option specifies whether a LF should be changed into CR+LF. If you get this setting wrong, you get stairstep text, where

each
    line
        begins

where the previous line left off. So even unix, when left in raw mode, requires CR+LF to terminate lines. The implicit CR before LF is a unix invention, probably as an economy, since it saves one byte per line.
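For the curious, here is a small sketch that checks that flag from Python (this assumes a Unix-like system, an interactive terminal on stdout, and the standard termios module; it is purely illustrative):

import sys
import termios

# tcgetattr returns [iflag, oflag, cflag, lflag, ispeed, ospeed, cc];
# the ONLCR bit in oflag is what `stty onlcr` / `stty -onlcr` toggles.
attrs = termios.tcgetattr(sys.stdout.fileno())
if attrs[1] & termios.ONLCR:
    print("onlcr is set: the driver expands LF to CR+LF on output")
else:
    print("onlcr is clear: bare LF reaches the terminal (expect stairstep text)")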

The unix ancestry of the C language carried this convention into the C language standard, which requires only "\n" (which encodes LF) to terminate lines, putting the burden on the runtime libraries to convert raw file data into logical lines.

The C language also introduced the term "newline" to express the concept of "generic line terminator". I'm told that the ASCII committee changed the name of character 0x0A to "newline" around 1996, so the confusion level has been raised even higher.

Ondra Žižka
  • 267
  • 3
  • 6