3

A follow-up to Difference between '\n' and '\r\n'.

It's been a few decades since the schism was introduced. Nowadays, when documents are exchanged over the internet, typically with no prior knowledge of the client's line-ending preference, clients have to deal with both \n and \r\n.

To me it seems that it's safe to use \n only. Software produced by Microsoft can process both. The major plain text editors can too. Browsers, IDEs, file managers, office suites, all these can do it.

Is there any point in writing software to use CRLF or is it practically ok to just unify at "\n"? Are there known problems with "\n" in any major modern Windows software?

EDIT: The issue is not with the software itself. Indeed, the software can use some kind of NL constant which resolves at runtime. However, the generated files will be transferred between machines, and they will hardly be converted on each occasion.

Imagine a company where the originating machine of a content/document can be any platform, and the consuming too. And the way of transferring the documents can be any (mail, shared drive, download,...) In such a scenario, there is no way to prevent content using \n from appearing on Windows, and vice versa. Hence the question.

Ondra Žižka
  • There may be problems with tools that expect Windows-style endings, e.g. in files containing tabular data exported from Excel as `.csv`. Of course, that can be dealt with when it happens, and the issue also exists the other way around. – Mörre Sep 30 '17 at 06:35
  • Recommended reading about backwards compatibility: https://www.joelonsoftware.com/2008/03/17/martian-headsets/ – Doc Brown Sep 30 '17 at 07:19
  • There is far more text processing software out there than any single person, including you and me, knows about. Under Windows, all of it needs to understand '\r\n', which is not true of '\n'. So as long as you want to keep your programs backwards compatible, better play it safe and use '\r\n'. And what do you lose? For example, in C#, I often use functions like "WriteLine" or "AppendLine", which use the system's correct line breaks automatically, so I seldom give this a thought. – Doc Brown Oct 01 '17 at 06:59
  • Specifications written for HTTP and other web standards require \r\n. – Rob Oct 09 '17 at 02:03
  • @Doc Brown that doesn't work if you exchange over the internet with "no prior knowledge of the client system" – Austin_Anderson Oct 09 '17 at 13:06
  • @Austin_Anderson: in the original question, before the last edit, the OP was not specifically talking about software for data exchange with other, unknown platforms. That puts it in a different light. – Doc Brown Oct 09 '17 at 15:02
  • If the software is intended to accept and store files on behalf of users who may throw anything (any kind of file) at it, and the user expects the system to somehow handle it correctly, and the user isn't sophisticated enough to know about the issue of newlines at all, then I guess the [Git handling of newlines](https://help.github.com/articles/dealing-with-line-endings/) may be a good way to handle the issue. – rwong Oct 09 '17 at 15:38
  • Git handling is nice. The thing is, my software is producing the static files, and my concern is whether it's okay to produce files with LF only, or whether the Windows issue is so bad that it would need a separate output with CRLF. But then again, I wouldn't have control over which files would be distributed where. It seems that the best outcome would be if Microsoft changed `notepad.exe` and the console (cmd.exe?) to handle `\n`. – Ondra Žižka Oct 09 '17 at 18:21
  • The simple solution would be to use `\r\n` everywhere since all operating systems can work with that. Of course you might argue that you are wasting a byte for every line then, but on the other hand indenting code with two or four spaces is also a pretty common thing to do. – Ultimately, the answer is always: It depends. There’s still a lot of critical software (or protocols) that require one or the other. – poke Oct 09 '17 at 18:41

4 Answers

1

Windows Notepad (notepad.exe) doesn't interpret a standalone \n as a new line. It's not necessarily "modern" but pretty much "mainstream".

If you're writing text files that everyday users should be able to edit, don't focus on \n only; instead, write your program so that it accepts all three styles (since some older programs might even use \r only).
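As a rough illustration of accepting all three styles on input, here is a minimal Python sketch. The helper name `split_lines` is hypothetical and not part of any library; it simply splits on \r\n, \n, or \r, whichever appears.

```python
import re

# CRLF is listed first so that "\r\n" counts as one line break, not two.
_NEWLINE = re.compile(r"\r\n|\r|\n")

def split_lines(text: str) -> list[str]:
    # Split on any of the three common line-break styles.
    lines = _NEWLINE.split(text)
    # A trailing line break leaves one empty final element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

Python's built-in `str.splitlines()` does something similar, but it also splits on several other characters (form feed and the Unicode line separator, for example), which may or may not be what you want.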

Andy
Mario
  • Well, ok, it's installed on every Windows, but I think even Microsoft doesn't mean it as a real text editing app. Originally it was a demo app for WinAPI and nowadays it's rather a relic. – Ondra Žižka Sep 30 '17 at 23:44
  • @OndraŽižka Well, it's no full word processor, yet it's still the default for editing plain text for most users. – Mario Oct 01 '17 at 06:18
  • This all boils down to: do you only want to write software with the installation requirement *"needs a text editor with unix-compatible line break interpretation to be preinstalled"*? Or do you prefer to write software with the installation requirements *"needs nothing but a naked Windows installation"*? – Doc Brown Oct 01 '17 at 07:03
  • @DocBrown, *naked Windows installation* has WordPad, which is capable of parsing `\n`. `notepad.exe` has several other limits, like max. size 64 kB, if I remember correctly. So I'm rather asking about the modern programs, since as a non-windows user I can be missing some important one which insists on CRLF. – Ondra Žižka Oct 02 '17 at 04:13
  • @OndraŽižka The Windows Editor is still the default associated application for many file extensions, even on Windows 10. Also, the 64 KB limit is long gone (since XP or maybe even longer). Besides that, just use `\n` in your program. If you're using text mode for IO, it will essentially be `\r\n` on Windows. – Mario Oct 02 '17 at 05:52
  • @OndraŽižka If it were a relic, it would have been removed by now. It's not, and it's by default the application to use for txt files. I suspect most people don't change it either. – Andy Oct 09 '17 at 21:25
1

As far as Windows and C# are concerned, you can always use Environment.NewLine to determine the default newline sequence of the system the program is running on.

You can also use text.Replace("\n", "\r\n") to switch to Windows line endings, provided the text does not already contain \r\n (otherwise the result contains \r\r\n).
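For text whose line endings may be mixed, a conversion that can be applied repeatedly without doubling the \r is safer. The sketch below uses Python rather than C#, purely for illustration; `to_crlf` is a hypothetical helper name, and `os.linesep` plays roughly the role of the platform newline constant.

```python
import os
import re

def to_crlf(text: str) -> str:
    # Rewrite every line break (CRLF, lone CR, lone LF) as CRLF.
    # Matching "\r\n" first keeps existing CRLF pairs from becoming "\r\r\n".
    return re.sub(r"\r\n|\r|\n", "\r\n", text)

# os.linesep is "\r\n" on Windows and "\n" on Linux and macOS.
platform_newline = os.linesep
```

Applying `to_crlf` to its own output leaves the text unchanged, which a plain replace of \n by \r\n does not guarantee.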

There are still compatibility issues when managing files, especially office-related ones, and some arcane COM APIs are also newline-sensitive.

1

This question is really about a software application's "customer base".

To answer your question, you have to know whether your customers might be inconvenienced if your application generates output text files that use only \n and doesn't provide an option for outputting \r\n. The best way to find out is to ask your real customers.

From a programmer's point of view, adding an option for choosing the newline in output text files is a relatively small task. Alternatively, one can automatically choose \n or \r\n based on the platform. Also, most text line handling library functions already handle both \n and \r\n without any effort from the programmer.
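To give a sense of how small that task can be, here is a minimal Python sketch. The function `write_text` and its `newline` parameter are hypothetical names for such an output option; passing `os.linesep` would select the platform default.

```python
def write_text(path: str, lines: list[str], newline: str = "\n") -> None:
    # `newline` is the user-facing option: "\n", "\r\n", or os.linesep.
    # Opening with newline="" turns off Python's own newline translation,
    # so exactly the chosen terminator ends up in the file.
    with open(path, "w", encoding="utf-8", newline="") as f:
        for line in lines:
            f.write(line + newline)
```

For example, `write_text("report.txt", ["a", "b"], newline="\r\n")` would write CRLF-terminated lines regardless of the platform the program runs on.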

If you are writing a library, you can either return the data as strings, which completely sidesteps the question of newlines, or use the text file handling facility that comes with the programming language.

However, if you are distributing archives (e.g. ZIP files) containing text-based files, it would indeed seem redundant to provide two sets of archives: one ZIP file where all text files have \r\n newlines, another where all newlines are \n. Typically, this problem is solved by combining it with another network-effect problem, the favorite file compression format for each platform:

  • Provide a ZIP with newlines \r\n
  • Provide a TGZ with newlines \n
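
The two-archive approach above could be scripted roughly as follows. This is only a sketch under assumptions: `build_archives` is a hypothetical helper, and the input texts are assumed to use \n line breaks only.

```python
import io
import tarfile
import zipfile

def build_archives(files: dict[str, str]) -> tuple[bytes, bytes]:
    # `files` maps member names to text that uses "\n" line breaks only.
    zip_buf, tgz_buf = io.BytesIO(), io.BytesIO()
    # ZIP for Windows users: convert line breaks to CRLF.
    with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text.replace("\n", "\r\n").encode("utf-8"))
    # TGZ for Unix-like users: keep LF line breaks as they are.
    with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tf:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return zip_buf.getvalue(), tgz_buf.getvalue()
```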
rwong
0

Preamble: Text is binary with conventions.

"To me it seems that it's safe to use \n only."

"Imagine a company where the originating machine of a content/document can be any platform, and the consuming too. And the way of transferring the documents can be any (mail, shared drive, download,...)"

There are two cases to address here: serving the file and consuming the file.

Serving the text file:

Stick to a specific format. Define your encoding and your newline policy, and keep them consistent across all the files you generate. Understand the specifics of each encoding you decide to use (for example, UTF-8 usually requires you to put a 3-byte BOM at the beginning of each file, and some systems are not ready to work with this).
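Whichever BOM policy you settle on, it helps to make it explicit when writing and to tolerate both variants when reading. A minimal Python sketch follows; `encode_utf8` and `decode_utf8` are hypothetical helper names, while the `utf-8` and `utf-8-sig` codecs are standard.

```python
def encode_utf8(text: str, bom: bool = False) -> bytes:
    # "utf-8-sig" prepends the 3-byte BOM (EF BB BF); plain "utf-8" does not.
    return text.encode("utf-8-sig" if bom else "utf-8")

def decode_utf8(data: bytes) -> str:
    # "utf-8-sig" drops one leading BOM if present and otherwise
    # decodes exactly like "utf-8".
    return data.decode("utf-8-sig")
```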

Read about the Turkey Test and why it's important.

Also, as you're a software developer and not a common user, remember: text is a bunch of bytes with some specific conventions sprinkled on top, so treat it as BINARY data and transfer it accordingly over FTP, SFTP, HTTP responses, file writing, etc.

Consuming a text file:

Unless you have the proper specification of the file format, you're going to have a bad time.

But there's nothing stopping you from applying some heuristics about newlines, given that the most common formats are \r\n, \n and \r (see Newline Representations on Wikipedia).

What I usually do when I have to consume a text file is search for \r\n first, since it's the most common representation (greater number of platforms). Then I fall back to just \n, because it's the second most important. Lastly, I fall back to just \r.
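The search order described above can be sketched in a few lines of Python. The function name `detect_newline` is hypothetical, and the input is taken as raw bytes, in keeping with treating text as binary data.

```python
from typing import Optional

def detect_newline(data: bytes) -> Optional[bytes]:
    # Try CRLF first, then LF, then CR, and report the first one found.
    for candidate in (b"\r\n", b"\n", b"\r"):
        if candidate in data:
            return candidate
    # No line break at all: the file is a single line (or empty).
    return None
```

This is only a heuristic: a file with mixed endings is classified as \r\n as soon as it contains a single CRLF pair.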

Remember, text is binary data with conventions. Discovering the conventions is what makes it hard.

Final rant:

Text is hard. Handling text the correct way is harder.

There is no such thing as a text file. Also, Plain Text is a lie. An end user has the luxury of complaining about TXT files, but we as developers do not.

The sheer number of questions on StackOverflow about how to properly detect the encoding of a text file is a hint.

Just handling common Character Encodings is difficult due to the sheer number of different encodings available. That Wikipedia page alone lists over 60 different ones to think about.

Handling newlines is just another side of how to properly handle text. This is especially true if you have to interface with older devices (such as the Atari 8-bit, which uses 0x9B as its newline marker).

Text is binary data with conventions on top.


Machado
  • UTF-8 does _not_ require a BOM. The WP article you linked to even says so explicitly, under the "UTF-8" headline. – Marc Schütz Oct 10 '17 at 10:54
  • While it's not mandatory, the fact that some major editors like Notepad++ automatically put a BOM unless you explicitly ask it not to, along with Default behavior of UTF-8 encoding on .NET doing the same, makes it almost de-facto standard on UTF-8 files from my point of view. It's not mandatory, and that's the reason I wrote "usually requires", and not "100% mandatory all the time". – Machado Oct 10 '17 at 13:18