27

I wrote an open source library that parses structured data but intentionally left out carriage-return detection because I don't see the point. It adds additional complexity and overhead for little/no benefit.

To my surprise, a user submitted a bug where the parser wasn't working and I discovered the cause of the issue was that the data used CR line endings as opposed to LF or CRLF.

Hasn't OSX been using LF style line-endings since switching over to a unix-based platform?

I know there are applications like Notepad++ where line endings can be changed to use CR explicitly but I don't see why anybody would want to.

Is it safe to exclude support for the statistically insignificant percentage of users who decide (for whatever reason) to use the old Mac OS style line endings?

Update:

To clarify, supporting Windows line endings (i.e., CRLF) doesn't require CR token recognition. For efficiency, the lexer matches on a per-char basis. By silently ignoring CR chars, the CRLF token simplifies to LF. As such, the CRLF token itself could be considered an anachronism in its own right, but that's not what this question is about.
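
A minimal sketch of that approach, written in Python purely for illustration (the function name `split_lines_ignoring_cr` is hypothetical and this is not the library's actual code): every CR is dropped and only LF ends a line.

```python
def split_lines_ignoring_cr(text):
    """Split text into lines on LF only, silently dropping every CR."""
    lines = []
    current = []
    for ch in text:
        if ch == "\r":
            # CR is ignored, so CRLF collapses to a plain LF.
            continue
        if ch == "\n":
            lines.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        # Keep a final line that has no trailing newline.
        lines.append("".join(current))
    return lines
```

Under this scheme LF and CRLF input split identically, while CR-only input has its separators dropped and comes back as a single line, which would match the reported bug.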

The last OS that provided system-wide support for CR style line endings was Mac OS 9. Ironically, the only application that still uses it as the default in OSX is Microsoft Excel.

Evan Plaice
  • 21
    "It adds additional complexity and overhead": I think the additional complexity and overhead are really small. – Giorgio Dec 13 '12 at 07:10
  • 1
    Just a month ago I encountered CR line endings in a file created with a modern Mac. I don't know which software was used for modifying it. – msell Dec 13 '12 at 07:14
  • 1
    Excel for the Mac defaults to CR for .csv and other text based formats, but it can do CRLF. – FigBug Dec 13 '12 at 07:36
  • @Giorgio Probably not, but supporting .001% of the user base isn't worth my time. In business terms the opportunity cost is too high. In simple terms, I'd rather find reasons to justify my laziness than waste time adding edge-case support for a dead platform. – Evan Plaice Dec 13 '12 at 07:40
  • @FigBug Really... I guess that's the culprit because it's a csv parser. At least I can inform the user to change the Excel preferences to use LF by default. I would put on my tinfoil hat and go all out conspiracy theorist but I'm too tired for shenanigans right now. Makes you wonder: is that a case of MS being intentionally or unintentionally incompetent? – Evan Plaice Dec 13 '12 at 07:46
  • 11
    @EvanPlaice wouldn't it give fewer headaches and more time to be lazy to just plug in the CR support you intentionally left out? – Pieter B Dec 13 '12 at 07:50
  • For formats, the save as dialog in Excel 2011, has 'Common Formats' and 'Specialty Formats' Common formats has .xls .xltx .xlt .csv (cr) .htm and .pdf. Specialty formats has a whole bunch of stuff including Windows Comma Separated and MS-DOS Comma Separated. I'm not sure what the difference is. – FigBug Dec 13 '12 at 08:10
  • 1
    @Evan: [Never attribute to malice that which is adequately explained by stupidity.](https://en.wikipedia.org/wiki/Hanlon's_razor) – Joachim Sauer Dec 13 '12 at 08:32
  • @JoachimSauer You forgot the last part, "...but don't rule out malice." Great quote, thank you for that. I actually wish MS had a public issue tracker for MS Office so I could report a bug. Their lack of a public bug tracker for their flagship product biases my opinion toward stupidity. – Evan Plaice Dec 13 '12 at 08:37
  • @PieterB Probably but the question is more about principle than rationale. Does the developer population collectively agree to put CR to bed? Can somebody come up with a good reason to support it besides cultural inertia? – Evan Plaice Dec 13 '12 at 08:48
  • 11
    "In business terms the opportunity cost is too high. In simple terms, I'd rather find reasons to justify my laziness than waste time adding edge-case support for a dead platform.": In business terms it would have taken less time to implement the support for CR than to post a question here to investigate the relevance of this feature. – Giorgio Dec 13 '12 at 09:03
  • 4
    @EvanPlaice cultural inertia is perfectly good reason. – Pieter B Dec 13 '12 at 09:06
  • @Giorgio Technically you're incorrect, the plugin consists of 3 parsers * 3 more forms (ie CR, CRLF, LFCR) and tests to confirm that everything will continue to work in the future. Plus, this is a matter of principle not technical difficulty. – Evan Plaice Dec 13 '12 at 09:07
  • 1
    @Evan Plaice: I was following the business principle that you should go for the fastest solution that will work and can be sold. – Giorgio Dec 13 '12 at 09:09
  • 2
    "Technically you're incorrect, the plugin consists of 3 parsers * 3 more forms": Shouldn't the handling of newline tokens be done in one module only (the lexical analyzer)? – Giorgio Dec 13 '12 at 09:15
  • @Giorgio Only from the perspective of a seller. The perspective of the owner also needs to consider maintenance and depreciation. Everything depreciates, even code. The cost to maintain obsolete code is usually greater than its value. Businesses do it by neglecting to trim excess fixed assets even after they have been depreciated to nothing. Developers do it by hoarding code that supports obsolete platforms. And the three parsers are: entry parser, single-pass parser, and csv-specific line splitter. There are good reasons it's done that way. When was the last time you used Dr. Watson? – Evan Plaice Dec 13 '12 at 09:31
  • 2
    @Evan Plaice: Of course you are correct that one should not support obsolete formats (who would object to that?). In your specific case (supporting CR line terminator in your lexer): (1) you were not 100% sure whether it is obsolete or not (some users still expect CR to be recognized), (2) trivial implementation (provided you have a centralized lexical analyzer), (3) no maintenance cost (unless CR is going to have a specialized, conflicting meaning in the future, in which case you should make CR explicitly invalid as a newline marker, but again, trivial implementation / very low cost). – Giorgio Dec 13 '12 at 09:52
  • 5
    @EvanPlaice: Writing this question already cost you more time than simply shoveling in support for `CR` newlines into your codebase. (...and if you firmly believe this isn't the case, your parser's design must be pretty hectic) – ZJR Dec 13 '12 at 10:54
  • 4
    @EvanPlaice - Actually, well-designed, well-written and well-maintained software is one thing that does NOT depreciate. It just keeps working. – Stephen C Dec 13 '12 at 11:24
  • 1
    Will the files supplied to your plugin only ever originate from OSX? What if someone creates a file in Windows, where it's more normal to have CRLF? – Mauro Dec 13 '12 at 12:45
  • 1
    Is python's os.linesep or C++'s std::endl really that complex? – MrFox Dec 13 '12 at 16:58
  • @StephenC That's a myth. Most software has a shelf life and planning for obsolescence is important to ensuring a slim maintainable codebase in the future. Unless you're working under the waterfall (ie RUP) development model, then ::applause:: – Evan Plaice Dec 13 '12 at 22:10
  • @Evan: Planning for obsolescence seems pretty much the *opposite* of "ensuring a slim maintainable codebase". If the software's not going to be around in 10 years, who gives a damn about maintainability? Plan for *maintenance* if you want maintainability. Unfortunately for you, though, if you plan well for maintenance, you might never achieve obsolescence. :P – cHao Dec 14 '12 at 15:43
  • Figured I'd throw this in: Windows is CRLF structured but supports plain LF also. However, Notepad (even on Windows 8) fails to render documents correctly if it isn't CRLF. – Cole Tobin Dec 14 '12 at 23:55
  • @ColeJohnson Feature, or bug? You decide. Isn't notepad only intended for note taking. Whereas, WordPad is a fully featured plaintext/richtext editor? They're the ones responsible for this bug submission in the first place as MS Excel on Mac outputs CSV data using CR line-breaks by default. – Evan Plaice Dec 15 '12 at 02:21
  • @cHao No, if software has planned obsolescence, room will be made to add new features as old ones are retired. That's the distinguishing quality that makes it maintainable. Unfortunately, almost no software projects are designed with planned obsolescence in mind so most become bloated and die. – Evan Plaice Dec 15 '12 at 02:27
  • 1
    @EvanPlaice: Almost no software projects are designed for obsolescence because it's rather idiotic to plan on throwing away production-quality effort. They are starting to be more commonly designed for maintainability, which as a side effect makes obsolescence an option...but again, you're pissing all over real users' needs, and it's going to bite you later on. – cHao Dec 15 '12 at 02:48
  • @cHao Did you know that the Linux Kernel just dropped i386 support? Would you call that a waste? I'd call it a good design decision. It drops a dying branch in favor of better support for x64 and ARM processors. Why would you claim I'm pissing on my users' needs? My project's user base is very satisfied with the state of development. I even have a new contributor implementing some features that I haven't had the time to do myself. The user who submitted this is one out of thousands and he fixed the issue by changing a setting in Excel. Problem averted, everybody's happy. – Evan Plaice Dec 15 '12 at 04:24
  • @EvanPlaice: That isn't *fixing* the issue. That's *working around* the issue. In decent code it'd take like 5 lines to actually *fix*, and would not need further maintenance of its own. We're not talking about an overhaul here. We're talking about a change that, unless your code is that horrid, would literally take less time and effort to implement and maintain *over the app's entire support lifetime* than you have spent fighting it in the name of laziness. – cHao Dec 15 '12 at 05:44
  • As far as Linux and i386, i'm not sure i like it, but i see the point. Old old 32-bit code could easily cause maintenance problems and bring constraints that hinder future development. Adding CR support causes none of that, though; if done right, you could add it in and literally never have to edit it again. – cHao Dec 15 '12 at 15:16
  • "*There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.*" No, the CR isn't obsolete. There's this little platform called "Windows" that still uses it. – Ross Patterson Jan 09 '13 at 00:28
  • @RossPatterson Whatever you say Shakespeare. But there's an easy way to ignore CR chars **and** still support Windows. I've updated the question to outline how it's handled. – Evan Plaice Jan 09 '13 at 00:52
  • 1
    "Of course you are correct that one should not support obsolete formats (who would object to that?)". I would. If the customer uses the format, even if it is obsolete, you should support it because if you don't the product will not meet customer requirements! – jwenting Jan 09 '13 at 07:40

7 Answers

37

There is a good practice where you are "liberal in what you accept, and conservative in what you send".

In other words, if there is a chance (however small) that someone will give you a CR line ending and expect it to work correctly, you'll need to support it.

TBH, I can't see how adding CR support would take all that long.

When you see a CR in the lexer, peek at the next character. If it is an LF, swallow it and emit a single newline token; if it isn't, just emit a newline token and continue.
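
The peek-ahead described above can be sketched roughly as follows. This is an illustrative Python sketch, not the asker's parser; the `tokenize` function and the `("TEXT", ...)` / `("NEWLINE", "")` token shapes are assumptions made up for the example.

```python
def tokenize(text):
    """Return a list of ("TEXT", s) and ("NEWLINE", "") tokens.

    CR, LF and CRLF are each reported as exactly one NEWLINE token.
    """
    tokens = []
    buf = []
    i = 0
    n = len(text)

    def flush():
        # Emit any pending text before a newline or at end of input.
        if buf:
            tokens.append(("TEXT", "".join(buf)))
            buf.clear()

    while i < n:
        ch = text[i]
        if ch == "\r":
            # Peek: if an LF follows, swallow it so CRLF counts once.
            if i + 1 < n and text[i + 1] == "\n":
                i += 1
            flush()
            tokens.append(("NEWLINE", ""))
        elif ch == "\n":
            flush()
            tokens.append(("NEWLINE", ""))
        else:
            buf.append(ch)
        i += 1

    flush()
    return tokens
```

With this in the lexer, the rest of the parser only ever sees one kind of newline token, so the three conventions need no further special handling downstream.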

ratchet freak
  • It's true that the implementation wouldn't be difficult. If I was struggling I would have asked for help on Stack Overflow. This is more about 'why' not 'how'. Good answer BTW, we'll see if anybody else chimes in. – Evan Plaice Dec 13 '12 at 09:02
  • 24
    @ZJR: Postel's law is dangerous: be very careful when employing the robustness principle, because it frequently backfires. The html parsing mess we are still in can be attributed to that mindset. When a program accepts malformed input, its behavior as a result soon becomes *expected* and depended upon behavior, and any later changes that treat the malformed input differently, or not at all, while being technically correct, are often considered defective. – whatsisname Dec 13 '12 at 16:18
  • 4
    @whatsisname: I disagree. I think production quality software should be robust. Development toolchains should however strongly discourage relying on such robustness and only produce valid output. The mess html is in is caused by almost two decades of poor tooling, not by the lenience of browsers. – back2dos Dec 13 '12 at 16:50
  • 2
    @back2dos: _ _ so? the poor tooling is caused by the lenience of browsers. – amara Dec 13 '12 at 19:31
  • 4
    the poor tooling is the result of the browser war – ratchet freak Dec 13 '12 at 19:44
  • Browsers are as much a part of the development toolchain as the authoring tools themselves, and have been for years. Lenient browsers and poor development tooling aren't cause and effect, they're the same thing! – detly Jan 09 '13 at 03:43
  • @back2dos: Obviously, all production quality software should be robust (within certain parameters). However, it is more effective to validate software when the state space is minimized. Therefore one should try to minimize the amount of possible input that is allowed. – Dibbeke Jan 09 '13 at 14:25
  • 2
    @Dibbeke: Handling malformed input merely maps a bigger input space to the existent state space and thus has no effect on it - provided your software has a decent separation of concerns. – back2dos Jan 09 '13 at 16:02
  • @back2dos: In large systems, complexity should be avoided in the middle of the network. Mapping, which resides closer to the network center, isn't always a trivial job. – Dibbeke Jan 17 '13 at 22:44
  • @Dibbeke: I am talking about mapping a larger input space to a smaller one, that's it. If input processing happens anywhere close to the middle of your network, then your system has a problem. Ideally, front-ends should be easily plugable onto your architecture, because internal state and external representation should be decoupled. – back2dos Jan 18 '13 at 13:24
  • @back2dos I must admit 'closer to the network center' was very vague in terms of HMI. I was thinking in terms of the information system boundaries and specifically machine-machine interaction. The information a user provides is also part of a (knowledge) network, which is a key insight to understanding the network relationship I was talking about. By using the word 'center' I implied there is only one center, so 'nexus' would have been a better term. – Dibbeke Jan 21 '13 at 11:04
21

No. CR is not obsolete (defined as "no longer produced or used"). You yourself have provided evidence of that. It is perhaps uncommon, but not obsolete.

As for "is it safe to exclude support" for CR? As you say, it's not a matter of losing sales, and you can't support every weird character combination and file format in the world, and only you know your software and user base. So I would say that it would be safe to exclude it if you're convinced that the support burden of not adding it (as mouviciel explains) does not outweigh the time burden of adding it. But without knowing a lot more about the product and user base I'm not sure how to be any more specific.

Arjailer
  • 13
    +1 - IMO, the OP is trying to label CR as "obsolete" so that he has an excuse for not supporting it. – Stephen C Dec 13 '12 at 11:20
  • 1
    @StephenC I'm not trying to hide that fact. It's not like I really *need* an excuse, I'm the author and thus have final say. The point is, it raises an interesting question. – Evan Plaice Dec 13 '12 at 22:18
18

About laziness: you have to balance:

  • effort in changing code so that CR is safely handled (and then forget about it).

  • effort in explaining to users why the files they were happy with for decades suddenly crash your app, in finding workarounds that they can use without compromising your sales, and in asking for arguments and answering comments right here.

It is up to you to decide which path is the laziest.

mouviciel
  • Good points, support definitely comes with a time cost. For this particular case 'sales' isn't an issue (ie it's open source) but it's worthwhile to consider the bigger picture. Likewise, I could also throw an exception in the code when a CR is encountered indicating an invalid/unsupported character. – Evan Plaice Dec 13 '12 at 09:41
  • 7
    @Evan: Of course it's open source. If it weren't, your boss would have told you "I don't give a shit that 'nobody' uses CR anymore! Customers are complaining. FIX IT!" :P This is the big thing about OSS that pisses me off: the lack of attention to the *real cases* that users have complained about. Whether you think it's obsolete or not, *someone* is still using it. – cHao Dec 13 '12 at 13:25
  • 1
    because it's open source, you can write an open letter to all users that you will accept any patch to fix it. – rwong Dec 13 '12 at 19:12
  • @cHao Those dastardly OSS developers! Investing all that time and effort to produce code that is free of cost and free for modification without thinking of you first. Please... your sense of entitlement is staggering. – Evan Plaice Dec 13 '12 at 22:24
  • @rwong Exactamundo! That's probably what I'll end up doing. – Evan Plaice Dec 13 '12 at 22:27
  • @EvanPlaice: My "sense of entitlement" to software that doesn't blow up is staggering. Gotcha. (See, i can twist words too.) **Yes, i am fully entitled to complain about brokenness.** I know a lot of those guys actually care. But if they don't, guess what? Most users can't and won't fix software. They may file a bug report, which some schmuck blithely closes as "your real-life issue is not a real issue". They may call your app a steaming pile of crap in every public forum they can think of. Hell, maybe they'll do both. In either case, they might also just go find an app that works. – cHao Dec 13 '12 at 22:41
  • @cHao You should never feel entitled to complain about OSS software. Complaining is worth less than nothing because it's counter-productive (ie a waste of time). OSS is a meritocracy where attention is the most valuable form of currency. If you want some, earn it. Whether that be through a well-thought-out bug report, a code contribution, etc. I think OSS devs are perfectly justified when they say, "if you have nothing of value to offer, do everybody a favor and go away." Try creating an OSS project sometime and tell me how much fun it is when people feel entitled to waste *your* time. – Evan Plaice Dec 13 '12 at 22:57
  • (cont) And yes, OSS devs really do care. But it's a slippery slope. Wasting time on bad (ie complaining) users can eat up significant amounts of time where that time would be better spent developing fixes and/or garnering support from good (ie contributing) users. There's a very good reason OSS projects are hostile to certain users. Development time is a finite resource that needs to be protected. I'm not trying to attack you personally, just pointing out that OSS devs usually handle things in a certain way for good reasons; even if you don't like the outcome. – Evan Plaice Dec 13 '12 at 23:05
  • 1
    @EvanPlaice: That "attention is...currency" thing works both ways. If you want people to use your app, it has to work, and it has to solve their problem. A broken app isn't immune to criticism just because it's free. I'm not saying you need to do *everything* users ask for; you *should* dismiss outrageous requests. But if you don't solve real users' problems, you end up losing users. – cHao Dec 13 '12 at 23:54
  • 1
    @EvanPlaice: And by the way, when i mean "complain", i mean "file a bug report outlining what's broken and how", not "whine randomly about how bad the software is". – cHao Dec 14 '12 at 04:19
8

Is it safe to exclude support for the statistically insignificant percentage of users who decide (for whatever reason) to use the old Mac OS style line endings?

Maybe not too many users will detect it, but there's an elephant in the room: Windows line endings (CRLF). If you support those (I generally do, even though I only use Windows for games), it should be trivial to support the third part of this historic Bermuda triangle.

If you don't support something like this, you should at least mention it in the documentation ("This is not a bug" style) and how to change files to work with your tool in the simplest possible way (dos2unix for example).
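
Where a conversion tool isn't at hand, the documented workaround could be a small preprocessing step along these lines. This is only a sketch; the helper name `normalize_newlines` is hypothetical and not part of any existing tool.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Rewrite CRLF and bare CR line endings as LF."""
    # Handle CRLF first so each pair becomes one LF rather than two.
    data = data.replace(b"\r\n", b"\n")
    # Any CR still present is a bare (old Mac OS style) line ending.
    return data.replace(b"\r", b"\n")
```

A user would run the file's bytes through this once before handing them to the parser, for example `normalize_newlines(open(path, "rb").read())`, where `path` is whatever file they are loading.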

l0b0
  • 2
    +1 for mentioning Windows using `CRLF` - it's the default line ending on that OS. And there's no way to guarantee the source of a .csv file, so it easily could have been created on a Windows system. –  Dec 13 '12 at 14:44
  • 1
    Mentioning CRLF in Windows isn't relevant because if you are catching LF as the break point then you'll automatically get CRLF as a bonus. The OP knows this as you can see in the text of his post. – davidethell Dec 13 '12 at 20:31
  • @davidethell Yep, that's how it's done. Currently, CR chars are silently ignored. Elephants notwithstanding. – Evan Plaice Dec 13 '12 at 22:39
6

There are many serial devices that rely on CR as an end to the data stream before the ETX is sent. It is a convention that will never go away.

Engineer2021
3

I would treat the request as any feature request where you need to weigh the costs against the benefits.

If exactly one person has asked for CR support, maybe it is not necessary. See the book chapter below from 37signals, where they say you should only worry about very popular feature requests.

http://gettingreal.37signals.com/ch05_Forget_Feature_Requests.php

Aaron Kurtzhals
1

Microsoft operating systems from MS-DOS onward use the combination CR+LF as a line separator (I think mostly because of matrix printers, which need both characters).

So yeah, it's a bummer, but you still need support for the damned thing.

linkerro