4

Title says it all. If the CSV's system delimiter was " (as opposed to a comma or pipe or other common alternatives), how would anything deal with it?

The crux of the matter is of course that by definition, CSV will surround any values containing the delimiter with quotation marks, and will convert all quotation marks to double quotation marks.

Would the result be parse-able?

(Inspired by an answer in Most common "Y2K-style" bugs today?)

System Delimiter (which drives excel, databases, etc)

Alain
  • 335
  • 2
  • 13
  • 5
    Seeing as CSV stans for Comma Seperated Values, you'd have to change your spelling of "quotation mark" to "cuotation mark" :P ...or change your file extension to ".qsv"... – FrustratedWithFormsDesigner Mar 30 '11 at 15:13
  • While CSV traditionally stands for comma separated values, the standard for the format allows for any separator to be used. There is a windows system variable in the registry used for controlling this. It's often programatically manipulated and changed with language settings. – Alain Mar 30 '11 at 19:35
  • I think the general name for such files is "Character-delimited values". – FrustratedWithFormsDesigner Mar 30 '11 at 19:41
  • The comma-separated values (CSV) file format is a set of file formats used to store tabular data in which numbers and text are stored in plain textual form that can be read in a text editor. Lines in the text file represent rows of a table, and delimiters in a line separate what are fields in the tables row. In general, the choice of delimiter being a comma is a common assumption, not a standard. No system that uses semicolons or pipes as the standard delimiter adopts a different extension. In any event, this is just a epistemological problem. The practical use is what is in question now. – Alain Mar 30 '11 at 19:45
  • Interesting screenshot, I've never seen that dialogue before. I don't know if that applies to all CSV files on your system, or how Windows decides to display lists on the UI. – FrustratedWithFormsDesigner Mar 30 '11 at 20:01

5 Answers5

4

Answer: It Breaks the system

I altered my system settings to test this problem out: Altered System Settings

I found out that Microsoft does not know how to handle this.

My original data was:

Original Spreadsheet

After I saved the data, it produced the following ambiguous data file:

This "This"122,342.23""Test""quote"
Is"Is"231,123.42""""quote""test"
A"A"234,234.23""""something"
Test"Test"234.34""something"""

Sure enough, when I tried to open the file back up, it had screwed it up:

Reloaded Data

This shows that the CSV standard fails in the case that the chosen delimiter is a quotation mark and the actual data contains quotation marks. This means the windows operating system should probably disable the user from selecting this as a quotation mark, or change the CSV standard so that in the sole event that the quotation mark is chosen as the delimiter, it uses replaces the escape character (normally a quotation mark) with some other character.

Alain
  • 335
  • 2
  • 13
1

You have to consider the actual system implementation. CSV is just a basic standard. If its coming out of Excel, a custom system, or some Linux editor the actual mileage may vary.

That being said, since you are a programmer I assume the system is something you have source code for.

"3\"4\""

The problem is obvious. The code is hard for a human being to read. Standard CSV

"3,4"

is much easier.

What I would do is change the delimiter. If existing output exists, write a script to find and replace \" with , (or another acceptable delimiter that does not affect the data)

P.Brian.Mackey
  • 11,123
  • 8
  • 48
  • 87
  • As far as I know, backslash is not used as an escape character for delimiters in the CSV standard. – Alain Mar 30 '11 at 19:34
  • @Alain - It may or may not be part of a published recognized std like of IEEE. I dont think it matters. Like any standard, its just a guide that the implementation may or may not follow. Just look at any browser, they all have HTML and js, but they cant agree on any one way to follow a standard. The answer chosen is also just MS's way to do it. By no means is that a catch all answer. – P.Brian.Mackey Apr 07 '11 at 01:58
0

Why not?

The only problem would be if you wrote a parser using a regex and didn't properly escape the search char

Martin Beckett
  • 15,776
  • 3
  • 42
  • 69
0

The only thing you really need to consider is how often you're going to find the character you use as a delimiter in your data fields. I'd worry a bit about using double quotes, simply because double quotes are often used in conjunction with the regular delimiter (e.g. "A","B","C","D","ETC").

Satanicpuppy
  • 6,210
  • 24
  • 28
0

There would be no difference. You are still using some character to delimit each field and that character would need to be escaped when it occurs in the data. Choosing what that character is should be based on the following:

  • The character is not likely to occur frequently in the data (Reduce overhead)
  • The character should be easy to parse out (Make the job of the person writing the parser easier. If the character has other well defined uses in the context of text manipulation libraries, it leaves room for errors.)
unholysampler
  • 7,813
  • 1
  • 30
  • 38
  • The CSV standard implemented on the windows system uses the quotation mark to escape the delimiter appearing in the data - therein lies the difference. There's nothing in the CSV standard that accounts for the case where the quotation mark (used to escape the delimiter) IS the delimiter. – Alain Mar 30 '11 at 19:39