Nothing wrong with CSV, up to a point
CSV works well for rigidly defined data that is unlikely to change format and doesn't spring many surprises on the recipient parser.
Here's a handy list of the big gotchas:
- Escaping quotes within quoted fields (the field contains the field delimiter or the quote character itself)
- Quoted fields containing CRLFs (the field contains the record delimiter)
- Unicode (underlying text format may be insufficient)
- Different line terminators for different OSes (is it CR, LF, CRLF, or NUL?)
- Inline comments (line prefixed with #, //, --, ; etc)
- Version management (the latest version of the file contains more or fewer fields)
- Differentiating between NULL and empty data (,"", is empty but ,, is null?)
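The first two gotchas come down to the usual CSV quoting convention (the one RFC 4180 codifies): wrap a field in double quotes when it contains the delimiter, a quote, or a line break, and escape an embedded quote by doubling it. A minimal sketch, with an illustrative class and method name of my own choosing:

```java
// Sketch of the standard CSV quoting rule: quote a field that contains the
// delimiter, a quote character, or a line break, and double embedded quotes.
public class CsvEscape {
    public static String escape(String field) {
        boolean needsQuoting = field.contains(",") || field.contains("\"")
                || field.contains("\r") || field.contains("\n");
        if (!needsQuoting) {
            return field;
        }
        // An embedded quote is escaped by doubling it: " becomes ""
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("plain"));      // plain
        System.out.println(escape("a,b"));        // "a,b"
        System.out.println(escape("say \"hi\"")); // "say ""hi"""
    }
}
```

Note that this is only the writing side; a parser has to undo the same rules, including quoted fields that span multiple lines, which is where hand-rolled readers usually go wrong.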
You could approach this with a metadata header that describes how the fields should be parsed, but at that point you may as well just use XML; this sort of freeform CSV mess is exactly what XML was invented to clean up. Still, the XML approach seems too heavyweight for what could, on the face of it, be a simple problem.
A popular alternative is the "weird character delimiter" strategy. This sidesteps a lot of the escaping issues above because you use something like a | (pipe) character for field delimiting, and a CRLF for record termination. It doesn't solve the multi-line field issue (unless you also carry a field counter), but you do get nicely formatted lines for humans.
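A parser for the pipe-delimited approach can be a one-liner, which is much of its appeal. A sketch, with the record contents purely illustrative; the one real subtlety is that Java's String.split drops trailing empty fields unless you pass a negative limit:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the pipe-delimited strategy: one record per line, fields split on |.
public class PipeParser {
    public static List<String> parseRecord(String line) {
        // The -1 limit keeps trailing empty fields, which split() drops by default
        return Arrays.asList(line.split("\\|", -1));
    }

    public static void main(String[] args) {
        List<String> fields = parseRecord("alice|42||2024-01-01");
        System.out.println(fields); // [alice, 42, , 2024-01-01]
    }
}
```

The obvious trade-off remains: the moment a field can legitimately contain a pipe or a newline, you are back to inventing an escaping scheme, at which point proper CSV quoting was simpler.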
Overall, if you're just looking for a simple way of handling this kind of file then, in the Java world, you could simply throw OpenCSV at it. That way you abstract all of these problems away into an established library.