I would like to write a long text in a structured format that allows a set of operations on that text. Which structure or format best suits the use I have in mind?
The intended use is as follows:
- I would like to write text in natural language, possibly with translations into several languages. A translation would simply be the same structure with different data (text).
- I would like to keep the text in a VCS: check diffs, branch, merge, etc. The structure should suit this use well.
- I would like to keep the text as free of clutter as possible so that it is human-readable.
- I would like to convert the text easily to other formats; not necessarily many, but at least HTML and PDF.
- I would like to be able to manipulate the text easily, for instance reordering or filtering elements based on metadata in the text.
- Metadata is data too: it may or may not be printed, and it may be printed in different ways.
Here are the main options I have considered so far:
- LaTeX: basically a language designed for this task. The problems I see are that it is less readable than other options such as Markdown, and that it is not really structured text. Formatting metadata can be separated out with a set of macros, but the text itself has no structure: changing the order requires either parsing it or defining all the text as macros, so that only the order of the macro invocations needs to change. It is great at what it does, but it becomes clumsy where it falls short, as in structuring the text, and I don't see a good separation between control information and the data to be printed. It is a very good option for conversion to PDF.
- XML: the structure is fairly good, but in this context I see no advantage in XML over HTML, which provides the same features and some more.
- HTML: conversion to HTML would be immediate, but the conversion to PDF is less clear. Markdown may be more human-readable, but HTML is probably the most widely used language for the task at hand: there are supporting languages such as CSS (and Less, Sass; too many options) that can make life easier, JavaScript can manipulate it, anyone with a browser can read it, etc. Maybe some restricted HTML could be converted to quality LaTeX and from there to PDF; I don't know.
- Markdown: a very good option in terms of readability, but I am unsure how it could be manipulated; perhaps by converting it to HTML and then using DOM manipulation or any other processing that works on XML and thus on well-formed HTML. I am also unsure how flexible it is for defining metadata (for instance, marking a paragraph as a summary of other paragraphs), which is easy in XML or HTML via classes or other attributes.
- JSON: most languages include a JSON parser, so it is very friendly to programming languages and easy to manipulate. Obviously some convention would have to be defined on top of JSON, but the same holds for the other options, including LaTeX (macros).
- CoffeeScript: it removes some of the clutter of plain JSON, may be more readable, and can be converted into JSON easily.
- Mixing: the problem with JSON and CoffeeScript is that the structure holding the contents is very flexible (maybe too flexible) but does not naturally support inline annotations. A possible solution is to use Markdown or HTML for the text fragments, which covers bold text or whatever else may be needed.
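To make the mixed option more concrete, here is a minimal sketch in Python of what I have in mind: a JSON structure carries the metadata, and each text fragment is a Markdown string. The field names (`id`, `tags`, `priority`, `text`) and the choice to keep all languages inside one element are assumptions made only for this illustration, not an established schema; keeping one file per language would work just as well.

```python
import json

# Hypothetical source document: the field names ("id", "tags", "priority",
# "text") are illustrative assumptions, not an established schema.
SOURCE = """
[
  {"id": "intro", "tags": ["core"], "priority": 2,
   "text": {"en": "We write **once** and publish many times.",
            "es": "Escribimos **una vez** y publicamos muchas veces."}},
  {"id": "vcs", "tags": ["core", "process"], "priority": 1,
   "text": {"en": "Every change goes through *version control*.",
            "es": "Cada cambio pasa por *control de versiones*."}},
  {"id": "wiki", "tags": ["optional"], "priority": 3,
   "text": {"en": "A wiki can be generated from the source.",
            "es": "Se puede generar un wiki a partir de la fuente."}}
]
"""


def select(elements, tag):
    """Keep only the elements carrying the given tag."""
    return [e for e in elements if tag in e["tags"]]


def ordered(elements, key="priority"):
    """Return the elements sorted by a metadata field."""
    return sorted(elements, key=lambda e: e[key])


def render_markdown(elements, lang, show_ids=False):
    """Emit a Markdown list; the id metadata is printed only on request."""
    lines = []
    for e in elements:
        prefix = f"[{e['id']}] " if show_ids else ""
        lines.append(f"- {prefix}{e['text'][lang]}")
    return "\n".join(lines)


elements = json.loads(SOURCE)
core = ordered(select(elements, "core"))
print(render_markdown(core, "en"))
print(render_markdown(core, "es", show_ids=True))
```

Filtering and reordering become ordinary list operations, and whether a piece of metadata is printed is a rendering decision. Putting one element per line in the JSON file should also keep diffs and merges in the VCS reasonably readable.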
The objective is to write a manifesto, or something that looks like a manifesto and evolves over time. This is based on some ideas that recommend using a VCS. The point is to have a structure that allows writing once and publishing as many times as needed and in different ways (blog posts, PDFs, etc.): because a lot of effort goes into reaching consensus on the text, rewriting and rewording it for each outlet does not seem a good idea. This rules out some otherwise nice options, such as a wiki, but it would be nice if the source were structured so that a set of wiki-like pages could be built from it.
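As a rough sketch of the "write once, publish many times" idea, the Python below renders the same structured source both as a single HTML page and as a set of wiki-like pages. The section fields (`slug`, `title`, `body`) are hypothetical names, and the inline converter handles only `**bold**` and `*emphasis*`; it is a toy for illustration, and a real pipeline would presumably hand the fragments to an existing converter such as Pandoc instead.

```python
import html
import re

# Hypothetical sections: "slug", "title" and "body" are illustrative names.
SECTIONS = [
    {"slug": "principles", "title": "Principles",
     "body": "Write **once**, publish *many* times."},
    {"slug": "process", "title": "Process",
     "body": "Changes are merged after **consensus** & review."},
]


def inline_to_html(text):
    """Convert a tiny inline subset (**bold**, *emphasis*) to HTML.

    This is a toy converter for illustration, not a Markdown parser.
    """
    escaped = html.escape(text, quote=False)
    escaped = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", escaped)


def to_single_page(sections):
    """One HTML fragment containing every section, e.g. for a blog post."""
    template = '<h2 id="{}">{}</h2>\n<p>{}</p>'
    parts = [
        template.format(s["slug"], html.escape(s["title"]),
                        inline_to_html(s["body"]))
        for s in sections
    ]
    return "\n".join(parts)


def to_wiki_pages(sections):
    """One page per section, keyed by a file name derived from the slug."""
    template = "<h1>{}</h1>\n<p>{}</p>"
    return {
        s["slug"] + ".html": template.format(html.escape(s["title"]),
                                             inline_to_html(s["body"]))
        for s in sections
    }


print(to_single_page(SECTIONS))
for name, page in to_wiki_pages(SECTIONS).items():
    print(name, "->", page.splitlines()[0])
```

Both outputs come from the same source data, so the wording agreed on is never retyped; only the rendering functions differ per outlet.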
In the end the technology may not be there yet, but I think it is not far off. There are so many options that a clever combination of some of them should be enough.