For your specific circumstance, I'd just go ahead and try writing a parser if you have the spare time. I'd advise starting with XML-based parsers, as these are the simplest: the syntax tree is already neatly written out for you in the XML file.
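
To illustrate why XML is a gentle starting point, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The sample document and the `describe` helper are invented for illustration; the only point is that the library hands you a ready-made tree, so there is no tokenizer or grammar to write.

```python
import xml.etree.ElementTree as ET

# Hypothetical input, made up for this example.
SAMPLE = """
<Menu name="File">
  <Item name="Open" />
  <Menu name="Recent">
    <Item name="report.txt" />
  </Menu>
</Menu>
"""

def describe(element, depth=0):
    """Return one indented line per element, walking the tree depth-first."""
    lines = ["  " * depth + f"{element.tag} name={element.get('name')}"]
    for child in element:  # iterating an Element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines

tree_lines = describe(ET.fromstring(SAMPLE))
print("\n".join(tree_lines))
```

All the "parsing" left for you to do is deciding what each element means.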
For the more general question of when it's valid to write a parser, I'd argue that the following must be true:
- The inputs to the parser change often, and changing the hardcoded equivalent of the parser's output would take longer than changing the parser's input
- The parser tackles a finite, well-understood problem domain, which changes rarely and gives notice when it does
- The total time it would take to write the hardcoded equivalents of all the parser's outputs, for all of its input files, is greater than the time it will take to write the parser itself
- The language dealt with by the parser is simpler or more convenient for its end-user than the language the equivalent hardcoded output would be written in
This may seem a tad opinion-based and complex, but my reasoning is essentially that a parser takes a very long time to write well. For the parser to pay off its debt (the time taken to write it), it must be dealing with a problem domain where the alternative would be to write a lot of code for each potential input. So let's run through the above conditions with the example of HTML and HTML parsers:
- HTML pages do indeed change often, and it would take longer to change the visual tree as written in C++ than as written in HTML. To move a div in HTML, one can simply cut and paste it somewhere else in the tree; to change its style, one can simply apply a new CSS class. Doing the equivalent in C++ would be a lot harder, because it wouldn't be anywhere near as easy as cutting and pasting the same code into some other part of the C++ file.
- The HTML specification is finite and well understood. It's well known when the specification will change, because W3C convenes many meetings prior to each change. This means that the writers of HTML parsers know when it's about to change, so they can be prepared and don't waste large amounts of time anticipating changes in the problem domain. The fact that the problem domain is well understood and finite also gives parser writers a good basis for saying that their parser is complete, i.e. an HTML parser is complete when it handles all of the known HTML elements that it will read. Imagine attempting to write a parser for something that changes constantly and is vaguely defined; how would you ever know your parser was complete?
- Similarly to point 1, imagine attempting to write a web page as a set of C++ instructions. Coming up with a consistent way of handling the layout of elements on the screen would take longer than writing a simple div! Additionally, given that there are ~2.51 billion web pages, imagine the time lost writing each web page in its own C++ files, with its own frameworks. If a parser saves huge amounts of time over the alternative, and the parser will be used often, then it's a good sign that the parser may be a net positive.
- Again, if web pages were written in C++ then the pool of people capable of writing them would be severely diminished. Not to be snobby, but I think we can all agree C++, with its numerous complex pitfalls and segfaults, is a lot harder than HTML. If only inveterate C++ developers could write web pages, then I'd hazard a guess that we sure as heck wouldn't have ~2.51 billion web pages.
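
The third condition is ultimately arithmetic, so a rough break-even check can be sketched in a few lines of Python. The function name and every figure below are invented for illustration; real estimates will be far fuzzier than this.

```python
def parser_pays_off(parser_hours, hours_per_hardcoded_output,
                    hours_per_parser_input, expected_inputs):
    """Return True if building the parser is expected to cost less in
    total than hardcoding the equivalent of every output by hand."""
    hardcoded_total = hours_per_hardcoded_output * expected_inputs
    parser_total = parser_hours + hours_per_parser_input * expected_inputs
    return parser_total < hardcoded_total

# Made-up numbers: a 200-hour parser, 8 hours per hardcoded output,
# 1 hour per input file once the parser exists.
few = parser_pays_off(200, 8, 1, expected_inputs=10)    # 210 h vs 80 h
many = parser_pays_off(200, 8, 1, expected_inputs=100)  # 300 h vs 800 h
print(few, many)
```

With only ten inputs the parser never earns back its cost; with a hundred it clearly does, which is why "the parser will be used often" matters so much.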
As a bit of personal anecdata, my company has written a parser for a client which takes XML and uses it to move data between SQL stored procedures and spreadsheets, in both directions. The client is able to understand something like:

    <Workbook name="SomeWorkbook">
        <Sheet name="SomeWorksheet">
            <DataCell range="A1" name="employee" input="SPGetEmployees" />
            <DataCell range="A2" name="salary" input="SPGetEmployees" />
            <DataCell range="B3" name="total" input="SPGetEmployees" />
            <DataCell range="B4" name="isApproved" output="SPApproveWorksheet" />
        </Sheet>
        <DataSources>
            <DataSource direction="input" type="SP" database="someDatabase" name="SPGetEmployees">
                <Parameters>
                    <Parameter name="financialYear" type="DateTime" isDataCell="false" />
                </Parameters>
            </DataSource>
            <DataSource direction="output" type="SP" database="someDatabase" name="SPApproveWorksheet">
                <Parameters>
                    <Parameter name="isApproved" type="Bit" isDataCell="true" />
                </Parameters>
            </DataSource>
        </DataSources>
    </Workbook>

because it all looks familiar to them in their job role (semi-technical systems administrators), but the client definitely wouldn't understand the C# code that would otherwise generate this workbook. Their data sources for their worksheets change often, too, and it's quicker to change some XML than it is to change a lot of C# code. The problem domain is also well understood, because we're just reading and writing from some well-understood data sources to some well-understood outputs (Excel files), so we can write an XML-based language which provides for all the client's needs and doesn't have to be changed very often.
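
For a feel of how little code the reading side of such a format needs, here is a rough Python sketch. It is not our actual implementation: `load_bindings` is a hypothetical helper, the embedded XML is a trimmed copy of the sample above, and it assumes that a cell's `input` or `output` attribute names a `DataSource` declared with the matching `direction`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the workbook definition shown above.
WORKBOOK_XML = """
<Workbook name="SomeWorkbook">
  <Sheet name="SomeWorksheet">
    <DataCell range="A1" name="employee" input="SPGetEmployees" />
    <DataCell range="B4" name="isApproved" output="SPApproveWorksheet" />
  </Sheet>
  <DataSources>
    <DataSource direction="input" type="SP" database="someDatabase" name="SPGetEmployees" />
    <DataSource direction="output" type="SP" database="someDatabase" name="SPApproveWorksheet" />
  </DataSources>
</Workbook>
"""

def load_bindings(xml_text):
    """Return (sheet, range, direction, source) tuples, rejecting any
    cell that refers to a data source which was never declared."""
    root = ET.fromstring(xml_text)
    declared = {(ds.get("direction"), ds.get("name"))
                for ds in root.iter("DataSource")}
    bindings = []
    for sheet in root.iter("Sheet"):
        for cell in sheet.iter("DataCell"):
            for direction in ("input", "output"):
                source = cell.get(direction)  # None if the attribute is absent
                if source is None:
                    continue
                if (direction, source) not in declared:
                    raise ValueError(
                        f"{cell.get('range')}: unknown {direction} source {source!r}")
                bindings.append(
                    (sheet.get("name"), cell.get("range"), direction, source))
    return bindings

bindings = load_bindings(WORKBOOK_XML)
for binding in bindings:
    print(binding)
```

The validation step is where most of the real effort goes: the client edits these files by hand, so clear error messages for a misspelled source name matter more than the tree walking itself.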
I'll leave you with this final caution from xkcd on the topic of optimizations such as parsers: http://xkcd.com/1205/