The more loosely structured the input, the harder it is to parse. In that sense, XML is already loosely structured. When I get a <price/> tag, I expect it to contain a number, like <price>59.90</price>, but nothing in XML itself ensures that I will actually receive one. What about <price>USD59.90</price>, or <price>59,90 €</price>, or <price>I don't know what to put here because I'm too lazy to read the API</price>?
The nice part is that you can at least validate XML, with a DTD or an XML schema. The same is not true for Excel.
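A minimal sketch of such validation, assuming Python with the lxml package (and note that enforcing a numeric type actually takes an XML schema rather than a plain DTD, since DTDs constrain structure, not data types):

    import lxml.etree as etree  # third-party validating parser, one common choice

    # Schema declaring that <price> must contain a decimal number.
    schema = etree.XMLSchema(etree.XML("""
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="price" type="xs:decimal"/>
        </xs:schema>"""))

    print(schema.validate(etree.XML("<price>59.90</price>")))     # True
    print(schema.validate(etree.XML("<price>USD59.90</price>")))  # False

With Excel, there is no equivalent gatekeeper between the user and your parser.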
The closer the data format is to the user, the more random the data becomes, simply because users appreciate being able to add sheets in Excel, reorder columns, and so on, and they still expect any parser consuming the file to figure out where the data is and how it is stored.
I had to work on an application which parsed PDF documents. One day, the phone rang in the IT support department. The customer was screaming about how much the application sucked: whenever he sent his PDF, the application responded that the PDF contained no text. When the customer finally sent us the PDF, we immediately understood what was wrong: it was a scanned document with no OCR layer; every page contained just a bitmap of the scanned text. For this customer, it was still text, and our app was expected to read it like any other text document.
This randomness makes it very difficult to work with those files. In XML, you at least expect some structure. What if you had to parse:
<products>
<product <title>Product 1
price="59.90
in-stock >19<
<product> title=Product 2
price="Fourty nine USD ;
inStock = 62
</products>
To the user, this is still XML, and he would not understand why your stupid app can't parse something so obviously clear.
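And no conforming parser will accept it. A minimal illustration with Python's standard library parser:

    import xml.etree.ElementTree as ET

    broken = "<products> <product <title>Product 1"  # start of the file above

    try:
        ET.fromstring(broken)
    except ET.ParseError as error:
        print("Rejected:", error)  # e.g. "not well-formed (invalid token)"

The parser is right to reject it; the user will still blame the app.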
Now, let's get back to the arguments you can give to your stakeholder with no technical background:
1. How would the integration service be notified?
With a web service, it's easy. You invoke it, sending the HR data to it, and that's all.
With an Excel file on a network drive, things become complicated:
Either the integration service constantly checks for new files on this drive, in which case the polling has a real cost in terms of bandwidth (and if the network drive is unreliable, it may raise even more issues),
Or the integration service must be called after the Excel file is saved, in which case, instead of directly using a web service, you end up using the network drive and then a web service anyway. A short sketch of both approaches follows.
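Here is a minimal sketch of the two notification styles in Python; the endpoint URL, the file pattern, and the handle callback are all illustrative, not a real API:

    import json
    import pathlib
    import time
    import urllib.request

    # Web service: one direct call, and the integration service is notified.
    def push_via_web_service(hr_data: dict) -> None:
        request = urllib.request.Request(
            "https://example.com/hr-import",  # hypothetical endpoint
            data=json.dumps(hr_data).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request)

    # Network drive: the integration service has to poll forever.
    def poll_network_drive(drive: pathlib.Path, handle) -> None:
        seen = set()
        while True:
            for file in drive.glob("*.xlsx"):
                if file.name not in seen:
                    seen.add(file.name)
                    handle(file)  # hand off to the (fragile) Excel parser
            # Every pass hits the share; an unreliable drive turns this
            # loop into yet another source of errors.
            time.sleep(60)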
2. Loading data from Excel is expensive
2. a. In terms of immediate cost
Any decent programming language can parse XML out of the box. None, as far as I know, can read Excel files natively. To read them, you must either use Microsoft Excel COM automation (which requires a desktop installation of Excel and is not supported server-side) or bring in a third-party library or component.
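To make the difference concrete, here is a hedged comparison in Python; openpyxl stands in for whichever third-party component you end up installing, and the file name is made up:

    # XML: the standard library is enough.
    import xml.etree.ElementTree as ET
    price = ET.fromstring("<price>59.90</price>").text

    # Excel: an extra dependency is always required. openpyxl is one option;
    # COM automation (desktop only) or commercial components are others.
    from openpyxl import load_workbook  # third-party, must be installed
    workbook = load_workbook("hr-export.xlsx")  # hypothetical file name
    price_cell = workbook.active["B2"].value    # hoping the layout held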
2. b. In terms of resources
I don't have profiler results to support this, but loading data from an Excel file very probably costs much more in terms of CPU than parsing XML.
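If you wanted numbers rather than my guess, a quick measurement would look like this sketch (the two file names are hypothetical, and the files should hold comparable data):

    import timeit

    xml_seconds = timeit.timeit(
        "ET.parse('hr-export.xml')",
        setup="import xml.etree.ElementTree as ET",
        number=100,
    )
    xlsx_seconds = timeit.timeit(
        "load_workbook('hr-export.xlsx')",
        setup="from openpyxl import load_workbook",
        number=100,
    )
    print(f"XML: {xml_seconds:.2f}s, Excel: {xlsx_seconds:.2f}s")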
3. Loading data from Excel is error-prone
Excel files have one fundamental problem: they are modified by users, and users may make any change they want. What if they rename a column? What if they add a sheet before the one you need to parse? What if they export the Excel data in a format you didn't expect?
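Even defensive parsing only mitigates this. You end up writing guesswork like the following sketch, where the column name and the layout are pure assumptions:

    from openpyxl import load_workbook

    def find_salary_column(path: str):
        # Search every sheet's header row for a "salary" column.
        workbook = load_workbook(path, read_only=True)
        for sheet in workbook:  # the sheet may have been moved or renamed
            for row in sheet.iter_rows(min_row=1, max_row=1):  # header row only
                for cell in row:  # columns may have been reordered
                    if str(cell.value).strip().lower() == "salary":
                        return sheet.title, cell.column
        return None  # renamed? deleted? exported differently? give up

Each comment above is one of the failure modes the users will eventually trigger.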
Here's the conclusion for the stakeholder with no technical background:
Cost-effective, you say?
Let's see. With the Excel-on-a-network-drive approach, you would need to develop two systems instead of one, and one of those systems would be hugely error-prone (heavily increasing the maintenance cost) and would require buying licenses and more powerful servers. In other words:
Higher infrastructure cost;
Higher development cost;
Higher maintenance cost.