I am working on an e-commerce website. There is a case where I need to fetch all of the data in the database through a third-party API and send it to an indexing engine. This third-party API has many functions, like getproducts, getproductprice, etc., and each of those functions returns its data in XML format.

From there I take over: I make the various API calls, transform the XML data with XSLT, and write the result to a CSV file. That file is then uploaded to the indexing engine.
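
Roughly, my current pipeline looks like this (a simplified sketch; ThirdPartyApi.GetProducts and the file names are placeholders for the real vendor call and files):

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class ThirdPartyApi
{
    // Placeholder for the real vendor call that returns product XML.
    public static string GetProducts() =>
        "<products><product><id>1</id><name>Sample</name></product></products>";
}

class FeedBuilder
{
    static void Main()
    {
        var transform = new XslCompiledTransform();
        transform.Load("products-to-csv.xslt"); // the existing stylesheet

        using var reader = XmlReader.Create(new StringReader(ThirdPartyApi.GetProducts()));
        using var writer = new StreamWriter("products.csv");
        transform.Transform(reader, null, writer);
    }
}
```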

Right now I have details of 8,000 products to feed to the engine. The process almost always takes about 15 minutes to complete, and sometimes fails. I can't find a better solution for this. I am thinking about handling the XML data in C# itself and getting rid of XSLT, since I suspect XSLT is far slower than C#.

Is that a good idea? Or what else can I do to solve this issue?

2 Answers


XSLT is made for this type of processing. Even if it were slower, it would be ludicrous to try to rewrite it yourself. It would be like saying you don't like how slow regular expressions are, so you want to write a program to do the pattern matching yourself.
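
If the transform step itself is the bottleneck, check that the stylesheet isn't being reloaded on every call. .NET's XslCompiledTransform compiles the stylesheet to MSIL on Load(), and the instance is thread-safe for Transform() afterwards, so it can be loaded once and reused. A minimal sketch (the file name is a placeholder):

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class CsvTransform
{
    // Compiled once at startup; XslCompiledTransform is thread-safe for
    // Transform() after Load() completes, so one shared instance suffices.
    private static readonly XslCompiledTransform Compiled = Load();

    private static XslCompiledTransform Load()
    {
        var t = new XslCompiledTransform();
        t.Load("products-to-csv.xslt"); // placeholder stylesheet name
        return t;
    }

    public static void ToCsv(XmlReader xml, TextWriter csv) =>
        Compiled.Transform(xml, null, csv);
}
```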

The problem is not technical but conceptual. Why do you have to feed all 8,000 products from a third-party database every time? Most of that data has likely not changed since the last run. Surely there must be some way of getting only the differences between the last fetch and now.
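
Even if the third-party API offers no delta endpoint, you can approximate one on your side: hash each product's XML on every run and only re-send the products whose hash changed. This is an assumption about your setup rather than anything the API provides; a rough sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

static class ChangeDetector
{
    // Returns only the products whose XML differs from the previous run.
    // previousHashes must be persisted between runs (a file, a DB table, ...).
    public static List<KeyValuePair<string, string>> Changed(
        IDictionary<string, string> productXmlById,
        IDictionary<string, string> previousHashes)
    {
        var changed = new List<KeyValuePair<string, string>>();
        using var sha = SHA256.Create();

        foreach (var pair in productXmlById)
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(pair.Value));
            string hash = BitConverter.ToString(digest);

            if (!previousHashes.TryGetValue(pair.Key, out var old) || old != hash)
            {
                previousHashes[pair.Key] = hash; // remember for next run
                changed.Add(pair);               // new or changed: re-index it
            }
        }
        return changed;
    }
}
```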

Have you considered using an ETL program to port the data from your client's database to your own on a regular basis? These programs are made for porting data, and they rarely fail; when they do, most allow retries or alternative operations on failure. They can be scheduled to run in the early morning, when nobody is paying much attention.

I assume your indexing engine is designed to import all of this data from the CSV file, in which case your job would be finished if you used the ETL to update the database instead. The web application doesn't have to know the ETL exists and can continue to operate on the data it loads from the local database.

At the very least, have you considered using the third-party API's methods to retrieve the data from the DB in multiple trips (i.e. one subset at a time)? You could send it to the indexing engine in smaller chunks (if possible) or at least accumulate the returned XML in memory until all of it has been retrieved.

There'd be more round trips, but each would take less time (you say the process "sometimes fails"; if those failures are timeouts, shorter requests make them far less likely). Furthermore, if one of these smaller operations did fail, you could simply retry that single operation rather than restarting the entire process.
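
Assuming the API exposes some form of paging (the page parameter below is hypothetical; the vendor's actual mechanism may differ), the retrieval loop could look roughly like this, with a per-chunk retry:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class ChunkedFetch
{
    // fetchPage is a stand-in for the vendor call, e.g.
    // page => api.GetProducts(page, pageSize). Returns null/empty when done.
    public static List<string> FetchAll(Func<int, string> fetchPage, int maxRetries = 3)
    {
        var chunks = new List<string>();
        for (int page = 0; ; page++)
        {
            string xml = FetchWithRetry(fetchPage, page, maxRetries);
            if (string.IsNullOrEmpty(xml)) break; // no more data
            chunks.Add(xml);
        }
        return chunks;
    }

    private static string FetchWithRetry(Func<int, string> fetchPage, int page, int maxRetries)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return fetchPage(page);
            }
            catch (Exception) when (attempt < maxRetries)
            {
                // Only this page is retried; pages fetched earlier are already safe.
                Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }
    }
}
```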
