3

We have a Java application that uses a live 3rd party data feed. There are several steps in our application, and at each step the application reaches out to the 3rd party feed with the current state of the user flow and receives back the data for that step.

Recently, we have noticed problems with the data in the feed. When that happens our application barfs and users cannot proceed. Even though the application handles the error, our priority is to let users complete the flow.

In order to do that, I have been thinking of taking snapshots of the feed and versioning them, so that if the external feed has a problem we can switch to our internal snapshot until the external feed is fixed.

Does this make sense? I am wondering if this is a good strategy or whether there is something else we can do. Also, are there any tools that let you keep snapshots of data?

Josip Ivic
  • 1,617
  • 5
  • 28
  • 46
Blueboye
  • 131
  • 1
  • Caching the data *might* not be a bad idea, but what happens if a user completes their flow using cached and possibly *stale* data? How will they be affected? How often does the data in this feed change and how often would you have to update your cached version to keep things from getting too stale? Would you be required to let the user know that the data they are using might be stale? How long does it take for their feed to come back? Could you just give the user a "please wait for *n* minutes while we work on this" screen? – FrustratedWithFormsDesigner Mar 08 '17 at 15:29
  • Do you need data from that feed to complete the process ? – Walfrat Mar 08 '17 at 15:32
  • @Walfrat: That's a good point: maybe the user could save *most* of their progress and then come back and fill out the rest later when the feed is available again? – FrustratedWithFormsDesigner Mar 08 '17 at 15:36
  • @FrustratedWithFormsDesigner: We are going to use the cached feed as a fallback only, and that too for a day or two. The data feed doesn't change that often, so a data snapshot will be good to use while the 3rd party fixes it. The 3rd party usually takes about a day or two to fix the feed, so we can't have the 'n minutes' waiting option. – Blueboye Mar 08 '17 at 17:16
  • @Walfrat: Yes we do. The options on the current step differ based on the user's inputs in the previous step. – Blueboye Mar 08 '17 at 17:17
  • @Blueboye: well, if it won't negatively affect the users (and it sounds like it won't) and it's not too difficult/costly to implement, then I say: Go for it! :) ...but you should probably consult your manager before taking *my* advice. :P – FrustratedWithFormsDesigner Mar 08 '17 at 18:27
  • 1
    @FrustratedWithFormsDesigner: I am THE manager. lol – Blueboye Mar 08 '17 at 19:04
  • @Blueboye: LOL! Carry on, then! – FrustratedWithFormsDesigner Mar 08 '17 at 19:06

2 Answers

2

Caching data makes sense under certain conditions:

  • Working on an older set of data must still be acceptable. Imagine banks performing buying/selling operations on stale market data!
  • Make the user aware of the fact that they are working with older data. I would add an indicator to the application that supplies a hint: "You are currently working on an older set of data."

You'll have to decide whether you want to cache the data on the application or network level.

Application Level

Caching on the application level means less work for you, but it only works reliably if the feed refreshes often and there is a good chance you can grab a valid feed within a couple of minutes.

The usual pattern of accessing the data is probably similar to:

Download -> Parse -> Validate -> Use in business logic

These steps should be encapsulated in separate classes, invisible to the business logic, which simply asks some class to "provide data, please". You can use this to your advantage by adding a caching step:

Download -> Parse -> Validate -> Store -> Use in business logic

By "store" I mean saving whatever data you have after validation (either the raw string or some deserialized classes) to some kind of abstract data storage (different implementations are possible: database, file, memory). This is basically an application of the decorator pattern.
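A minimal sketch of such a decorator, assuming a hypothetical `FeedProvider` interface that the business logic already talks to (the names are illustrative, and the in-memory map stands in for whatever storage backend you pick):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface the business logic uses to ask for data.
interface FeedProvider {
    String fetchData(String flowState); // throws RuntimeException if the feed is broken
}

// Decorator: delegates to the real provider and keeps the last valid
// response per flow state, falling back to it when the feed fails.
class CachingFeedProvider implements FeedProvider {
    private final FeedProvider delegate;
    private final Map<String, String> lastGood = new ConcurrentHashMap<>();

    CachingFeedProvider(FeedProvider delegate) {
        this.delegate = delegate;
    }

    @Override
    public String fetchData(String flowState) {
        try {
            String fresh = delegate.fetchData(flowState);
            lastGood.put(flowState, fresh); // store only data that parsed and validated
            return fresh;
        } catch (RuntimeException e) {
            String cached = lastGood.get(flowState);
            if (cached != null) {
                return cached; // snapshot fallback; flag this as stale in the UI
            }
            throw e; // nothing cached for this flow state yet
        }
    }
}
```

Because the business logic only sees the `FeedProvider` interface, you can wrap the existing implementation without touching any calling code.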

Network Level

You can also create a simple web server that acts as a proxy. On each request, the server tries to get a fresh version from the remote source and parses and validates its contents. If the feed is valid, the server replaces the contents of its cache and returns the cache to your application.

To reduce the amount of change to your application, I would make the proxy server behave the same way as the remote server. You might, however, want to add an attribute to the returned feed indicating that it is cached (so you can display that in your application). It should not take a seasoned developer much time to do this.
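Such a proxy could be sketched with the JDK's built-in `com.sun.net.httpserver` and `java.net.http.HttpClient`; the feed URL, port, header name, and validation check below are placeholder assumptions you would replace with your own:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicReference;

public class FeedProxy {
    // Placeholder remote feed URL; substitute the real 3rd-party endpoint.
    private static final String REMOTE_FEED = "https://example.com/feed";
    private static final AtomicReference<String> cache = new AtomicReference<>();
    private static final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5)).build();

    // Decide what to serve: fresh data if valid, otherwise the cached snapshot.
    static String chooseBody(String fresh, AtomicReference<String> cache) {
        if (fresh != null) {
            cache.set(fresh);
            return fresh;
        }
        return cache.get();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/feed", exchange -> {
            String fresh = tryFetchValidFeed(); // null if download/validation failed
            String body = chooseBody(fresh, cache);
            if (body == null) { // feed is down and nothing is cached yet
                exchange.sendResponseHeaders(503, -1);
                exchange.close();
                return;
            }
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            // Tells the application whether it got live or snapshot data.
            exchange.getResponseHeaders().add("X-Feed-Cached",
                    String.valueOf(fresh == null));
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(bytes);
            }
        });
        server.start();
    }

    private static String tryFetchValidFeed() {
        try {
            HttpResponse<String> resp = client.send(
                    HttpRequest.newBuilder(URI.create(REMOTE_FEED)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            String body = resp.body();
            // Placeholder validation; plug in your real parsing/validation here.
            if (resp.statusCode() == 200 && !body.isBlank()) {
                return body;
            }
        } catch (Exception ignored) {
            // fall through to return null, triggering the cache fallback
        }
        return null;
    }
}
```

Pointing your application at `http://localhost:8080/feed` instead of the remote URL is then the only change on the application side.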

b0wter
  • 121
  • 4
1

I have been in this position, where we had incorrect data in an external feed and continuously feared that the feed would break our system any day.

My advice:

  • Try to fix the data in the feed if at all possible. We noted, for example, that although the feed was supposed to be XML, it was not valid XML. We ended up implementing a script that fixed it into valid XML.
  • Validate the data. Hard! In fact, perform every single validation check you can come up with. You can disable any check, temporarily or permanently, if that particular check turns out to be too strict.
  • Implement sanity checks and lots of "do you want to go ahead?" questions. For example, if you have lots of objects in the database, an incorrect feed could end up deleting them all. So, you will most likely want a question "deleting 100 000 objects, do you want to go ahead?". One way to do this is to have a "dry-run" option for your feed handling scripts that prints statistics of how many objects were changed, but does not actually do anything.
  • If some database objects are particularly important, "protect" them so that information about their changes will be told to the person who runs the feed scripts, and the person can then investigate the changes manually and see if they make sense.
  • Download the feed as often as you can! If the feed is updated every week, you do not want to miss any week. Download it every week! In fact, set up scripts to do this because otherwise you'll forget it someday.
  • Use the newest version of the feed that seems to satisfy your standards for quality.
  • If the feed ever gets broken, inform the people who maintain the feed as soon as possible, and request that they fix it quickly. If you just treat the feed as a black box and never give feedback, you will get a feed that is frequently broken. The people who maintain the feed must be told about all of its problems.
  • Maintain historical compressed copies of the feed. If disk space ever becomes an issue, investigate whether delta compression could save your day.
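The dry-run idea from the list above can be sketched like this (the class name and the confirmation threshold are hypothetical, not from any real feed tooling):

```java
import java.util.List;

// Hypothetical feed importer with a dry-run mode: it reports what
// *would* change before touching the database.
class FeedImporter {
    private final boolean dryRun;

    FeedImporter(boolean dryRun) {
        this.dryRun = dryRun;
    }

    // Returns the number of objects that would be deleted.
    // Only deletes when not running in dry-run mode.
    int applyDeletions(List<String> toDelete) {
        if (dryRun) {
            System.out.println("DRY RUN: would delete " + toDelete.size() + " objects");
            return toDelete.size();
        }
        if (toDelete.size() > 10_000) { // sanity check: arbitrary example threshold
            throw new IllegalStateException("Refusing to delete "
                    + toDelete.size() + " objects without confirmation");
        }
        // ... perform the actual deletions here ...
        return toDelete.size();
    }
}
```

Running every feed update with `dryRun = true` first and eyeballing the printed statistics catches the "incorrect feed deletes everything" scenario before it happens.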

With this advice, I'm sure you can keep up-to-date data in your database and also mitigate issues that could otherwise cause major havoc in your database.

juhist
  • 2,579
  • 10
  • 14