
I have a website that crawls data from many third-party services when a user browses to a page. This can be really slow, because I hit the third-party servers and process the returned data before showing anything to the user. I am hosting the website on Azure (shared mode), and I want to improve the implementation. Here is what I am thinking:

Run a service that crawls data from the third-party services, processes it, and then stores it in a database. When a user browses to my site, the site pulls the data from the database and displays it.

But the details of that solution are not clear to me. Should it be a normal service or a WCF service? If it is a WCF service, should the website talk to the database directly, or to the WCF service (which can read the data from the database)? If it is a normal service, how can I deploy it on Azure?

gnat
Andy Frank

2 Answers


Your idea is a step in the right direction. Unless the data absolutely has to be presented in real time, you should always do the crawling via a background service and display the most recent data from the database (or even better, a cache). How you implement that service is entirely up to you, though.
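Here is a minimal sketch of that split. It is written in Python for illustration only, since the question is about .NET on Azure, so read it as language-neutral pseudocode that happens to run. It assumes a SQLite table and some hypothetical names (refresh_all, ReadCache, and the fetcher callables). The background process is the only writer. The web side only reads, with a short in-memory cache in front of the database.

```python
import sqlite3
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical: each fetcher calls one third-party service and returns raw text.
Fetcher = Callable[[], str]


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawled ("
        "source TEXT PRIMARY KEY, payload TEXT, fetched_at REAL)"
    )


def refresh_all(conn: sqlite3.Connection,
                fetchers: Dict[str, Fetcher],
                process: Callable[[str], str]) -> None:
    """Background side: crawl each source, process it, upsert the result."""
    for source, fetch in fetchers.items():
        try:
            payload = process(fetch())
        except Exception:
            continue  # a failing source keeps its last good row
        conn.execute(
            "INSERT OR REPLACE INTO crawled (source, payload, fetched_at) "
            "VALUES (?, ?, ?)",
            (source, payload, time.time()),
        )
    conn.commit()


class ReadCache:
    """Web side: read only from the database, with a short TTL cache in front."""

    def __init__(self, conn: sqlite3.Connection, ttl_seconds: float = 30.0):
        self.conn = conn
        self.ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, source: str) -> Optional[str]:
        now = time.monotonic()
        hit = self._cache.get(source)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh enough: no database round trip
        row = self.conn.execute(
            "SELECT payload FROM crawled WHERE source = ?", (source,)
        ).fetchone()
        value = row[0] if row else None
        self._cache[source] = (now, value)
        return value
```

With this design, a page request never waits on a third-party server. In the worst case, the data shown is about one crawl interval plus the cache TTL old.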

Some recommendations:

  • Use a background process or a Windows service. WCF is meant for integration, not background processing.
  • Either plan on having a dedicated worker role for this process, or engineer a way to ensure that only one web role instance runs the service at any given time. You don't want multiple instances updating the same data simultaneously, not to mention the wasted resources of crawling the data from multiple instances at the same time.
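One way to ensure only one instance runs at a time is a time-limited lease. Every instance tries to take the lease before crawling, and only the holder runs the job. If that instance dies, the lease simply expires. The sketch below uses a hypothetical in-memory lease store to show the logic. In a real deployment the store would have to be shared by all instances; a lease on an Azure storage blob is one possible choice, but that is an assumption here.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryLeaseStore:
    """Hypothetical stand-in for a lease store shared by all instances."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # name -> (owner, expires_at)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def try_acquire(self, name: str, owner: str,
                    duration: float, now: float) -> bool:
        with self._lock:
            current = self._leases.get(name)
            # Free, expired, or already ours (renewal): take or extend it.
            if current is None or current[1] <= now or current[0] == owner:
                self._leases[name] = (owner, now + duration)
                return True
            return False


def run_if_leader(store: InMemoryLeaseStore, instance_id: str,
                  job: Callable[[], None], lease_seconds: float = 60.0,
                  now: Optional[float] = None) -> bool:
    """Run the crawl only on the instance that holds the 'crawler' lease."""
    t = time.time() if now is None else now
    if store.try_acquire("crawler", instance_id, lease_seconds, t):
        job()
        return True
    return False
```

One caveat: the job has to finish, or renew the lease, well within the lease duration. Otherwise a second instance could start crawling while the first is still working.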

If you decide to go with a dedicated worker role (most likely a single extra-small instance is enough, which costs only $0.02/hour), look into overriding the Run method of your RoleEntryPoint class. There's no need to actually create or install a true Windows service here.
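The Run override is essentially a long-running loop. Below, the same shape is sketched in Python; the real override would be C#, and crawl_once and the interval are hypothetical placeholders. Each pass crawls, logs any failure instead of letting it end the loop, and then waits until the next pass or until it is asked to stop.

```python
import logging
import threading
from typing import Callable


def run_crawler_loop(crawl_once: Callable[[], None],
                     stop: threading.Event,
                     interval_seconds: float) -> int:
    """Crawl repeatedly until `stop` is set; return the number of passes."""
    passes = 0
    while not stop.is_set():
        try:
            crawl_once()
        except Exception:
            # Log and keep going: one bad pass should not kill the worker.
            logging.exception("crawl pass failed")
        passes += 1
        # Sleeps for the interval, but wakes immediately if stop is set.
        stop.wait(interval_seconds)
    return passes
```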

Chris

You could first send the page to the client without the third-party data, and then have the client request the data for each third-party service from your server using an XMLHttpRequest.

That way you could display a "loading" placeholder animation for each external service that hasn't loaded yet. If some of the external services are slow, the user can already see the data from those that have responded.
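This answer does the work in the browser with XMLHttpRequest. The sketch below shows the same "render each result as soon as it arrives" idea on the server side, using Python's asyncio. That is a swapped-in technique for illustration, and the service functions and on_ready callback are hypothetical. All requests start at once, and each result is handed off the moment it completes, so a slow service only delays its own placeholder.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

ServiceCall = Callable[[], Awaitable[str]]


async def load_progressively(
    services: Dict[str, ServiceCall],
    on_ready: Callable[[str, Optional[str]], None],
) -> List[str]:
    """Start every service call at once; report each as soon as it finishes."""

    async def tagged(name: str,
                     fetch: ServiceCall) -> Tuple[str, Optional[str]]:
        try:
            return name, await fetch()
        except Exception:
            return name, None  # failed service: its placeholder can show an error

    tasks = [asyncio.create_task(tagged(n, f)) for n, f in services.items()]
    order: List[str] = []
    for next_done in asyncio.as_completed(tasks):
        name, data = await next_done
        on_ready(name, data)  # e.g. swap this service's "loading" placeholder
        order.append(name)
    return order
```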

Philipp