Suppose users of application A want to see the data updated by application B as frequently as possible. Unfortunately, neither app A nor app B can use message queues, and they cannot share a database. So app B writes a file, and a batch job periodically checks whether the file is there and, if so, loads it into app A.
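As a rough illustration of the setup described above, here is a minimal Python sketch of what one scheduled run of the batch job might do. The drop directory, the `*.csv` file pattern, the `load_into_app_a` callback and the move-to-archive step are all hypothetical; a real job would call whatever import mechanism app A actually provides.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_batch_once(drop_dir: Path, archive_dir: Path,
                   load_into_app_a: Callable[[Path], None]) -> int:
    """One scheduled run: load every file app B has dropped, then archive it.

    Returns the number of files processed (0 means "nothing new this time").
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    processed = 0
    # sorted() gives a stable order, e.g. timestamped names load oldest first
    for path in sorted(drop_dir.glob("*.csv")):
        load_into_app_a(path)  # hypothetical import into app A
        # Move the file away so the next run does not load it a second time
        shutil.move(str(path), str(archive_dir / path.name))
        processed += 1
    return processed
```

Because the pattern only matches names ending in `.csv`, one common convention (assumed here, not stated in the question) is for app B to write to a temporary name such as `data.csv.tmp` and rename it when complete, so the job never picks up a half-written file.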

Is there a name for this concept? A very explicit and geeky description: "running very frequent batch jobs in a tight loop to emulate near real time".

This concept is similar to "polling". However, polling has the connotation of being very frequent (multiple times per second), whereas the most often you would run a batch job would be every few minutes.

A related question: what is the tightest loop that is reasonable? Is it 1 minute, 5 minutes, or ...? Recall that the batch jobs are started by a batch job scheduler (e.g. Autosys, Control-M, CA ESP, Spring Batch, etc.), so running a job too frequently would cause overhead and clutter.

Mark Booth
  • As to the tightest loop: it depends on how long the job is running. You want to avoid overlapping jobs, that is, polling again while the previous job is still processing things. If you use a decent locking mechanism, this is a no-brainer, and you can poll as fine as you like (but leave a safe margin for the polling overhead itself). – tdammers Jun 15 '12 at 12:06
  • We called it synchronization, as we were trying to keep an IDMS database and a DB2 database synchronized. – Gilbert Le Blanc Jun 15 '12 at 12:27
  • It is still polling... –  Jun 15 '12 at 12:58
  • @tdammers - I don't understand your comment. You may want to poll *much* faster than the job completion throughput, if you don't know how long a job will take and can't afford your client to be idle for any longer than is necessary. Also, if you had a *decent* locking mechanism you *wouldn't need to poll at all*, as the release of the lock would provide the necessary synchronisation. – Mark Booth Jun 15 '12 at 18:14
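tdammers' suggestion of a locking mechanism to stop one run overlapping the previous one could look something like this minimal sketch. It assumes a Unix-like system (Python's `fcntl` module is not available on Windows), and the lock-file path and `job` callback are hypothetical. If the previous run still holds the lock, the new run simply skips this cycle instead of piling up behind it.

```python
import fcntl
import os
from typing import Callable, Optional


def run_exclusively(lock_path: str, job: Callable[[], int]) -> Optional[int]:
    """Run job() only if no other run holds the lock; otherwise skip.

    Returns job()'s result, or None if this cycle was skipped.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking exclusive lock: fails at once if a previous run is busy
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # previous run still processing; skip this poll
        try:
            return job()
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        # flock() locks are tied to the open descriptor, so the kernel also
        # releases them if the process dies before reaching this point
        os.close(fd)
```

This only covers the overlap problem; as Mark Booth notes, a lock that the consumer could block on would remove the need to poll at all.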

2 Answers

You were correct the first time: polling is the correct term to use in this situation. Whether you are polling at 1 mHz or at 1 MHz, it is still polling.

Note: millihertz is not a unit I've ever seen used. At 1 mHz you would poll once every 1,000 seconds (about 17 minutes); a poll rate of once every million seconds (11.6 days) would be 1 µHz, which has limited use. *8')

From the wikipedia polling page:

Polling, or polled operation, in computer science, refers to actively sampling the status of an external device by a client program as a synchronous activity.

In this case, the batch job is the client, while the file is the mechanism allowing the client to synchronise with the external device (application B).

Determining a suitable poll rate can be a tricky business.

  • If the client polls too frequently then it could end up starving the device (possibly another process on the same multitasking system) of the resources it needs to source the data needed by the client quickly enough, slowing the whole system down.

  • Poll too infrequently and your client could be sat idly waiting for the next poll while there is data sat waiting to be processed.

Both cases can result in the system running sub-optimally.
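To put a number on the "poll too infrequently" case: with polls every `interval` seconds, data that lands just after a poll waits almost a full interval, and data arriving at a uniformly random moment waits `interval / 2` on average. This small sketch computes the wait for given arrival times; the poll schedule starting at t = 0 and the example figures are assumptions for illustration only.

```python
import math
from typing import List


def waits_until_next_poll(arrivals: List[float], interval: float) -> List[float]:
    """Seconds each item sits waiting, with polls at t = 0, interval, 2*interval, ..."""
    waits = []
    for t in arrivals:
        next_poll = math.ceil(t / interval) * interval  # first poll at or after t
        waits.append(next_poll - t)
    return waits


# Hypothetical example: data arriving every 7 s, polled every 60 s vs every 5 s
arrivals = [7.0 * i for i in range(1, 9)]
for interval in (60.0, 5.0):
    w = waits_until_next_poll(arrivals, interval)
    print(interval, max(w), sum(w) / len(w))
```

Shrinking the interval cuts the idle wait, but every extra poll is more work for the other side, which is the starvation risk in the first bullet.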

As an example of the former, I have seen a system that spent so long servicing "is there new data?" requests that it had no time left to actually prepare the data being asked for (a form of livelock).

For the latter, I have a device with a 60-second poll period. Since I might need 3 round-trip communications to complete a single transaction with it, each transaction may take anywhere between 3 and 6 minutes (depending on whether each request happens just before a poll or just after one).
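The 3 to 6 minute range follows if each round trip costs between one and two poll periods: one when the request lands just before a poll, two when it lands just after one. That per-round-trip model is an assumption read from the figures above rather than something stated outright; the sketch below only does the arithmetic.

```python
from typing import Tuple


def transaction_time_bounds(round_trips: int,
                            poll_period_s: float) -> Tuple[float, float]:
    """(best, worst) seconds for a transaction needing `round_trips` polled exchanges.

    Assumes each round trip takes between one and two poll periods.
    """
    return round_trips * poll_period_s, round_trips * 2 * poll_period_s


best, worst = transaction_time_bounds(3, 60.0)
print(best / 60, worst / 60)  # minutes: 3.0 6.0
```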

Mark Booth
  • Once every 11.6 days could be the rate you poll your bank account. –  Jun 15 '12 at 12:59
  • Great answer. Many real time systems are actually polling systems. The "real time" refers to the perceived delay between refreshes. –  Jun 15 '12 at 13:23

When I was working at a bank, the guys in architecture claimed "we have the only real-time banking system in Australia, all the others use fast-batch" *.

Which was exactly as you describe: they would run batch processing of all pending transactions several thousand times a second.

So fast-batch is a term that is in use to do what you describe.
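A minimal sketch of the fast-batch idea as described: rather than handling each transaction as it arrives, a loop wakes on a short fixed tick and processes everything that has accumulated since the last tick as one batch. The tick length, the in-memory `queue.Queue` standing in for the pending-transaction store, and the `process_batch` callback are assumptions for illustration, not how that bank's system worked.

```python
import queue
import time
from typing import Callable, List


def drain(pending: "queue.Queue[str]") -> List[str]:
    """Take everything currently pending without blocking."""
    batch = []
    while True:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            return batch


def fast_batch_loop(pending: "queue.Queue[str]",
                    process_batch: Callable[[List[str]], None],
                    tick_s: float, ticks: int) -> None:
    """Every tick, process all transactions that arrived since the last tick."""
    for _ in range(ticks):
        batch = drain(pending)
        if batch:
            process_batch(batch)
        time.sleep(tick_s)
```

A tick of around 0.0005 s would approach "several thousand times a second", though how precisely `time.sleep` honours such short intervals depends on the operating system.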

*The truth of their claims is not relevant to this question