6

We are writing a B2B application that uses OAuth 2.0. Before a token expires we want to refresh it, since it is faster to refresh a token than to request a new one. However, we want to build some resilience to clock skew into the system, so we want to consider a token expired a few seconds or minutes before it actually would be.

How many seconds/minutes is it reasonable to expect for two servers, both connected to the internet, in different locations, to differ by?
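For illustration, a minimal sketch of the idea, assuming the token carries an `expires_at` timestamp (field name hypothetical) and an arbitrary margin:

```python
from datetime import datetime, timedelta, timezone

# Arbitrary safety margin; how large it needs to be is exactly the question.
CLOCK_SKEW_MARGIN = timedelta(seconds=60)

def is_effectively_expired(expires_at: datetime) -> bool:
    """Treat the token as expired slightly early to absorb clock skew."""
    return datetime.now(timezone.utc) >= expires_at - CLOCK_SKEW_MARGIN
```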

Sled

3 Answers

6

You need to do this based on what you actually see in your system. The two basic approaches are to do it once or to make your code adjust on the fly. The first is obviously a lot easier to do but somewhat less resilient.

Doing it once means logging the time difference for each pair of servers for a while and seeing what you get. This might take a while, since you want to see whether they're re-synchronising to NIST or similar at intervals. But even very early on you should get an idea of whether the time differences are consistent or not. Then hard-code in some reasonable value. The biggest jump is likely to be the correction one, but you may never see it actually happen, so you'll be guessing ("our site is quiet overnight, but the skip seems to happen between 2am and 6am AEST" is a likely conclusion).
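A rough sketch of that one-off measurement, assuming the other party is an HTTP server and that the one-second resolution of the HTTP `Date` response header is good enough (the URL is a placeholder):

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import urlopen

def log_offset(url: str) -> None:
    # Remote server's idea of "now", taken from the HTTP Date response header.
    with urlopen(url) as resp:
        remote = parsedate_to_datetime(resp.headers["Date"])
    local = datetime.now(timezone.utc)
    print(f"{local.isoformat()} offset={(remote - local).total_seconds():+.1f}s")

# Sample each peer periodically for a while, then eyeball the log.
while True:
    log_offset("https://other-server.example.com/")
    time.sleep(60)
```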

The more resilient, but more code-heavy, version is to do the above but base your margin on the most recently observed offsets. If your timeout is a couple of hours and you see the other system has been drifting a second a month, just build that into your pre-expiry margin. It only gets complex when you're trying to guess the point at which the other system will jump a couple of seconds back due to a monthly clock correction.
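One way that adaptive version might look, as a sketch (class and parameter names are made up): keep the most recent offset observations and derive the pre-expiry margin from the worst of them plus a fixed buffer.

```python
from collections import deque

class SkewMargin:
    """Derive a pre-expiry margin from recently observed clock offsets."""

    def __init__(self, base_margin_s: float = 30.0, window: int = 50):
        self.base_margin_s = base_margin_s
        self.offsets: deque[float] = deque(maxlen=window)  # seconds, signed

    def observe(self, offset_s: float) -> None:
        self.offsets.append(offset_s)

    def seconds(self) -> float:
        # Worst offset seen in the window, plus a fixed safety buffer.
        worst = max((abs(o) for o in self.offsets), default=0.0)
        return self.base_margin_s + worst
```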

I think the latter is overkill unless you're looking at jumps within a factor of 10 of the total expiry time. At that point you're going to want to throw away 20% of the total time, which is a 25% increase in OAuth traffic. So it might be worthwhile to write more code in an effort to reduce it. But even then, that might not be the biggest bandwidth saving you can get for your programming time.
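To spell out the arithmetic behind that 20% / 25% claim:

```python
# Discarding 20% of each token's lifetime means each token covers only 80%
# of its nominal period, so you refresh 1 / 0.8 = 1.25 times as often.
usable_fraction = 1.0 - 0.20
traffic_increase = 1.0 / usable_fraction - 1.0   # 0.25 -> a 25% increase
```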

Móż
  • It's B2B, so the real issue is batch jobs where we need to upload 1,000 records over an hour while the tokens expire after 20 minutes. The phrase `25% increase` put it into perspective; 20% seems fine. – Sled Jun 17 '14 at 01:27
  • This is why I'm happy I designed my own simple authentication protocol, in which the token expires because the user asks or because the last request was too long ago. In that case, if 1,000 records are uploaded in multiple requests, each request extends the token's validity, so you never have to worry about expiration. If it's done in a single request (?!), the initial validation is valid anyway. – InformedA Jun 17 '14 at 05:32
0

If both servers are synchronized to NIST time or something similarly accurate, the difference should be negligible (subject to transmission delays between the two servers; seconds or less).

If the servers are keeping their own time, and are not properly maintained, the difference can be substantial (minutes, or more). Kinda depends on whether the servers have decent operating system software that automatically updates itself to Internet time periodically.

The clocks on every device I own that is connected to the Internet are never more than a tiny fraction of a second off. Even if they only sync once a month, they're not going to drift all that much, and if they take their cue from the mains power (which is kept at a very accurate frequency), they may not drift at all. Good watches based on a quartz crystal drift a few seconds each month. The computers that are five minutes off are the ones that are never updated.

Robert Harvey
  • But even if they are synchronized to NIST, doesn't it depend on how often they sync? If they do it once a month or so, then you could be off by a minute or two, no? – Sled Jun 16 '14 at 20:53
  • Yeah, I've seen a wacky server where the deviation was nearly an hour per day. This was long ago before internet time sync. – Loren Pechtel Jun 17 '14 at 05:05
0

This is how I would probably solve it, if possible:

  • Get the time from the target server (the one that will validate the token), and compute the difference from the time on the server that issued the token. Set a timer to refresh the token before it expires according to the time on the target server. (A rough sketch follows this list.)

    • Do add a little bit of margin for round trip calls to get a new token as well as for the roundtrip call to check the time.

    • Add some margins for retries in case the server is down. For example, if the token is to be refreshed before 8:22:20, you may want to start at 8:21:20 on the assumption that you can try three times (in case of network/connectivity failures) at 15-second intervals and still have 15 seconds of margin for the roundtrip calls. You can increase the margins if your token life is in hours instead of minutes.

    • When the code gets a token and figures out that it will need to be refreshed too quickly (say, below a certain threshold such as < 50% of the token lifetime, or 5 minutes), it should raise an alert, because the servers are getting really out of sync.

    • In the computation, the code should normalize all times to the same time zone, such as UTC, or the code will go all whacky.

    • On some platforms, getting the time of a target server is quite trivial. On others, you may have to expose web methods to do this, but hopefully, that should be fine.

  • Even though you have the logic to refresh the token, I think your code should request a brand-new access token if the refresh window has passed, but raise an alert so the code can be fixed. If the operation is not time-critical, you might want it to fail gracefully when it hasn't been able to refresh the token, but it should still raise an alert so the code gets fixed.
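Putting the list above together, a rough sketch of the timer computation (margins, retry counts, and thresholds are the illustrative values from the bullets, not anything prescribed):

```python
from datetime import datetime, timedelta, timezone

# Illustrative values, echoing the 8:21:20 example above.
ROUNDTRIP_MARGIN = timedelta(seconds=15)
RETRY_COUNT = 3
RETRY_INTERVAL = timedelta(seconds=15)

def plan_refresh(expires_at_on_target: datetime,
                 target_clock_offset: timedelta,
                 token_lifetime: timedelta) -> datetime:
    """Return the local (UTC) time at which to start refreshing the token.

    target_clock_offset is (target server clock) - (our clock), measured by
    asking the target server for its time.  All datetimes are aware UTC.
    """
    # Translate the expiry onto our own clock: if the target runs ahead of
    # us, the token dies earlier from our point of view.
    expires_locally = expires_at_on_target - target_clock_offset

    # Leave room for a few retries plus the round trips themselves.
    refresh_at = expires_locally - (RETRY_COUNT * RETRY_INTERVAL + ROUNDTRIP_MARGIN)

    # Alert if the clocks are so far apart that we barely get to use the
    # token (< 50% of its lifetime, or < 5 minutes, as suggested above).
    usable = refresh_at - datetime.now(timezone.utc)
    if usable < 0.5 * token_lifetime or usable < timedelta(minutes=5):
        print("ALERT: refresh window too small; servers look badly out of sync")

    return refresh_at
```

A separate code path would then fall back to requesting a brand-new access token (and raising an alert) if the scheduled refresh is missed, as the last bullet suggests.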

Omer Iqbal