Fastest way to generate (machine wide) unique 64 bit timestamps on Windows

Question

Here are some of the characteristics of my problem:

I must generate unique 64 bit timestamps across processes on a single machine. I am very much limited to those 64 bits. This seems to rule out using GUIDs as part of the solutioning.
Uniqueness is critical, monotonicity (across processes) is not as important.
It is crucial that this code is as fast as possible (most likely ruling out e.g. a central timestamp service through some IPC mechanism). This has been proven to be in the hot path.
multiple instances of my process will run on same system. One thread on each will be tasked to generate these ids.

The way I have currently implemented it:

(if a process requests a timestamp) we

Grab an initial timestamp value (from the fastest available underlying clock source, e.g. GetSystemTimePreciseAsFileTime from kernel32)
Acquire a Global Mutex
In a while(true) { ... } loop, keep grabbing values from our underlying clock source until the value "rolls over" (e.g. changes from the value grabbed in (1)
Release the global mutex.

That should satisfy the uniqueness constraints as long as the time source is monotonic. As expected, there is still a real cost associated with global Mutex acquisition and release (kernel context switches etc.).

Is there any other approach or algorithm that I could use for the problem as stated above?

Why cant you get the first one and then add 1 millisecond to get the 2nd, then 1 to the 2nd to get the 3rd ... and so on? — tgkprog, Oct 24 '19 at 18:35
@tgkprog: Unless I misunderstand this is not sufficient to guarantee uniqueness across processes? — ChristopheD, Oct 24 '19 at 18:42
i guess i dont know much about how this works then. I thought if your doing it on one thread. It should not matter. One loop will just be run on one process. I have not seen your full code but you should be running this one a singleton object without any threads. — tgkprog, Oct 24 '19 at 18:51
Why don't you just start with a number, e.g. 0, and just count up every time one is requested, and use a few bits for a process id? If you need an actual **timestamp** you have to abandon your uniqueness requirement because things can truly happen at the same time in a multiple-processor machine. — whatsisname, Oct 24 '19 at 18:53
@tgkprog: This code (acquiring the timestamp) runs only in a single (main) thread per process. However, if 2 (or more) processes retrieve the *same* timestamp from the underlying clock source (which will absolutely can happen on Windows), how does adding milliseconds help guaranteeing uniqueness? — ChristopheD, Oct 24 '19 at 19:08
@whatsisname: Unfortunately I can not forgo the uniqueness requirement :-). And ideally I keep the timestamp portion as well. I can encode the ticks (after some epoch adjustment) in 52 bits. Hence 12 bits are available to add "randomness" per process (but unfortunately even just a simple PID requires 32 bits). I was thinking that perhaps some lock-free shared memory algorithm exists (because if I need to lock, chances are the current implementation is as good as it gets). — ChristopheD, Oct 24 '19 at 19:11
@ChristopheD: well you're between a rock and a hardplace. If you want timestamps to be unique **and** represent moments in time, you have to sacrifice the "fast as possible" because you can only issue one timestamp per tick of the system timer, which may be slower than your cpu can otherwise spit out values. Regarding including a PID, you don't need to allot 32 bits, you can have the processes negotiate a new smaller-size id that is unique amongst themselves, and only has to be resolved once on startup. — whatsisname, Oct 24 '19 at 19:24
@whatsisname: I really like the "shortened" negotiated PID suggestion. That might help, thanks for the input! — ChristopheD, Oct 24 '19 at 19:30
@ChristopheD: "a simple PID requires 32 bits" - but you surely have a maximum number of processes which will have to generate the time stamps, I guess? Lets say you can make sure there are less than 256 processes of this kind. Assign each process, when started, a unique 8 bit number from a central pool. Now use this as the PID part of the 64 bit "timestamp", not the 32 bit PID from the system. — Doc Brown, Oct 24 '19 at 22:19
You're letting you're quest for uniqueness drive you to unobtainable clock precision. Let's assume unique monotonic ordering, and time are separately solved concerns. The solution will be unique and monotonic regardless of how you answer this next question. How much time precision do you really need? Do you need to know down to the minute? the second? the millisecond? — candied_orange, Oct 25 '19 at 01:32
did not realize u will have multiple instances of ur process per system. if u can use a system like an app server. singletons where needed; can cache remote resources(if any like API results, db items that are at max 1-2 gb and change seldom / use threads so all cpus are used? — tgkprog, Oct 25 '19 at 09:57
@ChristopheD then I think you can avoid locks by blocking to avoid rolling over an incrementer that uses the non clock bits. The block ends when the clock bits change and the incrementor starts over. This is still monoamicly increasing but still puts the timestamps in an order. — candied_orange, Oct 25 '19 at 14:18

nokola · Accepted Answer · 2019-10-25T15:19:33.707

You mentioned in the comments you're OK to compress the timestamp to 52 bits. Few other people mentioned you can negotiate smaller process Ids.

I'm expanding on these ideas with one way to generate lock-free very high speed timestamps, not just machine-wide, but world-wide. That's because the processes don't share state apart from initial setup and thus can easily live in different machines.

Encode each timestamp like this:

 8 bits -> shortPid, a generated "process id" during startup
52 bits -> shortTime, the compressed timestamp
 4 bits -> localCounter, to resolve timestamp overlaps, and avoid waiting for the system clock

The algorithm:

Have a central "service" to generate shortPid-s, that each process will call on startup. Example pseudocode:

byte shortPid = 0;
byte getNewPid() {
    lock {
        shortPid++;
        return shortPid;
    }
}

On each process start, get their shortPid from the central service:

ulong processPidMasked = getNewPid() << (52 + 4); // get pid and move into first 8 bits of 64-bit ulong

To generate times efficiently, we'll use the 4-bit localCounter, to enable generating up to 16 (2^4) unique values per time tick. Assuming there's a function ulong getCompressedTimestamp() that returns a 52-bit timestamp, here's how to get a unique timestamp per process:

ulong localCounter = 0;
ulong lastCompressedTimestamp = 0;
ulong getUniqueTimestamp() {
    while (1) {
        ulong now = getCompressedTimestamp(); // assuming 52-bit timestamp
        if (now != lastCompressedTimestamp) { // if we haven't generated any timestamp this tick
            lastCompressedTimestamp = now;
            localCounter = 0; // reset to time slot 0
            return processPidMasked | (now << 4) | localCounter;
        }

        // we generated one or more unique stamps on this tick, see if there's any "free slots" left
        if (localCounter < 15) { // if we have free slots for this tick
            localCounter++;
            return processPidMasked | (now << 4) | localCounter;
        }

        // worst case scenario: we exhausted our local counter for this tick:
        // loop until we get to the next time slot
    }
}

Notes:

Critical code is lock-free and without explicit shared state. This will make generation very fast. The only lock is during startup.
You can modify how many bits you allocate for the compresed timestamp, localCounter, and shortProcessId to make it fit your use-case. With the current code, you'll be able to generate 16 timestamps per compressed-time-tick, which may fit your usage nicely.
Calling the OS time function may end up being the implicit shared state. I'm not sure how it will perform for your scenario.
Twitter uses (used?) similar algorithm for generating unique time-sortable Ids accross machines called Snowflake. They store some bits for machine id, some for time, and some for "sequence number" (our localCounter.) Similar example implementation I found by googling here: https://github.com/sony/sonyflake
With the above algorithm, you'll be able to theoretically generate maximum 1,600,000 unique timestamps per process per second. Hopefully enough for your case :)

Excellent! This was exactly the type of idea I was looking for, thanks! I was trying to find a way to implement it without locks, and this looks like it will work wonderfully. — ChristopheD, Oct 25 '19 at 03:56

score 2 · Answer 2 · answered Oct 24 '19 at 21:18

Well, first get rid of the lock.
Doing it lockless is probably much more efficient.

If that doesn't help enough, use batching.
Stockpile up to n consecutive timestamps for each local thread. Whenever the stockpile runs dry, refill from the shared global source.

The disadvantage is that the timestamps can lag too much behind the actual time for your comfort if the demand is irregular or low.

Potentially add a process-level stockpile between the shared global source and each thread.
Potentially refill the stockpile asynchronously when it falls below a threshold to avoid holdups.

Ewan · Answer 3 · 2019-10-24T19:32:34.560

-1

Yes. Avoid the while loop and simply add a millisecond if multiple values are requested in the same tick.

ie

aquire lock
read time
is <= lastTimeRead ? use lastTimeRead + 1ms
update lastTimeRead
release lock
return

As you want this across processes you need some interprocess communication.

The simplest way would be to run it as a seperate time service on the box and have it listen on a socket. if that's too slow, how about Com+?

Alternatively you could add a random number of milliseconds. Obviously this only gives you a chance of uniqueness. But perhaps you can pick up on the dupes at a later stage of processing?

edited Oct 24 '19 at 19:32

answered Oct 24 '19 at 18:59

Ewan

70,664
5
76
161

2

Unless I misunderstand, I don't see how this solves the "guarantee uniqueness across processes" constraint. – ChristopheD Oct 24 '19 at 19:12
You still use the mutex etc. just not the while loop – Ewan Oct 24 '19 at 19:16
1

I could see this working correctly across threads, but not across processes (unless lastTimeRead is in shared memory; in which case extras synchronization is required). What stops processes from getting the same timestamp (e.g. hence violating the uniqueness constraint)? If they compare to their "local" lastTimeRead they can still end up producing the same value. – ChristopheD Oct 24 '19 at 19:22
ahh i see what you mean – Ewan Oct 24 '19 at 19:24
1

Do all the processes run on the same box? – Ewan Oct 24 '19 at 19:49
3

@Ewan: perhaps you should ask those questions before trying blast out an answer – whatsisname Oct 24 '19 at 20:14
@Ewan: all the processes run on the same box. Named pipes, sockets (rudimentary IPC) is too "expensive" (see 3rd bullet point). Adding randomness is an option (for the last 12 bits of the 64 bit timestamp in a particular format). – ChristopheD Oct 25 '19 at 03:08
@whatsisname just checking, it seemed odd to me that you would want per process uniqueness but only per box. But yeah, obvs I skim read the question and missed the level of time resolution required. Assumed it was the usual one where you only get 30ms accuracy on ticks. soz! – Ewan Oct 25 '19 at 09:24

Fastest way to generate (machine wide) unique 64 bit timestamps on Windows

3 Answers3