You mentioned in the comments you're OK to compress the timestamp to 52 bits. Few other people mentioned you can negotiate smaller process Ids.
I'm expanding on these ideas with one way to generate lock-free very high speed timestamps, not just machine-wide, but world-wide. That's because the processes don't share state apart from initial setup and thus can easily live in different machines.
Encode each timestamp like this:
8 bits -> shortPid, a generated "process id" during startup
52 bits -> shortTime, the compressed timestamp
4 bits -> localCounter, to resolve timestamp overlaps, and avoid waiting for the system clock
The algorithm:
- Have a central "service" to generate
shortPid
-s, that each process will call on startup. Example pseudocode:
byte shortPid = 0;
byte getNewPid() {
lock {
shortPid++;
return shortPid;
}
}
- On each process start, get their
shortPid
from the central service:
ulong processPidMasked = getNewPid() << (52 + 4); // get pid and move into first 8 bits of 64-bit ulong
- To generate times efficiently, we'll use the 4-bit localCounter, to enable generating up to 16 (2^4) unique values per time tick. Assuming there's a function
ulong getCompressedTimestamp()
that returns a 52-bit timestamp, here's how to get a unique timestamp per process:
ulong localCounter = 0;
ulong lastCompressedTimestamp = 0;
ulong getUniqueTimestamp() {
while (1) {
ulong now = getCompressedTimestamp(); // assuming 52-bit timestamp
if (now != lastCompressedTimestamp) { // if we haven't generated any timestamp this tick
lastCompressedTimestamp = now;
localCounter = 0; // reset to time slot 0
return processPidMasked | (now << 4) | localCounter;
}
// we generated one or more unique stamps on this tick, see if there's any "free slots" left
if (localCounter < 15) { // if we have free slots for this tick
localCounter++;
return processPidMasked | (now << 4) | localCounter;
}
// worst case scenario: we exhausted our local counter for this tick:
// loop until we get to the next time slot
}
}
Notes:
- Critical code is lock-free and without explicit shared state. This will make generation very fast. The only lock is during startup.
- You can modify how many bits you allocate for the compresed timestamp, localCounter, and shortProcessId to make it fit your use-case.
With the current code, you'll be able to generate 16 timestamps per compressed-time-tick, which may fit your usage nicely.
- Calling the OS time function may end up being the implicit shared state. I'm not sure how it will perform for your scenario.
- Twitter uses (used?) similar algorithm for generating unique time-sortable Ids accross machines called Snowflake. They store some bits for machine id, some for time, and some for "sequence number" (our localCounter.) Similar example implementation I found by googling here: https://github.com/sony/sonyflake
- With the above algorithm, you'll be able to theoretically generate maximum 1,600,000 unique timestamps per process per second. Hopefully enough for your case :)