How deterministic are SessionIDs from SHA'd GUIDs?

Question

Assume I'm using the following code to generate pseudo-random session IDs:

sessionID = SHA-512(GENERATE-GUID())

The GUIDs are pretty deterministic, i.e. I see lots of GUIDs that share many of the same hex digits.

My simple question is: how deterministic are my resulting session IDs?

The SHA algorithms are supposed to produce very different hashes even when only a few input bits differ (the avalanche effect), so how easily could someone guess, within a reasonable time, another session ID from the resulting hashes?
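
To make the two halves of this question concrete, here is a minimal Python sketch. The two GUID strings are made-up examples (not output from the asker's API), and both helper names are hypothetical. It illustrates that nearly identical GUIDs hash to very different SHA-512 values, and also that anyone who can guess the GUID can recompute the session ID, because the hash involves no secret.

```python
import hashlib

def session_id_from_guid(guid: str) -> str:
    """Mirror of sessionID = SHA-512(GUID), hex-encoded."""
    return hashlib.sha512(guid.encode("ascii")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bits that differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

# Two hypothetical, nearly identical GUIDs (only the last hex digit differs).
guid_a = "6f1c2e3a-52b1-11e3-8f96-0800200c9a66"
guid_b = "6f1c2e3a-52b1-11e3-8f96-0800200c9a67"

id_a = session_id_from_guid(guid_a)
id_b = session_id_from_guid(guid_b)

# Avalanche effect: roughly half of the 512 output bits are expected to differ.
print(differing_bits(id_a, id_b), "of 512 bits differ")

# The mapping is still deterministic: guessing the GUID yields the session ID.
print(session_id_from_guid(guid_a) == id_a)
```

So the hashes look unrelated, but the guessing effort is bounded by how predictable the GUIDs are, not by the length of the hash.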

related: [Which hashing algorithm is best for uniqueness and speed?](http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed) (terrific answer in there - in particular, take a look at the section "GUIDs are designed to be unique, not random") — gnat, Nov 22 '13 at 12:48
I remember reading that post some time ago. I wonder whether SHA-512 also produces collisions for GUIDs; that's my main concern at this point, I guess. – Davio, Nov 22 '13 at 13:18
As a non-secure PRNG this is fine. Hashing unique values produces random looking values. But it's not a secure PRNG, since GUIDs can be predictable. If your session IDs are secret tokens used to authenticate the client, this is not acceptable. — CodesInChaos, Nov 22 '13 at 16:17
Just use a proper secure PRNG and generate 16 bytes, for example one based on `/dev/urandom` on Linux or `CryptGenRandom` on Windows. Most languages offer an easy-to-use wrapper on top of them; in C#, for example, you'd use `RNGCryptoServiceProvider`. – CodesInChaos, Nov 22 '13 at 16:20
+1 @CodesInChaos. If guessability matters, use a cryptographically secure source. I'm guessing the (small) performance impact isn't going to make this a bottleneck, if it's indeed for session generation. — AakashM, Nov 22 '13 at 16:59
Thanks for your comments, I've started using another built-in functionality, a method called "generate-random-key". It's pseudo-random, but will fit my purpose well enough. — Davio, Nov 26 '13 at 10:26

score 1 · Answer 1 · answered Dec 06 '13 at 12:59

I would suggest using a session ID implementation that is known to be secure. There is also an RFC on UUIDs/GUIDs, http://www.ietf.org/rfc/rfc4122.txt, where you can learn that there are different versions of GUIDs. I suggest you switch to cryptographically secure random numbers.
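
As an illustration of what "cryptographically secure random numbers" can look like in practice, here is a small Python sketch using the standard `secrets` module, which draws from the operating system's secure random source. The function name `new_session_id` is hypothetical, and the 16-byte default simply follows the figure suggested in the comments on the question.

```python
import secrets

def new_session_id(num_bytes: int = 16) -> str:
    """Return a hex-encoded session ID drawn from the OS's secure random source."""
    return secrets.token_hex(num_bytes)

session_id = new_session_id()
print(session_id)       # 32 hex characters, different on every run
print(len(session_id))  # 32
```

No GUID and no hash are involved here: the unpredictability comes directly from the random source.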

score 0 · Answer 2 · mirabilos · 2013-12-06T11:51:47.427

They are completely random (a UUID consists of 16 octets, some bits of which are fixed and some fully random), so they are nondeterministic – and not guaranteed to be unique (especially if your random source is flawed).

A better way to generate unique session IDs (which is what I assume you want/need) is to use a counter (such as the PostgreSQL PRIMARY KEY SERIAL of the session table you use) and hash that with a per-installation-of-your-app secret. (Remember to protect your cookies with a MAC, e.g. an HMAC, and to use a different(!) secret for that.)

Also: UUIDs are 16 bytes, which is 128 bits, so there’s no point in hashing them into something longer than 128 bits.
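
The size and version points can be checked with Python's standard `uuid` module; this is only an illustrative sketch, and the asker's GUID API may behave differently. Per RFC 4122, a version 4 UUID fixes 6 of its 128 bits (4 version bits and 2 variant bits), leaving 122 random bits, whereas a version 1 UUID is built from a timestamp and a node identifier.

```python
import hashlib
import uuid

v1 = uuid.uuid1()  # time- and node-based: highly structured
v4 = uuid.uuid4()  # random apart from the fixed version/variant bits

print(v1.version, v4.version)  # 1 4
print(len(v4.bytes) * 8)       # 128

# Hashing lengthens the value but cannot make it any less predictable.
digest = hashlib.sha512(v4.bytes).digest()
print(len(digest) * 8)         # 512
```

The 512-bit digest carries no more unpredictability than the 128-bit input it was computed from.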

Clarification: I mean something like this:

$handle = db_query_params('INSERT INTO session (remote_ip, begin, …) VALUES ($1, $2, …)',
    array($remote_ip, now(), …));
$seqnumber = db_insertid($handle);  /* find out which SERIAL PostgreSQL assigned to the session */
$session_id = hash('sha512', $somesecret . $seqnumber);  /* PHP's built-in hash(); returns a hex string */
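
For comparison, a rough Python sketch of the same idea, including the cookie MAC mentioned above. Everything named here is an assumption for illustration: an in-memory counter stands in for the PostgreSQL SERIAL, and `ID_SECRET`, `COOKIE_SECRET`, and the three helper functions are hypothetical. It also swaps in HMAC where the snippet above hashes the concatenated secret and counter, which is a different construction from the one shown in the answer.

```python
import hashlib
import hmac
import itertools
import secrets

# Hypothetical per-installation secrets; in practice these would come from configuration.
ID_SECRET = secrets.token_bytes(32)
COOKIE_SECRET = secrets.token_bytes(32)  # deliberately different from ID_SECRET

# Stand-in for the database-assigned SERIAL value.
_counter = itertools.count(1)

def new_session_id() -> str:
    """Derive a session ID from the next counter value and the ID secret."""
    seq = next(_counter)
    return hmac.new(ID_SECRET, str(seq).encode("ascii"), hashlib.sha512).hexdigest()

def sign_cookie(value: str) -> str:
    """Append an HMAC tag so tampering with the cookie can be detected."""
    tag = hmac.new(COOKIE_SECRET, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}.{tag}"

def verify_cookie(cookie: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    value, _, tag = cookie.rpartition(".")
    expected = hmac.new(COOKIE_SECRET, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)

cookie = sign_cookie(new_session_id())
print(verify_cookie(cookie))        # True
print(verify_cookie(cookie + "0"))  # False
```

Because the counter never repeats, the derived IDs never repeat either, as long as the counter itself is never reset.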

edited Dec 06 '13 at 11:51

answered Dec 06 '13 at 11:35

mirabilos


Of course the hash is unique – it’s just longer, which makes handling it more difficult, which is why I advised against it. And I specifically said to hash the serial values with a secret, so that they are essentially random-looking but, on the other hand, never reused. – mirabilos Dec 06 '13 at 11:44
UUIDs don't necessarily contain fully random bits. V4 GUIDs are mostly random (apart from 6 bits) as you said. But V1 GUIDs are highly structured, and it sounds like the OP's API still uses V1 GUIDs. – CodesInChaos Dec 06 '13 at 11:48
Ah, indeed. That makes replacing them with something that is unique even more important, though, so I’d rather count it in favour of my solution. – mirabilos Dec 06 '13 at 11:49
I still recommend a proper CSPRNG over Hash(secret+counter). Counters can roll back when restoring from backup, secrets might leak, and so on. A proper CSPRNG is forward secret, recovers from compromise, and doesn't require persistent state. If you use the hashing approach, at least include a CSPRNG output in the hash inputs in addition to secret and counter. – CodesInChaos Dec 06 '13 at 13:06
Yes, but my point here is that, for a session ID, it’s (possibly more) important that the IDs don’t get reused. You can only do that if you have persistent storage. If your random seed is not good enough, chances are (although, admittedly, usually very low) that the same session ID is generated twice. (Also, counters are very long in PostgreSQL. If really worried, add the timestamp when the counter last rolled over, or just always include a timestamp. While time is not a secret, this would work perfectly here.) – mirabilos Dec 06 '13 at 13:09
I'm not worried about rollover, but rollback. I'm worried about rollback after restoring from backup, crashes, etc. – CodesInChaos Dec 06 '13 at 13:11
Oh, I see. In that case: yes, always include a timestamp, and make it a requirement that the time on the server be correct. (But keep the timestamp separate from the hash(counter+secret) to avoid reuse.) And of course only accept sessions that are listed in the database. – mirabilos Dec 06 '13 at 13:15
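
One way to read where this exchange ends up, sketched in Python under stated assumptions: the dictionary `sessions` stands in for the session table, `_next_seq` for the database counter, and `SECRET`, `create_session`, and `is_valid` are hypothetical names. The ID mixes the counter with fresh CSPRNG output under the secret, the timestamp is stored next to the ID rather than hashed into it, and only IDs present in the store are accepted.

```python
import hashlib
import hmac
import secrets
import time

SECRET = secrets.token_bytes(32)  # hypothetical per-installation secret
sessions = {}                     # stand-in for the session table: id -> creation time
_next_seq = 0                     # stand-in for the database counter

def create_session() -> str:
    """Mix counter and fresh CSPRNG output under the secret; store a timestamp alongside."""
    global _next_seq
    _next_seq += 1
    nonce = secrets.token_bytes(16)  # keeps IDs unpredictable even if the counter rolls back
    message = _next_seq.to_bytes(8, "big") + nonce
    session_id = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    sessions[session_id] = time.time()  # timestamp kept separate from the hash
    return session_id

def is_valid(session_id: str) -> bool:
    """Only accept sessions that are actually listed in the store."""
    return session_id in sessions

first = create_session()
_next_seq = 0  # simulate a counter rolled back by a restore from backup
second = create_session()
print(first != second)  # True: the random nonce prevents reuse
print(is_valid(first), is_valid("f" * 64))  # True False
```

With the random nonce in the input, a rolled-back counter no longer reproduces an old ID, which is the rollback concern raised in the comments above.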
