Why are UUID / GUID's in the format they are?

Question

Globally Unique Identifiers (GUIDs) are written as a grouped string with a specific format, which I assume exists for a security reason.

A GUID is most commonly written in text as a sequence of hexadecimal digits separated into five groups, such as:

3F2504E0-4F89-11D3-9A0C-0305E82C3301

Why aren't GUID/UUID strings just random bytes encoded using hexadecimal of X length?

This text notation contains the following fields, separated by hyphens:

| Hex digits | Description                    |
|------------|--------------------------------|
| 8          | Data1                          |
| 4          | Data2                          |
| 4          | Data3                          |
| 4          | Initial two bytes from Data4   |
| 12         | Remaining six bytes from Data4 |
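To make the table concrete, here is a minimal Python sketch that splits the example GUID from above into those fields. The helper name `split_guid` and the choice to return a dictionary are illustrative assumptions, not part of any standard API.

```python
# Example GUID taken from the text above.
GUID_TEXT = "3F2504E0-4F89-11D3-9A0C-0305E82C3301"


def split_guid(text: str) -> dict:
    """Split the 8-4-4-4-12 text form into Data1..Data4 (hypothetical helper)."""
    groups = text.split("-")
    if [len(g) for g in groups] != [8, 4, 4, 4, 12]:
        raise ValueError("expected 8-4-4-4-12 hexadecimal groups")
    data1, data2, data3, data4_head, data4_tail = groups
    return {
        "Data1": int(data1, 16),  # 8 hex digits = 32 bits
        "Data2": int(data2, 16),  # 4 hex digits = 16 bits
        "Data3": int(data3, 16),  # 4 hex digits = 16 bits
        # Data4 is one 8-byte field that the text form prints as two groups.
        "Data4": bytes.fromhex(data4_head + data4_tail),
    }


fields = split_guid(GUID_TEXT)
print(hex(fields["Data1"]), fields["Data4"].hex())
```

Note that the fourth and fifth text groups together form the single `Data4` field, which is why the table lists five rows for four fields.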

There are also several versions of the UUID standards.

Version 4 UUIDs are generally internally stored as a raw array of 128 bits, and typically displayed in a format something like:

uuid:xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
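As a rough illustration of that pattern, the sketch below uses Python's standard `uuid` module to generate a version 4 UUID and check the two fixed positions. The detail that `y` is one of `8`, `9`, `a`, or `b` comes from the RFC 4122 variant bits and is not stated in the text above.

```python
import uuid

u = uuid.uuid4()   # random (version 4) UUID
text = str(u)      # canonical lowercase 8-4-4-4-12 form

# Position 14 is the "4" in "4xxx"; position 19 is the "y" in "yxxx".
version_digit = text[14]
variant_digit = text[19]

raw = u.bytes      # the same value as a raw 16-byte (128-bit) array

# The hyphens carry no information: the 32 hex digits alone round-trip.
same = uuid.UUID(text.replace("-", ""))

print(text, version_digit, variant_digit, len(raw), same == u)
```

The last line is also a quick check that the grouping is purely presentational: removing the hyphens changes nothing about the underlying 128 bits.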

No, it probably isn't for security reasons, the bitstring has the same entropy with or without the dashes. I would think it is so that GUID's can be recognized at a glance instead of going "here's a bunch of hex characters, is that md5.. or perhaps sha1.. no, wait, it could be..." and so on. Also, GUID's are usually not just random bytes. — , Oct 14 '12 at 02:35
http://blogs.msdn.com/b/oldnewthing/archive/2008/06/27/8659071.aspx — Daniel Little, Nov 10 '13 at 23:19
Similar Question from SO [UUID format: 8-4-4-4-12 - Why?](http://stackoverflow.com/q/10687505/1671639) — Praveen, Sep 12 '14 at 10:55
[specific to the latest version](https://stackoverflow.com/q/47230521/1739000) (version 4) — NH., Jan 19 '18 at 22:01

score 12 · Answer 1 · edited Nov 10 '13 at 07:49

From RFC 4122, "A Universally Unique IDentifier (UUID) URN Namespace":

The formal definition of the UUID string representation is provided by the following ABNF:
UUID                   = time-low "-" time-mid "-"
                         time-high-and-version "-"
                         clock-seq-and-reserved
                         clock-seq-low "-" node

So, those are just the different fields of the original time- and MAC-based UUID. The RFC says the format originates from the Apollo Network Computing System.
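The ABNF field names map directly onto attributes of Python's standard `uuid.UUID` class, so a short sketch can show how the example GUID from the question decomposes. Using that particular GUID here is my choice for illustration; the RFC does not use it.

```python
import uuid

# Example GUID from the question.
u = uuid.UUID("3F2504E0-4F89-11D3-9A0C-0305E82C3301")

# UUID.fields is a 6-tuple in the same order as the ABNF rule.
(time_low, time_mid, time_hi_version,
 clock_seq_hi_variant, clock_seq_low, node) = u.fields

# Rebuild the text form: clock-seq-and-reserved and clock-seq-low
# share the fourth group, exactly as in the ABNF.
rebuilt = "%08x-%04x-%04x-%02x%02x-%012x" % u.fields

print(hex(time_low), hex(time_mid), hex(time_hi_version))
print(hex(clock_seq_hi_variant), hex(clock_seq_low), hex(node))
print(u.version, rebuilt == str(u))
```

For this example the version field reads 1, i.e. a time-based UUID, which is consistent with the groups being timestamp, clock sequence, and node fields.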

Turnkey · Answer 2 · 2012-10-14T20:08:52.383


The dashes in the text representation separate the four fields of the GUID/UUID into five groups; the last field is itself split after its first two bytes.

The representation has nothing to do with security: there are several different methods of computing a GUID, and it is intended to be a unique identifier, not necessarily a secure one.

The most likely reason the fields are split (even though the standard doesn't mention it) is for readability/separation of the component parts.

edited Oct 14 '12 at 20:08

answered Oct 14 '12 at 18:27


That tells us what the format is, information that was already in the question. It doesn't explain *why*, which is what the OP was asking. – Keith Thompson, Oct 14 '12 at 19:49
It is just separating them into the fields, likely for better readability and identification. Maybe the last one was split further because of its length. – Turnkey, Oct 14 '12 at 20:06
Logical. Same reason phone numbers, credit card numbers, and many other long numbers are frequently split up in groups when printed or written down. – jwenting, Oct 15 '12 at 05:39
