Domain Objects as ids create some complex/subtle problems:
Serialization/Deserialization
If you store objects as keys, serializing the object graph becomes extremely complicated. A naive serialization to JSON or XML will fail with a StackOverflowError because of the recursion. You will then have to write a custom serializer that writes out the ids of the referenced objects instead of serializing the object instances and recursing.
Pass in objects for type safety but store only their ids. An accessor method can then lazy-load the related entity when it is called, and second-level caching will take care of subsequent calls.
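A minimal sketch of that idea, assuming hypothetical Customer, Order and CustomerRepository types (none of these names come from the question). The in-memory repository only stands in for whatever ORM and second-level cache you actually use:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

// Hypothetical lookup abstraction; a real one would sit on top of the ORM and its cache.
interface CustomerRepository {
    Customer findById(UUID id);
}

final class Customer {
    private final UUID id;
    private final String name;

    Customer(UUID id, String name) {
        this.id = Objects.requireNonNull(id);
        this.name = Objects.requireNonNull(name);
    }

    UUID getId() { return id; }
    String getName() { return name; }
}

final class Order {
    private final UUID id;
    private final UUID customerId;               // only the id is stored, never the Customer
    private final CustomerRepository customers;

    // The constructor takes a Customer for type safety but keeps only its id.
    Order(UUID id, Customer customer, CustomerRepository customers) {
        this.id = Objects.requireNonNull(id);
        this.customerId = customer.getId();
        this.customers = Objects.requireNonNull(customers);
    }

    UUID getId() { return id; }
    UUID getCustomerId() { return customerId; }

    // Looks the related entity up on demand; caching is left to the repository.
    Customer getCustomer() { return customers.findById(customerId); }
}

// In-memory stand-in for the repository, for illustration only.
final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<UUID, Customer> store = new HashMap<>();

    void save(Customer customer) { store.put(customer.getId(), customer); }

    @Override
    public Customer findById(UUID id) { return store.get(id); }
}
```

Because Order holds only a UUID, serializing it writes out the customer's id rather than walking into the Customer instance.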
Subtle reference leaks:
If you use domain objects in constructors like you have there, you will create circular references that make it very difficult for memory to be reclaimed from objects that are no longer actively used.
Ideal Situation:
Opaque ids vs int/long:
An id should be a completely opaque identifier that carries no information about what it identifies. But it should offer some verification that it is a valid identifier in its system.
Raw types break this:
int, long and String are the most commonly used raw types for identifiers in RDBMS systems. There is a long history of practical reasons for this, dating back decades, and they are all compromises made to save space, save time, or both.
Sequential ids are the worst offenders:
When you use a sequential id you are packing temporal semantic information into the id by default. That is not bad until someone uses it. When people start writing business logic that sorts or filters on the semantic quality of the id, they are setting up a world of pain for future maintainers.
String fields are problematic because naive designers will pack information into the contents, usually temporal semantics as well.
These also make it impossible to create a distributed data system, because 12437379123 is not globally unique. Once there is enough data in the system, it is pretty much guaranteed that another node will create a record with the same number.
Then hacks begin to work around it and the entire thing devolves into a pile of steaming mess.
Even ignoring huge distributed systems (clusters), it becomes a complete nightmare when you start trying to share the data with other systems, especially when the other system is not under your control.
You end up with the exact same problem: how to make your id globally unique.
UUID was created and standardized for a reason:
UUID can suffer from all the problems listed above depending on which Version you use.
Version 1 uses a MAC address and time to create a unique id. This is bad because it carries semantic information about location and time. That is not in itself a problem; the problem comes when naive developers start relying on that information for business logic. It also leaks information that could be exploited in an intrusion attempt.
Version 2 uses a user's UID or GID and domain UID or GID in place of part of the time from Version 1. This is just as bad as Version 1 for data leakage and for the risk of this information being used in business logic.
Version 3 is similar but replaces the MAC address and time with an MD5 hash of a byte[] derived from something that definitely has semantic meaning. There is no data leakage to worry about, because the byte[] cannot be recovered from the UUID. This gives you a good way to deterministically create UUID instances from an external key of some sort.
Version 4 is based only on random numbers, which is a good solution: it carries absolutely no semantic information, but it is not deterministically re-creatable.
Version 5 is just like Version 3 but uses SHA-1 instead of MD5.
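In Java, java.util.UUID covers two of these out of the box: nameUUIDFromBytes produces a Version 3 UUID and randomUUID produces a Version 4 one. A small sketch, where the UuidVersions class and its method names are just illustrative wrappers:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class UuidVersions {
    // Version 3: deterministic and MD5-based. The same bytes always give the same UUID.
    static UUID nameBased(String externalKey) {
        return UUID.nameUUIDFromBytes(externalKey.getBytes(StandardCharsets.UTF_8));
    }

    // Version 4: random, so it cannot be re-created from any input.
    static UUID random() {
        return UUID.randomUUID();
    }
}
```

Calling UuidVersions.nameBased with the same external key twice returns equal UUIDs, while two calls to UuidVersions.random() return different ones.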
Domain Keys and Transactional Data Keys
My preference for domain object ids is to use Version 5, or Version 3 if restricted from using Version 5 for some technical reason.
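java.util.UUID has no built-in Version 5 factory, which is one such technical reason. The following is a rough sketch of a name-based SHA-1 UUID built by hand, assuming the usual layout (hash the namespace bytes followed by the name, keep the first 16 bytes, then set the version and variant bits). The UuidV5 class name is made up for illustration, and a vetted library is probably the better choice in production:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

final class UuidV5 {
    static UUID nameBased(UUID namespace, String name) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Hash the 16 namespace bytes followed by the name bytes.
            sha1.update(ByteBuffer.allocate(16)
                    .putLong(namespace.getMostSignificantBits())
                    .putLong(namespace.getLeastSignificantBits())
                    .array());
            byte[] hash = sha1.digest(name.getBytes(StandardCharsets.UTF_8));

            hash[6] = (byte) ((hash[6] & 0x0f) | 0x50); // set version to 5
            hash[8] = (byte) ((hash[8] & 0x3f) | 0x80); // set the IETF variant

            // Only the first 16 of the 20 SHA-1 bytes are used.
            ByteBuffer buf = ByteBuffer.wrap(hash, 0, 16);
            return new UUID(buf.getLong(), buf.getLong());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```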
Version 3 is great for transaction data that might be spread across many machines.
Unless you are constrained by space use a UUID:
They are unique for all practical purposes: when dumping data from one database and reloading it into another, you never have to worry about duplicate ids that actually reference different domain data.
Versions 3, 4 and 5 are completely opaque, and that is the way they should be.
You can have a single column as the primary key with a UUID and then you can have compound unique indexes for what would have been a natural composite primary key.
Storage does not have to be CHAR(36) either. You can store the UUID in a native byte/bit/number field for a given database as long as it is still indexable.
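For a binary column, one option is to pack the UUID into 16 bytes and unpack it on the way back. A sketch, with UuidBytes as a hypothetical helper name; which column type to use (for example a 16-byte binary type) depends on your database:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class UuidBytes {
    // Packs the two 64-bit halves into a 16-byte array, most significant half first.
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Rebuilds the UUID from a 16-byte array produced by toBytes.
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```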
Legacy
If you have raw types and cannot change them, you can still abstract them away in your code.
Using a Version 3/5 UUID you can pass in Class.getName() + String.valueOf(int) as a byte[] and have an opaque reference key that is re-creatable and deterministic.
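A sketch of that using the Version 3 factory in java.util.UUID. LegacyKeys and keyFor are hypothetical names, and the UTF-8 encoding is an assumption; any encoding works as long as it never changes:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class LegacyKeys {
    // Derives a deterministic Version 3 UUID from the entity class and its legacy int id.
    static UUID keyFor(Class<?> type, int legacyId) {
        String name = type.getName() + String.valueOf(legacyId);
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }
}
```

The same class and int always map to the same UUID, so the key can be rebuilt anywhere without a lookup table. If any of your class names end in a digit, adding a separator between the name and the id avoids two different pairs producing the same string.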