7 Strategies for Assigning Ids

Identity is the defining characteristic of an entity in Domain-Driven Design. As soon as an Id is public and leaves its immediate context, other components might depend on it. For example, if service A references an entity from service B by Id, changing the Id of that entity will have a knock-on effect on service A. This is why it’s important to have several tools in the toolbox. In this blog post we’ll discuss 7 strategies for assigning Ids and their trade-offs.

GUIDs (UUIDs)

This is the simplest and most straightforward option: you just generate a GUID and use it as an identifier.

Pros

Easy and fast to generate.
Guaranteed to be unique (OK, not guaranteed, but the probability of generating duplicate GUIDs is so low it doesn’t make sense to debate it). This also makes it easier if you want to merge data from two different sources.
There is no central authority needed to generate a GUID. This means that GUIDs can be generated on two different machines, which makes it easier to use in distributed systems. You might even use client-side generated Ids.
They don’t leak any business intelligence information (more on this below).

Cons

They are not human readable. A GUID is 36 characters long with the hyphens, 32 without, and 22 if you Base64-encode it without padding.
They are big (16 bytes).
GUIDs might hurt Database performance (although opinions vary on how much and if it matters). This was not a problem in the systems that I have worked on. Also, there is the option of generating sequential GUIDs.
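
To make the sizes above concrete, here is a minimal sketch using Python's standard `uuid` and `base64` modules. The helper names (`new_id`, `to_short_text`, `from_short_text`) are my own for illustration, and the URL-safe Base64 variant without padding is just one possible encoding.

```python
import base64
import uuid


def new_id() -> uuid.UUID:
    # Random (version 4) UUID; no central authority is involved.
    return uuid.uuid4()


def to_short_text(value: uuid.UUID) -> str:
    # Encode the 16 raw bytes as URL-safe Base64 and drop the "=" padding.
    return base64.urlsafe_b64encode(value.bytes).rstrip(b"=").decode("ascii")


def from_short_text(text: str) -> uuid.UUID:
    # Restore the padding that to_short_text removed, then decode.
    padded = text + "=" * (-len(text) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    entity_id = new_id()
    print(len(entity_id.bytes))              # raw size in bytes
    print(len(str(entity_id)))               # with hyphens
    print(len(entity_id.hex))                # without hyphens
    print(len(to_short_text(entity_id)))     # Base64, no padding
```

The short form is only a presentation detail: `from_short_text` gets you back the same 16-byte value.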

Sequential Integers

This option usually implies relying on a database server to generate the Id for you.

Pros

Easy to generate, as all RDBMSs have this option.
Human readable and easy to remember.
Small (4 bytes).

Cons

Since you need a central authority to generate the Ids (usually the Database server), you have a single point of failure and a potential bottleneck. This might hurt scalability after a certain point.
If it’s exposed publicly (displayed on a page or part of a URL), it might leak business intelligence data. For example, if I order something now and get an order with Id 345, then order something a month later and get an order with Id 445, I can infer that the shop is getting about 100 orders per month.
You need a round trip to the Database to get the Id.
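
As a rough illustration of the round trip, here is a sketch that uses an in-memory SQLite database as a stand-in for your RDBMS. The `orders` table and the function names are hypothetical.

```python
import sqlite3


def create_orders_table(conn: sqlite3.Connection) -> None:
    # The database is the central authority that hands out the next Id.
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "item TEXT NOT NULL)"
    )


def place_order(conn: sqlite3.Connection, item: str) -> int:
    # The Id is only known after the INSERT has reached the database.
    with conn:
        cursor = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    return cursor.lastrowid


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_orders_table(connection)
    first = place_order(connection, "book")
    second = place_order(connection, "pen")
    # Consecutive Ids are exactly what lets an outsider estimate order volume.
    print(first, second)
```

Note that the entity has no identity until it has been persisted, which is the main practical difference from client-generated GUIDs.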

Randomized Integers

This approach is based on the one above. You do generate a sequential Id, but keep it for internal use only. For external use, you symmetrically encrypt it using Skip32. This will generate an integer that will seem random.

Pros

Human readable and easy to remember.
Small (4 bytes).
Doesn’t leak any business intelligence information.

Cons

Since you need a central authority to generate the Ids (usually the Database server), you have a single point of failure and a potential bottleneck. This might hurt scalability after a certain point.
You need two round trips to the Database: one to generate the sequential Id and another one to save the encrypted Id.
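
Skip32 is not part of the Python standard library, so the sketch below does not implement it. Instead it uses a small 4-round Feistel network with HMAC-SHA256 as the round function, purely as a stand-in to show the idea: a keyed, reversible mapping from one 32-bit integer to another. The function names and the key are assumptions of mine, and this toy construction has not been reviewed as real cryptography.

```python
import hashlib
import hmac


def _round_value(key: bytes, round_no: int, half: int) -> int:
    # Keyed round function: maps a 16-bit half to a pseudo-random 16-bit value.
    message = bytes([round_no]) + half.to_bytes(2, "big")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def encode_id(internal_id: int, key: bytes, rounds: int = 4) -> int:
    # Turns the sequential internal Id into a scrambled 32-bit public Id.
    if not 0 <= internal_id < 2**32:
        raise ValueError("id must fit in 32 bits")
    left, right = internal_id >> 16, internal_id & 0xFFFF
    for i in range(rounds):
        left, right = right, left ^ _round_value(key, i, right)
    return (left << 16) | right


def decode_id(public_id: int, key: bytes, rounds: int = 4) -> int:
    # Runs the rounds backwards to recover the internal Id.
    if not 0 <= public_id < 2**32:
        raise ValueError("id must fit in 32 bits")
    left, right = public_id >> 16, public_id & 0xFFFF
    for i in reversed(range(rounds)):
        left, right = right ^ _round_value(key, i, left), left
    return (left << 16) | right


if __name__ == "__main__":
    secret = b"replace-with-a-real-secret"  # hypothetical key
    for internal in (345, 346, 445):
        public = encode_id(internal, secret)
        print(internal, public, decode_id(public, secret))
```

Because the mapping is a permutation of the 32-bit range, two different internal Ids can never produce the same public Id, so no collision check is needed.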

Short random Identifiers

In this approach you generate a short but random identifier and then check that it’s unique. This is the approach used by URL shorteners like bitly. You can generate it in many ways, for example by picking random characters from a base 62 alphabet, or by hashing some input, encoding the hash in base 62 and taking a substring of the result.

Pros

Short (5 base 62 characters will give you 62^5, or roughly 916 million, unique entries).
Human readable.
Does not leak any business intelligence information.

Cons

Since the Id is not guaranteed to be unique, you must check for collisions in the Database and retry in case the Id is already there. This could be implemented easily using a unique constraint on the Id column in the Database and a retry. This approach could break if your data is split across several tables (for example if you archive old entries in a different table), because the constraint only covers one of them.
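
The generate, check and retry loop could look roughly like the sketch below. It assumes an in-memory SQLite database and a hypothetical `links` table whose primary key acts as the unique constraint; `open_store`, `random_id` and `insert_with_retry` are names I made up for the example.

```python
import secrets
import sqlite3
import string

# 10 digits + 26 lowercase + 26 uppercase letters = 62 characters.
ALPHABET = string.digits + string.ascii_letters


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The PRIMARY KEY doubles as the unique constraint on the Id column.
    conn.execute("CREATE TABLE links (id TEXT PRIMARY KEY, url TEXT NOT NULL)")
    return conn


def random_id(length: int = 5) -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def insert_with_retry(conn, url, make_id=random_id, max_attempts=5) -> str:
    for _ in range(max_attempts):
        candidate = make_id()
        try:
            with conn:
                conn.execute(
                    "INSERT INTO links (id, url) VALUES (?, ?)",
                    (candidate, url),
                )
            return candidate
        except sqlite3.IntegrityError:
            # The Id is already taken: generate another one and try again.
            continue
    raise RuntimeError("could not find a free Id, consider a longer Id")


if __name__ == "__main__":
    store = open_store()
    print(insert_with_retry(store, "https://example.com/some/long/url"))
```

Letting the database reject duplicates avoids a separate "does it exist?" query and the race condition that comes with it.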

Natural Keys

You might work with an entity that already has a unique identity. For example all books should be uniquely identified by an ISBN. These types of keys are also known as natural keys.

Pros

The identity is well known in the problem domain.

Cons

You must double check that it is actually a natural key and that it is unique. There are cases of two different people having the same Social Security Number.
Changing the Id can be quite painful.
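
One cheap part of that double check is validating the format before you accept a value as an identity. As a sketch for the ISBN example, the function below verifies the ISBN-13 check digit (digits weighted 1 and 3 alternately, total divisible by 10). The function name is mine, and passing this check says nothing about uniqueness in your data; that still needs a constraint in the database.

```python
def is_valid_isbn13(text: str) -> bool:
    # Ignore the hyphens and spaces commonly used when printing an ISBN.
    digits = [c for c in text if c not in "- "]
    if len(digits) != 13 or any(c not in "0123456789" for c in digits):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(digits))
    return total % 10 == 0


if __name__ == "__main__":
    print(is_valid_isbn13("978-0-306-40615-7"))
    print(is_valid_isbn13("978-0-306-40615-8"))
```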

User Input

In this strategy, you are relying on the user to provide the Identity. The most common example is blog posts. If you look at this blog post’s URL, it has the following slug in it: 7-strategies-for-assigning-ids. This is derived from the title and could be used as the unique Id of this blog post. If there are lots of blog posts created and the chance of collision is high, you could append a hash to make it unique (example: https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439).

Pros

The identity assignment part is simpler: you just use the value from a user input as your identity.
The identity can provide hints about the content of the entity.

Cons

Identities should be stable, but what if users want to change them? What’s the cost of change, for example if the user wants to rename the blog post? Vaughn Vernon, in his book Implementing Domain-Driven Design, suggests workflow-based identity approval processes for low-throughput domains. This way you could minimize the chance of misspelling an Id.
You need to check the uniqueness of the Identity in the database.
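
Deriving a slug from a title can be sketched as follows. The exact rules (lowercase, runs of non-alphanumeric characters collapsed into a single hyphen) and the SHA-256 based suffix are assumptions for illustration; real blog engines each have their own rules, and non-ASCII titles would need extra handling.

```python
import hashlib
import re


def slugify(title: str) -> str:
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def slug_with_suffix(title: str, unique_source: str, length: int = 12) -> str:
    # Append a short hash of some value that differs per post (hypothetical:
    # an internal Id or a creation timestamp) to reduce the chance of collisions.
    digest = hashlib.sha256(unique_source.encode("utf-8")).hexdigest()
    return f"{slugify(title)}-{digest[:length]}"


if __name__ == "__main__":
    print(slugify("7 Strategies for Assigning Ids"))
    # prints: 7-strategies-for-assigning-ids
    print(slug_with_suffix("7 Strategies for Assigning Ids", "post-42"))
```

Even with a suffix, a unique constraint in the database remains the final word on uniqueness.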

Externally owned Identity

If you’re integrating with a 3rd party you could choose to reuse the Identity that they assign. This doesn’t necessarily need to be external to your company, but external to the service.

Pros

Easy to do as it requires only an assignment.

Cons

You need to ensure that the External Identity is stable. If it’s not, then when it changes it will impact your system too. For example, there are systems that regenerate their Ids during disaster recovery. This means that if you restore the external system and your system, they will be out of sync.

Conclusion

So, which one should you use? It depends on the context, of course. I find myself using GUIDs most of the time. If it needs to be human readable, then I use short random strings.

What approach do you use most often and why? Have you used other Identity generation strategies that are not on this list?

Simple Oriented Architecture

Victor Chircu
