You don’t need auto-incremented primary keys for your Entities anymore

Everyone knows that primary keys are needed to identify rows (subjects) in database tables and to link tables together.

And databases are indifferent to what these keys are by nature.

But circumstances are not.

So why do you still use artificial auto-incremented primary keys?

How to select a Primary Key?

When analyzing a database table, developers examine which elements are eligible to become the Primary Key. They compose a list of Candidate Keys and check each item against the following rules.

Elements of a Primary Key

  • It cannot be a multipart field.
  • It must contain unique values.
  • It cannot contain null values.
  • Its value cannot cause a breach of the organization’s security or privacy rules.
  • Its value is not optional in whole or in part.
  • It comprises a minimum number of fields necessary to define uniqueness.
  • Its values must uniquely and exclusively identify each record in the table.
  • Its value must exclusively identify the value of each field within a given record.
  • Its value can be modified only in rare or extreme cases.

Developers often prefer an artificial (surrogate) auto-incremented key as the primary key. Such keys are simple and easy to create and maintain. All modern databases support them out of the box. Most of the time these Primary Keys are Long values.

Primary Key selection

What’s wrong with Artificial auto-incremented Primary Keys?

Read the list of requirements for a Primary Key above again.

One requirement stands out: the values of a primary key cannot cause a breach of the organization’s security or privacy rules.

Putting primary key values into the REST API as entity identifiers opens the door to estimating the count of your entities. This helps outsiders measure your business. Isn’t that a breach of privacy? Absolutely yes.
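To make the risk concrete, here is a minimal sketch of how an outsider could estimate growth from two sequential IDs seen in API responses. The class name, method name, and the ID values are hypothetical and chosen only for illustration.

```java
import java.time.Duration;

// Hypothetical illustration: none of these names come from a real API.
final class IdLeakDemo {

    /** Estimates how many entities were created per day between two observed sequential IDs. */
    static double estimateDailyGrowth(long firstId, long secondId, Duration elapsed) {
        if (elapsed.isZero() || elapsed.isNegative()) {
            throw new IllegalArgumentException("elapsed must be positive");
        }
        double days = elapsed.toMillis() / 86_400_000.0; // milliseconds in one day
        return (secondId - firstId) / days;
    }

    public static void main(String[] args) {
        // An outsider creates one order today and another a week later,
        // then reads both IDs straight from the API responses (made-up values).
        long idSeenToday = 48_210L;
        long idSeenNextWeek = 51_710L;
        double perDay = estimateDailyGrowth(idSeenToday, idSeenNextWeek, Duration.ofDays(7));
        System.out.println("Estimated new orders per day: " + perDay);
    }
}
```

Two requests and one subtraction are enough; no access to the database is required.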

Putting primary key values into exception descriptions and error messages indirectly leads to the same measurement of your entities. Isn’t that a breach of privacy? Somebody may say no, but it would be a strange system that reports errors using IDs that do not appear in its API.

Putting primary key values into log messages may lead to the same measurements if adversaries gain access to the logging system. Isn’t that a breach of privacy? Seemingly not. But consider that the clients of your API may ask for help quoting IDs that do not appear in the log messages.

Moreover, in distributed systems an auto-incremented primary key is a bad practice. It works until the database needs to be scaled.
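A simplified sketch of the scaling problem follows. It assumes two database nodes that each run their own independent auto-increment sequence; the `Shard` class is a hypothetical in-memory stand-in, not a real database API. Real databases offer workarounds such as per-node offsets, which this sketch deliberately ignores.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a database node with its own auto-increment sequence.
final class Shard {
    private final AtomicLong sequence = new AtomicLong();

    long nextId() {
        return sequence.incrementAndGet(); // 1, 2, 3, ...
    }
}

final class ShardCollisionDemo {

    /** Counts IDs issued by both shards when each of them inserts rowsPerShard rows. */
    static int countCollisions(int rowsPerShard) {
        Shard first = new Shard();
        Shard second = new Shard();

        Set<Long> issuedByFirst = new HashSet<>();
        for (int i = 0; i < rowsPerShard; i++) {
            issuedByFirst.add(first.nextId());
        }

        int collisions = 0;
        for (int i = 0; i < rowsPerShard; i++) {
            if (issuedByFirst.contains(second.nextId())) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        System.out.println("Colliding IDs: " + countCollisions(1000));
    }
}
```

Every ID issued by one node is also issued by the other, so the rows cannot be merged or referenced globally without extra coordination.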

What is the alternative?

Using a UUID as the unique primary key is the alternative to an auto-incremented primary key.

According to Domain-Driven Design practices, every entity must be maintained by the one application/service responsible for it.

Therefore there is no need to make the database responsible for generating such key values. They can be generated in application code. Moreover, this helps in designing a good distributed system.
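As a minimal sketch of generating the key in application code, the entity below assigns its own identifier with the standard `java.util.UUID.randomUUID()` call. The `Customer` class and its fields are hypothetical and stand in for any entity your service owns.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical entity; the class and field names are illustrative only.
final class Customer {
    private final UUID id;
    private final String name;

    /** Creates a new customer; the ID is assigned here, before any database call. */
    Customer(String name) {
        this(UUID.randomUUID(), name);
    }

    /** Rebuilds an existing customer, for example from a stored row. */
    Customer(UUID id, String name) {
        this.id = Objects.requireNonNull(id, "id");
        this.name = Objects.requireNonNull(name, "name");
    }

    UUID getId() {
        return id;
    }

    String getName() {
        return name;
    }

    public static void main(String[] args) {
        Customer customer = new Customer("Alice");
        // The ID is known immediately and can be logged or sent to other services.
        System.out.println(customer.getId() + " " + customer.getName());
    }
}
```

Because the ID exists before the row is inserted, the service can refer to the entity in logs and outgoing messages without a round trip to the database.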

Why do you still need an auto-incremented Primary Key?

Picture the whole path of IDs in a modern cloud-ready system.

API endpoints take an ID and gather some information from a neighboring application, receiving further IDs in return. These IDs are then enriched with entities from other neighbors. Every such exchange of an ID for an entity increases internal communication. All of the involved applications write log messages. Eventually the appropriate response is given to the client. Incoming and outgoing IDs travel through all layers of the system.

So why don’t you migrate from Long-as-ID to UUID-as-ID?

The answer is merely human readability. Anybody reads and remembers numbers much more easily than a random string of 32 or even 64 symbols.

So who interacts with Primary Keys?

Developers of external APIs are the first actors. They don’t need our internal counting primary key values.

Our developers: they write the code and rarely analyze the data.

ML engineers: they work with abstract data and don’t need our counting primary key values.

Our “non-developing” colleagues: they don’t need primary keys at all; they need human-readable values placed in reports.

Support team. Helping customers is the activity that involves turning error messages into answers to the question “why?”. We refuse to put our internal auto-incremented primary key values into error messages, remember? Therefore, all questions will contain only UUIDs. Searching by UUIDs and converting UUIDs into entities (database subjects) can become an issue, because remembering Long values is much easier than remembering long random Strings, remember?

How do we help our support team?

Teach them to use LIKE conditions when composing database queries.

Help them by providing a technical means of handling long Strings as IDs automatically. Don’t make them copy and paste such IDs. Don’t make them memorize and retype such IDs.
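One possible shape for such a helper is sketched below: an in-memory equivalent of a `LIKE 'prefix%'` condition that resolves the first few characters of a UUID to the full value. The `UuidPrefixLookup` class and its method names are hypothetical, and a real tool would query the database instead of a list.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.UUID;
import java.util.stream.Collectors;

// Hypothetical support-tool helper; not part of any existing library.
final class UuidPrefixLookup {

    /** Returns all known IDs whose text form starts with the given prefix, ignoring case. */
    static List<UUID> findByPrefix(List<UUID> knownIds, String prefix) {
        String needle = prefix.trim().toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        return knownIds.stream()
                .filter(id -> id.toString().startsWith(needle)) // UUID.toString() is lowercase
                .collect(Collectors.toList());
    }

    /** Resolves a prefix to exactly one ID; empty when no ID or several IDs match. */
    static Optional<UUID> resolveUnique(List<UUID> knownIds, String prefix) {
        List<UUID> matches = findByPrefix(knownIds, prefix);
        return matches.size() == 1 ? Optional.of(matches.get(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<UUID> known = List.of(
                UUID.fromString("123e4567-e89b-12d3-a456-426614174000"),
                UUID.fromString("123e9999-e89b-12d3-a456-426614174000"));
        // A support engineer types only the first few characters from a ticket.
        System.out.println(resolveUnique(known, "123E45"));
        System.out.println(resolveUnique(known, "123e"));
    }
}
```

With a helper like this, the support team types a short prefix and the tool either returns the single matching entity ID or reports that the prefix is ambiguous.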

And get rid of auto-incremented Primary Keys

Just do it.
