Danish CPR-nr is NOT a primary key!

Personal Identification Number is a misnomer

Happy pride month 💖

I thought I would write something vaguely related to LGBTQ+ struggles to be a bit on theme this month, so here is an example of how engineering hubris in designing systems can lead to some leaky abstractions.

Specifically, I think a problem that often occurs is that as engineers (or anyone making decisions that may affect many people), we often lack the humility to realize that it is beyond us to encapsulate the wide variety and diversity of the human experience[1]. Instead of seeking to try and create representations of reality that are isomorphic to it, we should simply seek to create systems that are invariant over reality. Rather than seeking to capture it, we should merely refer to it as best we can.

Not doing so is a fallacy under many names, be it reification, concretism, or hypostatization, and the assumption that the map is the territory is a surefire way to get lost in our abstractions.

What is a CPR-nr

For context, for people not familiar with how Denmark works: the CPR-nr (CPR-nummer) is the personal identification number, Denmark’s national identification number. CPR itself is short for “Det Centrale Personregister”, the Civil Registration System in which the number is stored. The number takes the following format:

DDMMYY-SSSS

The number is ten digits long and consists of the date of birth followed by a four-digit “sequence number”. The first sequence digit (digit 7), combined with the two-digit year, indicates the century of birth via a lookup table, and the last sequence digit (digit 10) indicates binary gender (odd for male, even for female).

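| Digit 7 | Year (digits 5 and 6): 00-36 | 37-57 | 58-99 |
|---|---|---|---|
| 0-3 | 1900-1999 | 1900-1999 | 1900-1999 |
| 4 | 2000-2036 | 1937-1999 | 1937-1999 |
| 5-8 | 2000-2057 | 2000-2057 | 1858-1899 |
| 9 | 2000-2036 | 1937-1999 | 1937-1999 |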

For instance, if Alice was born on June 3, 1996, her CPR-nr could be 030696-2132. If Bob was born on July 7, 1877, his CPR-nr could be 070777-7777, but it could also be 070777-7893 or 070777-6789.
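
To make the format concrete, here is a minimal Python sketch of how one might decode a CPR-nr using only the rules described above. The function names `century_for` and `decode_cpr` are made up for this post, and the sketch assumes the lookup table above is the whole story; it does not validate anything beyond the digit count and the date itself.

```python
from datetime import date


def century_for(digit7: int, yy: int) -> int:
    """Century of birth from digit 7 and the two-digit year (table above)."""
    if digit7 <= 3:
        return 1900
    if digit7 in (4, 9):
        return 2000 if yy <= 36 else 1900
    # Remaining case: digit 7 is 5, 6, 7 or 8.
    return 2000 if yy <= 57 else 1800


def decode_cpr(cpr: str) -> tuple[date, str]:
    """Return (date of birth, binary gender marker) encoded in a CPR-nr."""
    digits = cpr.replace("-", "")
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError(f"not a ten-digit CPR-nr: {cpr!r}")
    dd, mm, yy = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    year = century_for(int(digits[6]), yy) + yy
    # Digit 10: odd means male, even means female. There is no third option.
    marker = "male" if int(digits[9]) % 2 else "female"
    return date(year, mm, dd), marker


print(decode_cpr("030696-2132"))  # Alice: 1996-06-03, "female"
print(decode_cpr("070777-7777"))  # Bob: 1877-07-07, "male"
```

Note that the return type already bakes in the assumption this post argues against: the marker can only ever be one of two strings.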

Interfacing with reality

For starters, we may realize that this model, which dates from 1968, simply doesn’t work, for several reasons.

In contemporary Denmark, nonbinary gender identities are legally recognized, which takes effect in various ways, such as allowing the gender marker “X” in passports.

However, the anachronistic CPR system simply has no way to fully represent gender with the 10th digit. After all, nonbinary is, as the name suggests, outside of the male/female binary. Currently, this is dealt with by having nonbinary people keep the sequence number matching their assigned gender at birth. As a result, the 10th digit fails to accurately represent the gender of all members of the population; it isn’t closed under gender, if you will.

It gets much worse, however, when it comes to changing the CPR-nr. For instance, Denmark allows people to legally change their gender. When doing this between male and female, you receive a new CPR-nr with a sequence number that corresponds to your gender identity. A CPR-nr may also contain errors from a faulty registration, or be abused by cyber criminals, and in those cases it is also possible to be assigned a new one.

Internally, in some database at the central person registry, there is a record of every CPR-nr a person has had through their lifetime, kept for reasons like debt, liability and taxation. The vast majority of downstream users of the CPR system, however, have no such abstraction.

What most companies do, whether in tech, banking, phone services, postal services etc., is store a person’s data with their CPR number as the primary key, under the assumption that it’s immutable. This leads to extensive problems for people who change their CPR-nr, as the majority of services they interact with and rely on in daily life now break down in unique and interesting ways. Further, when contacting the various service providers, you’ll often find them perplexed and unprepared to solve these issues, and it will likely require long chains of emails until you finally reach some database administrator who copy-pastes your old record from your former primary key (CPR-nr) to your new primary key (CPR-nr). And even then, you’ll likely still find various fun ways the system breaks down after the fact, each requiring a new long email chain. And this is a problem you have to deal with for every single service.

For instance, I personally went without being able to contact my bank, withdraw or deposit money, or even pay bills for… two and a half months. And there was no clear indication of progress from the bank as to when this would be fixed, as they clearly didn’t really understand what was actually going wrong.

Security and confidentiality

Aside from the problems mentioned above, one may quickly realize that if you know someone’s birth date and gender identity, the search space for finding their CPR number is drastically reduced. Many of us assume the sequence number has the strength of a four-digit PIN, but that is wrong: the possibilities for the 4th sequence digit are cut in half by gender, and the 1st, e.g. for someone born between 2000 and 2036, is simply a guess among the digits 4 through 9.
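
As a rough back-of-the-envelope check, the following Python sketch enumerates which four-digit sequence numbers remain possible once the birth year and the parity of the last digit are known. It only applies the century table and the odd/even rule from this post; any further rules about how numbers are actually issued are ignored, and the helper names `century_for` and `candidates` are hypothetical.

```python
from itertools import product


def century_for(digit7: int, yy: int) -> int:
    """Century of birth from digit 7 and the two-digit year (table above)."""
    if digit7 <= 3:
        return 1900
    if digit7 in (4, 9):
        return 2000 if yy <= 36 else 1900
    return 2000 if yy <= 57 else 1800  # digit 7 is 5, 6, 7 or 8


def candidates(birth_year: int, odd_last_digit: bool) -> list[str]:
    """All sequence numbers consistent with a birth year and gender marker."""
    yy = birth_year % 100
    parity = 1 if odd_last_digit else 0
    return [
        f"{a}{b}{c}{d}"
        for a, b, c, d in product(range(10), repeat=4)
        if century_for(a, yy) + yy == birth_year and d % 2 == parity
    ]


print(len(candidates(2010, odd_last_digit=False)))  # 3000
print(len(candidates(1877, odd_last_digit=True)))   # 2000
```

So under these assumptions, the "four-digit PIN" shrinks from 10,000 possibilities to a few thousand before anyone has even started guessing.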

It should be obvious how the CPR-nr may fail to be irreducible, but aside from that, encoding this kind of data into an identifier not only weakens its obscurity, making it easier to guess, but also turns the identifier itself into sensitive personal information.

Solution

No… making CPR-nr’s immutable is not a solution!

One possible solution may be to drop the old CPR system and exchange it for one in which the number contains no personal information and is sure to never change. But between where we are and that place lies a long, bureaucratic process. Besides, you shouldn’t really rely on external information for your primary keys; you should always wrap anything you’re not creating yourself in a primary key you control. Using confidential information as your primary key is also a horrible idea, since it means you need much stronger guarantees whenever you look up any user data.

So the solution to this problem is simple. Don’t use any personal information as a primary key, ever! Understanding whether the information encoded in your key is immutable, irreducible, or problematically identifying of the person you’re storing information about is near impossible, especially if you’re not a lawyer. Besides, any of those properties may change as laws and governance change. Instead, create your own primary key; then you’ll know it’s sure to never change.

It all hearkens back to the fundamental theorem of software engineering:

“We can solve any problem by introducing an extra level of indirection.”

David J. Wheeler

Postscript

Also, we can deduce that certain online logins make usernames their primary key, which is… sigh. Even if you then allow people to change their nickname, it’s just such an inherently rigid and annoying way to design the system, and now you’re out here keeping track of multiple usernames anyway. That’s so ugly! And if you’re doing nicknames anyway, why not just use a required field like email for login instead of username, and keep a nick or list of nicks? Whatever you do, just make a darn primary key that isn’t related to anything else. It’s so much easier.

Footnotes

[1]: For an example of the vile hubris of the engineer, see class.gender.php.