2

We are restructuring our entire database / filesystem / user identity system. As a first step, we have determined that we need to assign each user/employee in the organization a unique ID. Surprisingly enough, there do not seem to be any theoretical resources on this problem.

I wonder whether there are any recommendations for designing such a system. I have studied some ID schemes, but none of them seems practical in this case. In particular, UID- or ISBN-like systems are impractical because the codes are too long for people to remember or communicate. I have looked at the history of the CODEN system for assigning journal IDs and it is very inspiring, but I would prefer to avoid the problems they historically went through (changing the system twice along the way).

Desired properties of the system

In my case, I have around 10,000 people. The system should (probably) have these properties:

  • Uniqueness of IDs
  • IDs should be easy to communicate and remember (i.e. not too long, etc.)
  • Optionally, the system should catch common mistakes in an ID, if not correct them.

I have considered

I considered including some sort of initials from names, and a brief analysis shows the following:

  • By using initials (1 character from the given name, 1 from the surname), I split the people into groups, the largest of which has 165 members (J.K.).
  • By taking the first 1 and 2 characters (1 from the given name, 2 from the surname), I get a largest group of 49 members.
  • If I take the first 1 and 3 characters, I get a largest group of 18, which is better than taking the first 2 and 2 characters, where the largest group has 39 people.
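The kind of analysis described in the list above can be reproduced with a few lines of code. The following is only a minimal sketch: the function name `largest_group` and the sample names are hypothetical, and real data would need extra care with diacritics, compound surnames and very short names.

```python
from collections import Counter


def largest_group(people, given_len, surname_len):
    """Return (prefix, size) of the biggest group sharing the same name prefix.

    `people` is an iterable of (given_name, surname) pairs.
    """
    prefixes = Counter(
        (given[:given_len] + surname[:surname_len]).upper()
        for given, surname in people
    )
    # most_common(1) yields the single most frequent prefix with its count.
    return prefixes.most_common(1)[0]


# Hypothetical sample data, only to show the call pattern.
people = [
    ("John", "Kowalski"),
    ("Jane", "Kovac"),
    ("Jack", "King"),
    ("Anna", "Kent"),
]

for given_len, surname_len in [(1, 1), (1, 2), (1, 3), (2, 2)]:
    prefix, size = largest_group(people, given_len, surname_len)
    print(given_len, surname_len, prefix, size)
```

Whatever prefix scheme wins, the largest group size gives a lower bound on how many extra characters are needed to tell its members apart.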

I am also considering adding a checksum character like in CODEN, which would preferably not only catch mistakes but also make automatic correction possible in most cases.

I also had a look at Plus Codes, which have the great idea of NOT using characters (like 0, I, etc.) that can easily be mistaken for others. But this would conflict with the intention to include initials of some sort.

Regarding "catching errors", I have found an article about check digits which also suggests the Damm algorithm; however, that only covers the case where numeric codes are used. I might be able to construct a similar system for letters, though.
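One way a check character for letters could look is sketched below. This is not the Damm algorithm; it is an ISBN-10-style weighted sum taken modulo a prime alphabet size, which detects every single-character substitution and every swap of two adjacent, different characters, but cannot correct errors on its own. The 31-symbol alphabet (digits 2-9 plus letters without I, L and O) and the function names are assumptions made for illustration.

```python
# Assumed alphabet: digits 2-9 and letters without I, L, O (31 symbols, a prime).
ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"
P = len(ALPHABET)


def _weighted_sum(code):
    # Weights count down to 1, so the last (check) character has weight 1.
    m = len(code)
    return sum((m - i) * ALPHABET.index(ch) for i, ch in enumerate(code))


def check_char(body):
    """Return the character that makes body + check sum to 0 modulo P."""
    if not 0 < len(body) < P - 1:
        raise ValueError("body must have between 1 and P - 2 characters")
    # ALPHABET[0] has value 0, so it acts as a neutral placeholder here.
    partial = _weighted_sum(body + ALPHABET[0])
    return ALPHABET[(-partial) % P]


def is_valid(code):
    """Check a full code (body plus trailing check character)."""
    if not 2 <= len(code) < P:
        return False
    if any(ch not in ALPHABET for ch in code):
        return False
    return _weighted_sum(code) % P == 0


body = "JK7QX"  # hypothetical ID body
full = body + check_char(body)
print(full, is_valid(full))
```

The prime modulus is what makes the guarantees work: every weight and every difference between adjacent weights is nonzero modulo 31, so neither kind of error can cancel out. Automatic correction would need more redundancy than a single check character.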

PS: I have searched SE sites and initially asked this question on SuperUser, but it was rejected as off-topic. I am trying to find the right place to ask this, but it is not obvious.

Adam Miklosi
gorn
  • Do the employees already have something that should be unique to them, such as an email address? – Matthew Sep 27 '18 at 18:59
  • No, there are no central login names or emails. In fact, the company is very non-homogeneous and some of the people will never need a company email. – gorn Sep 30 '18 at 21:32

2 Answers

1

It's only a code; it doesn't have to mean anything, i.e. no information should be encoded in it. Since there's no length limitation, why not use words? This is not my original idea, btw; I got it from what3words.

It meets the desired properties:

  • Uniqueness: what3words can map the Earth into 3x3 meter squares. Even if that's the maximum, you have plenty of IDs available.
  • Easily communicated and remembered: three.words.easy.
  • The system can catch mistakes: it just needs dictionary lookup, autocomplete, etc.

I know that sounds like a joke, but it does meet the requirements and I can't find any reason to not use it.

imel96
  • nurses.wage.sulk, examiner.yourself.minimally not great ids for people – Ewan Sep 25 '18 at 15:42
  • @Ewan what makes IDs great? The only ID I really like is 007 – imel96 Sep 25 '18 at 16:33
  • Why can't we pick our own colours?! – Ewan Sep 25 '18 at 17:00
  • Depending on the organisation, maybe people can pick their own human-friendly IDs and then there's a service that translates them into cryptic IDs (if a random character string is preferred). But that sounds too much like domain names on the internet. – imel96 Sep 26 '18 at 03:32
  • Interesting idea. I have looked at the what3words specs and the trick is that the three words are automatically calculated from the GPS coordinates. How would you calculate them in this case? In this case I want to avoid people picking individual IDs. Indeed I could make some registration system where they do so, but it is quite different from what we expect to have. – gorn Sep 27 '18 at 18:54
  • @gorn It's practically assigning a hash to some words in a dictionary. E.g. if you take the first 6 digits from the md5sum of "Alice Cooper", you'd get "998183". You can then use that to get the 99th, 81st and 83rd words from the dictionary, or the 998th and 183rd words for two words. Just make sure the employee data you hash is unique, i.e. you might want to add the date of birth to the hash. – imel96 Sep 28 '18 at 05:39
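The hash-to-words idea from the last comment could be sketched roughly as follows. This is a variation rather than the exact recipe in the comment: it uses the whole MD5 digest as one integer and picks word indices by repeated division, since a hex digest also contains the letters a-f. The word list, the function name `words_for` and the sample identity string are hypothetical.

```python
import hashlib


def words_for(identity, wordlist, count=3):
    """Derive `count` words from a hash of the identity string."""
    digest = hashlib.md5(identity.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    words = []
    for _ in range(count):
        # Peel off one index in the range [0, len(wordlist)) per word.
        value, index = divmod(value, len(wordlist))
        words.append(wordlist[index])
    return ".".join(words)


# Hypothetical tiny word list; a real one would hold thousands of vetted words.
WORDS = ["apple", "brick", "cloud", "delta", "ember", "flint", "grove", "harbor"]

print(words_for("Alice Cooper|1970-01-01", WORDS))
```

Two different people can still hash to the same words, so each generated ID would have to be checked against the ones already assigned before it is handed out.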
0

First of all, I have to say: use a GUID and meet your 'easy to communicate' requirement with a 2D barcode, mag stripe, near-field communication, autocomplete fields or something similar.

Secondly, you have such a small number of people, so why not just use their name, an int, or a random 5-character string?

Each has downsides, but none are unsolvable. I would go for 5 random characters from a subset of letters and numbers, omitting o, 1, l, etc. Generate batches in advance and have a human check each one for obscenities.
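Generating such a batch might look like the sketch below. The exact alphabet (digits 2-9 and letters without I, L and O), the function name `generate_batch` and the `taken` parameter for already-issued IDs are assumptions; the obscenity review remains a manual step.

```python
import secrets

# Assumed subset: no 0, 1, I, L or O, to avoid look-alike characters.
SAFE_ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"


def generate_batch(size, length=5, taken=frozenset()):
    """Return `size` distinct random IDs that are not in `taken`."""
    ids = set()
    while len(ids) < size:
        candidate = "".join(secrets.choice(SAFE_ALPHABET) for _ in range(length))
        # The set drops duplicates within the batch; `taken` covers earlier batches.
        if candidate not in taken:
            ids.add(candidate)
    return sorted(ids)


batch = generate_batch(20)
print(batch)
```

With 31 symbols and 5 positions there are about 28.6 million possible IDs, so 10,000 people use only a tiny fraction of the space and retries on collision are rare.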

Ewan
  • Names are generally not a good idea as IDs because two people can have the same name, one person can have multiple names, and a person can change their name. Also, this: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ and https://shinesolutions.com/2018/01/08/falsehoods-programmers-believe-about-names-with-examples/ – Jörg W Mittag Sep 25 '18 at 12:25
  • @jorg ever heard of email addresses? – Ewan Sep 25 '18 at 13:04
  • I agree that including the name is usually not a good idea; however, it would substantially shorten the string which an individual person needs to remember. – gorn Sep 27 '18 at 18:51
  • 5 random characters is something no one will ever remember :) – gorn Sep 27 '18 at 18:51
  • aec65ffb-fb63-4a9f-9566-faa69c70cff8@someplace.com should cover all bases here, no? – Newtopian Sep 27 '18 at 18:52
  • Yes, but no one will remember it :) – gorn Sep 27 '18 at 19:00