14

Often in a table that has no natural key, it is still useful for users to be able to have a uniquely generated identifier. If the table has a surrogate primary key (and in such a case you would certainly expect it to) should that key be exposed to the user or should another field be used for that purpose?

One reason not to expose the surrogate key is that now you can't do operations that preserve the relationship between records, but change the key values, such as certain kinds of deletion/re-insertion, many methods of copying data from one database to another, etc.

The main advantage of exposing the surrogate key is the simplicity of using a field you have anyway.

Under what circumstances is it better to directly expose the surrogate key to users?

psr
  • 12,846
  • 5
  • 39
  • 67
  • Part of this question is that you need to define 'users.' Users could be consumers of an API you expose or fleshy human beings. The answer is not going to be the same for both. – James Snell Jul 12 '13 at 22:05
  • 1
    Fleshy human beings. – psr Jul 16 '13 at 02:11
  • Every situation where the data could be identified differently should have a different identifier. So, if a new system connects with your data, expect it to have its own PK and add that to your data. Visibility to "ugly bags of mostly water" counts as a 'system' for this scenario. My preferred solution is to store the keys and timestamps (add and change and delete) for the entities in a special table. In this way, the data can be easily distributed. –  Nov 06 '17 at 15:10

9 Answers9

11

You need to be ready for any identifier that is exposed to users/customers needing to be changed, and changing the identity of a row in a database and propagating that change to all foreign keys is just asking to break data.

If the data has no natural business key, you can add an additional field for a "business identifier". This should be optimized for the processes it is used for. Telephone keypad entry means numeric only. Over the phone/verbal means avoid similar sounding symbols (B/D, M/N, etc). You can even autogenerate some easily memorable phrase ("green jelly").

The effect of this is that the business can later change how they want to refer to records, and the only data schema change is either adding a new column for that style of id or transform the ids already there. The change doesn't propagate through the entire database, and you still have one id (the surrogate) that is valid over time.

In short, I would avoid exposing surrogate keys to users. As the comments point out, surrogate keys should almost never change. Conversely, businesses want to change everything. If the surrogate key is exposed, it is just a matter of time before the business wants to change it.

As a side note, when I say "exposing" here, I mean to give the key to the user with the expectation that they use it directly (like calling in to support with their order number).

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
Chris Pitman
  • 3,426
  • 1
  • 18
  • 21
  • 5
    This answer makes no sense. Surrogate keys never change, and you haven't made a case for not exposing them to users. – Robert Harvey Jul 10 '13 at 23:48
  • 2
    never say never. what if latter design leads to record consolidation? what if you need to upsize and convert away from identities? – DougM Jul 11 '13 at 01:51
  • 3
    @DougM: Then write the records to a new consolidated table with it's own surrogate key, and maintain the original keys in separate fields in the new table (and a field that identifies the original source table) as references, if needed. That kind of consolidation should be extraordinarily rare, and only as a result of a botched design. Repeat after me: *Surrogate. Keys. Never. Change.* That's why they're surrogates. – Robert Harvey Jul 11 '13 at 04:55
  • 3
    @RobertHarvey I agree that surrogate keys should never change, which is why I think they make poor external identifiers. Eventually a customer will want to change how *they* refer to a record. At that point you have either built your entire system around immutable surrogate keys or you thought ahead and put a level of indirection between business identifiers and the surrogate keys. – Chris Pitman Jul 11 '13 at 05:34
  • Well, I never said that you couldn't have other keys, just that they shouldn't participate in joins if you ever expect them to change. – Robert Harvey Jul 11 '13 at 15:18
  • 1
    @RobertHarvey And I do not disagree, the surrogate key should be used for all relationships within the database. A business id value is *only* useful for identifying the record to users/humans external to the system. – Chris Pitman Jul 11 '13 at 16:15
  • wait... your solution to a design flaw is to PERPETUATE THAT FLAW FOREVER? I think your idea of "forever" is far, far shorter than mine. – DougM Jul 11 '13 at 16:17
  • @DougM Who are you addressing? And what design flaw are you referring to? Don't botch the design in the first place, and there won't be a flaw to perpetuate. – Robert Harvey Jul 11 '13 at 16:18
  • you, Robert. designing as if your initial design is perfect and will never change is... well, not something I could support. There is ALWAYS room for improvement, and ANYTHING in the system is subject to change. – DougM Jul 11 '13 at 16:20
  • All I'm saying is that, if you choose surrogate keys for your internal table design (which to me is a self-evident best-practice), you won't have the kinds of problems you describe. Naturally, there are going to be shoddy designs out there... Thankfully, I haven't had to deal with too many of those. – Robert Harvey Jul 11 '13 at 16:22
  • I can see that. but "surrogate keys never change" is not correct, as an absolute statement. "surrogate keys never have to change" may be a better way to state it. – DougM Jul 11 '13 at 16:24
  • If it's subject to change, I assert that it's not really a surrogate key. Perhaps what I really mean by surrogate keys are those keys that *also have the constraint of being the primary key.* (is there any other kind?) – Robert Harvey Jul 11 '13 at 16:26
  • Keys can and do change. Whether surrogate or not it may make perfect sense to change them sometimes. I've certainly changed surrogate key values enough times over the years. A surrogate is quite simply a key that has no meaning in the external universe of discourse. That doesn't mean it mustn't change, just that it's probably less likely to. – nvogel Jul 11 '13 at 16:40
  • @sqlvogel: Am I using the right words? A good surrogate key is system-generated; it is unique, and never reused even if a record is deleted. That's how they work; that's how they are designed to work. To say that they might change doesn't seem to me like a defensible position. – Robert Harvey Jul 11 '13 at 16:45
  • @RobertHarvey. Yes, it's usually system generated and not reused but the key values may need to change if the table is refactored or if data is merged from multiple sources with potentially duplicate keys or to fix data quality issues or simply to effect an unusual data change of some kind. Would you really bother to implement an audit trail for every surrogate key so you can verify that no-one ever changed it? If not then how could you possibly know it has never happened or never will happen? And why would that be a problem anyway? – nvogel Jul 11 '13 at 16:55
  • 1
    @sqlvogel: I can't imagine any scenario where you wouldn't just create a new surrogate key to replace the old one. By definition, a surrogate key is not changeable, so I don't see how anyone could ever change it. By extension, I don't feel the need to create an audit trail for changes that will never occur. – Robert Harvey Jul 11 '13 at 17:04
  • I would upvote this answer if it explained how to address scenarios such as "Order Number" where there *is* no inherent ID other than the surrogate key itself. – Bobson Jul 11 '13 at 18:00
  • @RobertHarvey, by definition a surrogate key is a key that is not used in the business domain outside the database. Nothing about that definition implies that it cannot change. Certainly a common reason that surrogates are used is because they are assumed to offer greater stability than any alternative key but that doesn't prevent you from changing them if you find a need to. Stability is relative. Saying "never" is just dogma and shouldn't play any part in a practical strategy for solving real problems. – nvogel Jul 11 '13 at 18:37
  • @Bobson That is inherently a business/usability decision. Calling an order "red skeleton" or "soft moss" might be endearing in one context, but inappropriate for a debt collection agency. Like I mentioned in the answer, it depends on usage and context. – Chris Pitman Jul 11 '13 at 18:39
  • @sqlvogel: Is that your only objection? That I unequivocally stated "never?" That is the assurance that I expect when I use a system-generated key; I expect that value to never change, *unless someone borked the design.* GUIDs are supposed to be *always* unique, but everyone knows that there's an infinitesimal chance that you'll get the same one twice, or a cosmic ray will strike your hard drive, or whatever. – Robert Harvey Jul 11 '13 at 18:39
  • @RobertHarvey, of course a surrogate is supposed to be unique otherwise it wouldn't be a key at all. The fact that it is unique doesn't mean you can't change its values though - as long as it remains unique. GUIDs aren't the only sort of surrogate key. There are a number of different strategies used for generating surrogates. If you decide for good reasons to change your key generation algorithm then that might be yet another good reason why you might want to update an existing surrogate key value. – nvogel Jul 11 '13 at 19:05
  • 2
    @sqlvogel: Would it make things clearer if I used the term "system-generated primary key" instead of "surrogate?" I do agree that you might want to change the algorithm that generates the surrogate keys, but that doesn't change their fundamentally immutable quality. – Robert Harvey Jul 11 '13 at 19:11
  • @RobertHarvey, no problem with the idea of it being "system-generated". You still might want to update it though or generate new values to replace old ones. See my previous answer. – nvogel Jul 11 '13 at 19:12
  • 1
    The whole point is that you should never, ever change a PK. Change your key generation algorithm that's not a problem but it should only ever affect newly created keys. – James Snell Jul 12 '13 at 21:52
4

In some cases, surrogate keys are expected and make sense to users. My favorite example is "order number". Order number isn't really a natural key: a natural key might be timestamp plus user, or maybe more than that if you expect users to generate more than one order within the granularity of your timestamp.

Nonetheless, users understand and expect the convenience of an order number. There is no harm, and lots of value, if you let users know about them.

On the other hand, some surrogate keys make no sense to a user. Sure, my health insurance company has some surrogate key that identifies me based on my member id, date of birth, carrier, etc, but I don't care about that, I care about the info on my card (which often includes ids based on my employer and are not unique across the universe... Hence the surrogate key at the insurance company).

Alan Shutko
  • 1,380
  • 1
  • 10
  • 9
  • If a user understands it and expects to interact with it, then it's no longer a surrogate key. Surrogate keys are defined as having no business meaning. Just because it's an ID number doesn't necessarily mean it's a surrogate. Also, "some surrogate key that identifies me based on my member id, date of birth [...]" doesn't make sense. You're saying that their surrogate key (which has no business meaning) identifies you based on things that could compose a natural key? – Kyle McVay Mar 15 '17 at 22:12
  • Often, what happens is that you are given an employee id number so that in the employee system you are differentiated from other people with the same name. On your insurance card, it may list your company and employee id. But the health insurance company will often generate its own internal id that it can use across all of its systems instead of using the name, date of birth, and employee ids that it receives from your employer. Are these generated keys surrogate or natural keys? Depends on who you ask (check 2 alternate definitions at https://en.wikipedia.org/wiki/Surrogate_key) – Alan Shutko Mar 15 '17 at 22:50
2

you should ONLY expose a field to a user that provides useful information to the user, either directly or in reporting defects to you.

conversely, you should ALWAYS expose "surrogate primary keys" when they are the principal means of identifying a record (simple or complex) for an interaction the user performs.

DougM
  • 6,361
  • 1
  • 17
  • 34
2

You should only expose a surrogate key if it's a properly generated GUID/UUID*. Exposing sequential surrogate keys is number 4 on the OWASP Top 10 security issues.

* In practice, it's best to assume that it wasn't properly generated for these purposes unless you know that it was created by a cryptographically secure random or pseudo-random number generator.

Peter Taylor
  • 4,012
  • 1
  • 24
  • 29
2

If a table has no natural key, surrogate keys allow rows like this.

surrogate_key  some_name
--
1              Wibble
2              Wibble
...
17             Wibble
...
235            Wibble

I'd call these artificial keys instead of surrogate keys, but that distinction isn't important for this question.

Now, assuming that there's important data referencing these surrogate keys through foreign keys, how would the end users know which row to update if they don't know the surrogate key values?

  • It's a little contradictory to label the column "surrogate_key" but then say that they're "artificial keys instead of surrogate keys". If you're calling these artificial keys because they have no business meaning and only serve as unique identifiers, then this key should be naturalized (exposed), and a new surrogate key should be created. i.e., rename the naturalized surrogate_key to something with business meaning, expose it to the clients, and create a new similar column to be the true surrogate for internal operations. Less than ideal, but still better than exposing a true surrogate. – Kyle McVay Mar 15 '17 at 15:00
  • @KyleMcVay: It's not contradictory. It honors the meaning of *surrogate*, the essential idea of which is "to substitute for". A *surrogate key* substitutes for a natural key. (Codd and relational theory in general uses *surrogate key* in this sense.) A table that *has* no natural key can't *have* a surrogate key in that sense. But it can have an arbitrary value that's used *instead* of anything else. That's what I'd call an artificial key. – Mike Sherrill 'Cat Recall' Mar 15 '17 at 15:35
  • Your definitions aren't inaccurate, but I think _surrogate key_ has taken on extra, industry-specific meaning beyond the word _surrogate_. If you're going to expose a key to a client, I would argue that that key now has business meaning; it has been _naturalized_, and should now be considered to be logically part of the entity it represents. By this logic, there's no such thing as an exposed surrogate key. If you're going to expose an artificial key, it's no longer a surrogate key and you need a new one. – Kyle McVay Mar 15 '17 at 18:42
1

In layman's words:

  • Surrogates should be hidden from the user.
  • You should expose some other business candidate key to the user.
  • If no other candidate key exist you should show the PK. But in this case the PK is not considered a surrogate since it's not a substitute for other column.
Tulains Córdova
  • 39,201
  • 12
  • 97
  • 154
  • Why must the PK be shown? I think that is my question - whether an auto generated PK is properly called a surrogate key is tangential. – psr Jul 10 '13 at 22:55
  • 1
    @psr Your question title says "*surrogate* keys ever be exposed". I said no. But you have to show some other key. If no other key exist then you must show the only key you have. Tangentially I clarify that in those cases the key is not really a surrogate because it's not a substitute for any other column. – Tulains Córdova Jul 10 '13 at 23:03
  • The question is whether you should add a column, or show the key you have. – psr Jul 10 '13 at 23:03
  • @psr I re-phrased my answer to make it more clear. – Tulains Córdova Jul 10 '13 at 23:06
  • I don't understand your third bullet. A Primary Key can also be a surrogate key. – Robert Harvey Jul 10 '13 at 23:49
  • @RobertHarvey Of course, but when no other candidate key exist besides the artificial key, it's no longer called surrogate. An artificial key is a surrogate only when they are created to serve as PK when a natural candidate already exist. – Tulains Córdova Jul 11 '13 at 00:38
  • 1
    I find that to be a materially insignificant distinction. If your rule is that every table has an artificial key (which mine is), and that only the artificial keys participate in joins (which is my principle as well), then the notion that a key is a surrogate because other keys might possibly be available is inconsequential. – Robert Harvey Jul 11 '13 at 00:41
  • @RobertHarvey No, I don't think all PK must be surrogate. I think some tables will have surrogates because no good natural key exist. Others will have natural keys. But I know some designers add a surrogate no matter what. – Tulains Córdova Jul 11 '13 at 00:43
1

It shouldn't matter whether you expose the keys or not to the end user. Your application should perform the necessary authorization such that simply knowing an order id, for example, can't allow them access to something they normally wouldn't have access to.

caveat: this assumes a web based or n-tier application where server side authorization is possible/feasable. If you have a VB app thats directly executing sql, thats a whole 'nother issue.

GrandmasterB
  • 37,990
  • 7
  • 78
  • 131
  • What about not be able to insert then delete records, while preserving foreign key relationships, as mentioned in the question? – psr Jul 12 '13 at 21:33
  • What does that have to do with exposing the key to the user? Maybe I dont understand your use of the term 'exposing'. I'm working from the assumption you mean 'able to be seen by the user'. – GrandmasterB Jul 12 '13 at 22:04
0

According to the principle of encapsulation, which is a foundational concept in OOP, you should hide implementation details. Once a surrogate becomes public it becomes data. One man's surrogate key is the next man's natural key.

In practice I think surrogate keys should be encapsulated at the boundary of a service layer. If you have a service for some aggregate root then the internal domain objects and repository would all use the surrogate key. These internals would all be encapsulated. The public API would use some other format which could include nesting to hide parent keys, so hiding the surrogate key of the root record isn't exactly a seven year leap.

0

One health-care company that I did a project for had a nasty problem with their "provider IDs." These were originally hand-managed and they contained embedded information. A single provider might have more than one ID, and had to know the "right" one to use at each clinic. And, unfortunately, some provider-IDs had been assigned to more than one provider at a time!

To solve the problem and to allow several different systems to communicate with each other unambiguously, I created an internal-only system of surrogate keys: they were simply random strings of letters that were the same length as the old (numeric) provider-ID strings. This was done so that column-sizes and data types didn't have to be changed.

One master system had to take a provider-ID, "figure out who it meant," and return the proper surrogate key, which always corresponded 1:1 to a single person.

Sometimes, that same system had to go the other way ... "which (legacy) provider-ID should we give them when referring to [this_person_surrogate] in [that_system]?"

Each of the downstream systems was now re-coded to use these unambiguous surrogate keys to refer to persons, knowing that every other system would be using only the same surrogate value.

Within each system's database, auto-increment integers were conveniently used as primary and foreign keys, and these keys were of course never shared with any other system. The record for each "person" contained its surrogate.

Any request that one system made to another system always used the surrogate. No system ever knew what any other system's "primary keys" were.

Carrying this concept one step further, surrogate keys were never published externally. The company eventually re-vamped its provider-ID system in an effort to clean up their mess, but when they did so we created a new provider-id "for public consumption," and tied those values to the surrogates just as we had done with the old IDs. Thus, the "purity" of the surrogates was not compromised, and the company didn't have to give its providers "ugly character strings."

Mike Robinson
  • 1,765
  • 4
  • 10