7

I'm working on a project with client that needs to rely on a proprietary data set. They also have custom data that logically fits alongside the proprietary data. What they do is use the same tables but with negative ids. So an Example would be

Table: Animals

Fields: animal_id, name, can_purr

So the organization that provides the data provides Animals

1, Cat, yes
2, Dog, no
3, Mouse, no

And then my client is adding their Frankenmals

-1, Werecat, yes
-2, Werewolf, no
-3, Jackalope, no

So that other FKs of Animal can be reused by Werecat and Jackalope. Every so often, a new animal is recognized by the data maintainer and they add, or something evolves into a new thing so the original keys are important to maintain. The frankenanimal industry all uses this database and can understand me if I talk about Animal 1, but not if I have turned Animal 1 into something other than a Cat that can purr. There is no relationship implied or maintained between -2 and 2.

The negative numbers may be the best way to do it. It makes my skin crawl because then the ID represents something other than just record identification.

Many other systems are changing at this time. So if this isn't a great design, now's the time to change it.

Is there anything so wrong with the negative numbers approach? The main risk I see is if the data maintainer decides to use negative identifiers for their own purposes? An automated refresh could cause havok. But the automation could check for that condition, so, not so bad?

I just can't find much in the way of best practice advice on this. So have you done the negative numbers approach and regretted it? Why? Or have you transformed to a new schema and then regretted that for maintenance or some other reason?

My apologies if this is too subjective. Just not sure how to discover if I'm just being illogical or if the double purpose on the record ID is bad enough to merit a redesign.

  • Are the wereanimals **exactly the same** as the animals, functionally speaking? I.e. can you do everything with a wereanimal that you can do with an animal, in every circumstance? Is the **only** distinction between them that the entities have been created by a secondary source (your client)? Are you sure there are no `if(ID > 0)` checks in your codebase? – Flater May 08 '18 at 07:49
  • Why distinguish with a negative index instead of adding a `kind` column, which is only filled for the custom implementations? – Thomas Junk May 08 '18 at 13:47
  • Of course it's OK. It just won't be industry-standard anymore, that's all. – Robert Harvey May 08 '18 at 17:08
  • @Flater, no, I'm not certain there are no if(id>0) checks. The wereanimals are not exactly the same as the animals, though they are constituted the same. The jackelope may have the same skeleton as the rabbit, but the horns of the antelope. they may do the same things but some wereimals will be able to so things animals can't. Breathe fire maybe. But they are constituted the same. – microsaurus_dex May 08 '18 at 18:07
  • @ThomasJunk the reason for the negative numbers is so that they don't inhabit an id that later gets assigned to a new "natural" animal. – microsaurus_dex May 08 '18 at 18:07
  • Instead of negative IDs, can't you take for instance your object with an ID starting of with a value like 10 000 000 ? If you ever need a new specific set you could then start at 20 000 000, .. it give you enough space and it would be easy to identify native objects from yours. – Walfrat May 15 '18 at 12:19
  • all the answers were so helpful but i have to pick only one that i keep coming back to. this particular industry is developing its standards and has not set aside any customization ranges. I will pursue that with the standards issuer. the other considerations raised are all worth considering, but at this point i'm not sure worth completely refactoring if we can assure there wont be some collision with the official data. – microsaurus_dex Jun 03 '18 at 02:36

4 Answers4

5

From a practical point of view, the only thing which really matters is: is it guaranteed that every code accessing the Animal table will handle those negative IDs correctly? Do you have the code completely under your control? I guess not, since you are talking about an "industry-standard" data model, maybe from some ERP system? Or can you at least investigate the relevant parts of the code to understand how it works? I assume there is a 3rd party vendor involved which provides standard software for this.

So check the documentation of that "industry standard" - if it tells something like "negative IDs can be safely added for customizing purposes", then go ahead. Otherwise, even it it works now, you cannot be sure if the next version of the external vendor's software will collide with your current usage. So if you are not 100% sure negative IDs can be used for what you are doing, you introduce a certain risk of breaking the system this way.

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
  • `I guess not, since you are talking about an "industry-standard" data model, maybe from some ERP system?` Note the difference between having control over the **logic** or the **data**. It's perfectly possible that OP is simmply not allowed to change established ID values, which could be why he's staying out of the positive integer range to prevent conflicts. – Flater May 08 '18 at 12:52
  • 2
    Data models that control IDs that strictly also often designate a range of IDs specifically for local customizations, precisely to reduce the chance of situations like this. Documentation should mention that range. – cHao May 08 '18 at 14:17
  • @Flater: sure. Note I did not suggest to change established ID values. We actually don't know what kind of control the OP has over the system as long as he does not tell us. – Doc Brown May 08 '18 at 14:52
  • @cHao: absolutely. But if the OP did not find any documentation on this, just using a currently unused ID range for their purposes is IMHO pretty risky. – Doc Brown May 08 '18 at 14:56
3

You are working with two data sets. Any attempt to hide this and merge them into one is going to have weird side effects and will most likely lead to problems later on.

Much better to have a separate database with a table with the same columns as the original one with all your frankenimals.

Then, make your application data-set aware and use tuples {dataset, row_id} in your application. This is nice, clean and extensible (as you can add additional data sets). It will also allow you in your application to work with the data sets (e.g. compare results for only-official-dataset, merged-data-set, custom-data-set-1, etc...).

Wilbert
  • 1,703
  • 1
  • 12
  • 16
2

It's not a great idea.

The best way is to have your own id and link that to the external data id.

Lets say you have a string id for your Animal table

MyAnimals
MyId,                                 name
230a33e0-ffa4-47ff-9ccb-b3bcf3a33166, Werecat
89f990d5-88eb-4055-b7fe-787cf75d0461, Werewolf


ExternalAnimalLink
MyId,     ExternalDataSetId, ExternalId
"1",      "aniCorp2018",     1
"2",      "aniCorp2018",     2
"wd_2",   "wildData2001",    2

Downsides of the negative numbers approach.

  • What if the external provider starts using them?
  • What if you need a 3rd provider which also uses numbers?
  • What if the negative numbers are used to represent errors (bad but common)?
  • What if a uint is used at some point or negatives are invalid in some other way?
  • What if the number has additional meaning such as ordering?

But you should also consider the problems caused by using numbers for ids at all.

  • Whats the next available id?
  • Whats the maximum number of animals ever
  • What about test animals, do they get numbers?
  • Does the number match the same animal on all databases?

etc etc

Ewan
  • 70,664
  • 5
  • 76
  • 161
1

Does the documentation for their standard discuss this?

In the financial services industry, it is quite normal to see documented standards that allow for proprietary extensions. Some examples of things I've seen:

  • No facility for extension, but a range of "generic" IDs are provided for your own purposes.
  • ID's below n are reserved, but anything above n is for custom data.
  • Negative numbers are permitted for client-specific fields.

I've seen and used all of the above methods, so using negative numbers could be a viable solution. I think the important thing here is to check the documentation. If the documentation doesn't specify any facility for custom IDs, then it might be worth bringing it up with the standard maintainer/committee as a valid improvement. Ultimately though, if the standard doesn't explicitly say you can use negative numbers, you risk breaking things if other code (e.g. 3rd party applications) don't support negative numbers.

Karl Nicoll
  • 550
  • 1
  • 3
  • 11