6

I've done some searches for "id mapping" and "external id mapping" and I can't find anything meaningful on the site or google, so, here's my question/scenario.

I'm developing an application which will match some records from our internal system with the "same" records from an external (outside the company) system. The data-exchange (receive the data from external system) is irrelevant (could be TXT, CSV, XML, Excel, etc...).

At some point I'll have to match our internal customer id with the external customer id. The thing is: I know my id is immutable, but I don't know id the external id is mutable. And there's no way to know, because the external system is out of my control and can be changed without my knowledge anytime.

So I have 2 scenarios:

  • Best case: the external ID is immutable;
  • Worst case: the external ID is mutable (within the constraints below);

So I'm developing a mapping table with the following attributes:

  • Internal System Customer ID;
  • External System Customer ID;

Pretty standard stuff.

The real question is: What should I add on this table in order to prepare for worst-case scenario ?

Constraints:

  • If the ID is mutable, it'll be on a daily basis, so there can't be a 1-to-many internal-to-external ID on the same day (this is a business rule that can be enforced);
  • The external ID can be anything from an integer to a GUID or a zip-code. But nothing bigger than a GUID;
Aziz Shaikh
  • 427
  • 4
  • 10
Machado
  • 4,090
  • 3
  • 25
  • 37

2 Answers2

5

In order to do the mapping, you need something which uniquely identifies the object in the external system. This is exactly the purpose of an ID.

If the external system uses mutable IDs, the system is broken. This includes the case where an ID of a removed entity is reused for a new entity.

In order to have a hint that the system is broken, you can compare several properties of the entity stored locally with the information of the external system. For example, for a customer, one can match the first and last name, the e-mail address or the birth date. While a mismatch in one of those fields doesn't mean anything (for example, the customer may change his name, or the original birth date may be incorrect and corrected later), the fact that several of them changed is a good indication of the ID recycling.

Arseni Mourzenko
  • 134,780
  • 31
  • 343
  • 513
  • I understand what you are talking about (the system is broken if an ID changes), but maybe it's my fault for poor writing skills: What if the external system changes (from one vendor to another, where one uses autoincremented integers as ID and another uses GUID's as ID) and the new system vendor doesn't have the will to provide the previous ID ? This is a possible real-case scenario that my business has to live with. And I just receive the ID, so I can't compare other properties. – Machado Jan 03 '14 at 16:08
  • 1
    @Machado - In the worst case you are out of luck. If you need to find someone knowing only their social security number, and that number changed, then you can't find them. Unless there are other options available that didn't make it into your question or comments. It sounds like it's time to have the business consider that sometimes the ID will be lost, and decide how they want to deal with that. – psr Jan 03 '14 at 19:21
  • 2
    @Machado: *"I just receive the ID, so I can't compare other properties."*: then obviously, there's nothing you can do. If there were additional data available, then you could have found a link using the technique I described in the third paragraph of my answer. – Arseni Mourzenko Jan 03 '14 at 19:24
1

The best practice is the external ID must be immutable.

Let's say your matching on employee ID only. Certainly taking just the employee ID may not be enough as different companies may use the same type of ID assignment. For example employee ID of company X is 1234 and employee ID of company Y is 1234. The system is now broken if your just using employee ID in this case.

In these cases, more design is required to come up with an immutable ID. In this case, perhaps the ID is a combination of employee ID and company ID, or maybe employee ID, company ID, and conmpany name, whatever combo it is, it must be immutable.

This will require some forethought to ensure that external IDs can be found and matched to your internal IDs.

Now if the company is sending newer IDs to replace older ones, they have to send the previous one, or else how are you going to find the previous and match and update data on your end. You have to treat any ID not found as a new one.

You could use techniques as @MainMa suggests to do property/fuzzy logic type matching, but these are always flawed and tend to become very complex quickly.

Consider this case:

  • 123 South Main Street Apt 1
  • 123 S. Main St. Plaza Building Apt 1

Are these addresses the same? Maybe. The water is deep and murky in this pool.

Use immutable external IDs.

Jon Raynor
  • 10,905
  • 29
  • 47