I have a large database of customers. There's a need to identify customers who have two or more customer IDs.
This comes mostly from people at the front-desk creating new customers instead of searching for their existing IDs or doing a bad data capture, so we end up having this sort of thing:
ID name surname birthday
------------------------------------------------
14 Juan Vazquez 1955-01-20
78 Juan Vasquez 1980-06-12
85 Juan Vazquez 1980-06-12
IDs 78 and 85 almost surely belong to the same person (notice same birthday but a slight difference in surname)
ID 14 definitely is another person due to a big difference in birthdays related to the similar records
Therefore we can assign equivalencies:
MainID EquivalentID
------------------------
78 85
What solution can you recommend to identify similar records based on rules?