I have an array of records for bids (sample below in JSON). I would like to store this data in a relational database (PostgreSQL); however, the supplier data does not come with IDs, and some entries will need to be deduplicated. For instance, in the example below, "John Smith & Associates" is listed as a supplier under several similar names ("John Smith & Associates" and "John Smith & Assoc").
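Whichever option is chosen, matching variants like these usually starts with a comparison key derived from the raw name. A minimal sketch, assuming that case, "&" versus "and", punctuation, and a few known abbreviations are the only differences; `normalize_supplier` and the `ABBREVIATIONS` table are hypothetical, not part of the data:

```python
import re

# Hypothetical abbreviation table; extend it as new variants turn up in the data.
ABBREVIATIONS = {
    "assoc": "associates",
    "ltd": "limited",
}


def normalize_supplier(name: str) -> str:
    """Reduce a supplier name to a comparison key (not a display name)."""
    key = name.lower().replace("&", " and ")
    # Replace remaining punctuation (e.g. the period in "ltd.") with spaces.
    key = re.sub(r"[^\w\s]", " ", key)
    # Expand known abbreviations word by word; split() also collapses whitespace.
    words = [ABBREVIATIONS.get(word, word) for word in key.split()]
    return " ".join(words)
```

With these rules, "John Smith & Associates", "John Smith & Assoc" and "John Smith and Associates" all map to `"john smith and associates"`. Rule-based keys only catch the variants you anticipate; for the rest, PostgreSQL's `pg_trgm` extension (its `similarity()` function) could be used to flag likely duplicates for manual review.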
Should I give each supplier a unique ID (even those with matching names) and deduplicate after the database is populated, or should I deduplicate while adding entries to the database?
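One way to get the benefits of both options is to keep every raw spelling in an alias table that points at a canonical supplier row: each unseen (name, city) pair gets its own supplier at load time, and duplicates are merged later by re-pointing aliases, so no original spelling is lost. A minimal sketch, using Python's built-in `sqlite3` as a stand-in for PostgreSQL; the table layout and the functions `alias_id_for`, `merge_suppliers` and `supplier_of` are hypothetical:

```python
import sqlite3

SCHEMA = """
CREATE TABLE supplier (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL                 -- canonical display name
);
CREATE TABLE supplier_alias (
    id          INTEGER PRIMARY KEY,   -- award rows would reference this id
    raw_name    TEXT NOT NULL,         -- spelling exactly as it appeared in the data
    city        TEXT NOT NULL,
    supplier_id INTEGER NOT NULL REFERENCES supplier(id),
    UNIQUE (raw_name, city)
);
"""


def alias_id_for(conn, raw_name, city):
    """Return the alias id for (raw_name, city), creating a new supplier if unseen."""
    row = conn.execute(
        "SELECT id FROM supplier_alias WHERE raw_name = ? AND city = ?",
        (raw_name, city),
    ).fetchone()
    if row is not None:
        return row[0]
    supplier_id = conn.execute(
        "INSERT INTO supplier (name) VALUES (?)", (raw_name,)
    ).lastrowid
    return conn.execute(
        "INSERT INTO supplier_alias (raw_name, city, supplier_id) VALUES (?, ?, ?)",
        (raw_name, city, supplier_id),
    ).lastrowid


def merge_suppliers(conn, keep_id, drop_id):
    """Fold supplier drop_id into keep_id once a duplicate has been confirmed."""
    conn.execute(
        "UPDATE supplier_alias SET supplier_id = ? WHERE supplier_id = ?",
        (keep_id, drop_id),
    )
    conn.execute("DELETE FROM supplier WHERE id = ?", (drop_id,))


def supplier_of(conn, alias_id):
    """Resolve an alias id to its current canonical supplier id."""
    return conn.execute(
        "SELECT supplier_id FROM supplier_alias WHERE id = ?", (alias_id,)
    ).fetchone()[0]
```

Because award rows would reference the alias rather than the supplier, a merge touches only `supplier_alias`. In PostgreSQL the `id` columns would typically become identity columns, and a driver such as psycopg2 uses `%s` placeholders instead of `?`.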
[
  {
    "Solicitation No": "B2342",
    "Issuing Organization": "VT Timber Sales",
    "Award Date": "2017/06/29",
    "Supplier_details": [{
      "Successful Supplier(s)": "John Smith & Associates",
      "Supplier City": "Georgetown",
      "Award Total": "$22034.13"
    }]
  },
  {
    "Solicitation No": "B2344",
    "Issuing Organization": "VT Timber Sales",
    "Award Date": "2017/06/30",
    "Supplier_details": [{
      "Successful Supplier(s)": "John Smith & Assoc",
      "Supplier City": "Georgetown",
      "Award Total": "$5034.13"
    },
    {
      "Successful Supplier(s)": "Some Logging ltd.",
      "Supplier City": "Georgetown",
      "Award Total": "$1034.13"
    }]
  }, (...)
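Before any of this reaches the database, the nested `Supplier_details` arrays have to be flattened into one award row per (solicitation, supplier) pair, and the string dates and dollar amounts parsed. A minimal sketch, assuming every record has the keys shown above, dates are always `YYYY/MM/DD`, and totals are dollar strings with optional thousands separators; `flatten_bids`, `parse_money` and the output column names are hypothetical:

```python
from datetime import datetime
from decimal import Decimal


def parse_money(text):
    """Turn '$22034.13' (or '$22,034.13') into Decimal('22034.13')."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def flatten_bids(bids):
    """Yield one flat award row per supplier listed under each bid."""
    for bid in bids:
        award_date = datetime.strptime(bid["Award Date"], "%Y/%m/%d").date()
        for detail in bid["Supplier_details"]:
            yield {
                "solicitation_no": bid["Solicitation No"],
                "issuing_org": bid["Issuing Organization"],
                "award_date": award_date,
                # Keep the raw spelling; deduplication happens on top of it.
                "supplier_raw_name": detail["Successful Supplier(s)"],
                "supplier_city": detail["Supplier City"],
                "award_total": parse_money(detail["Award Total"]),
            }
```

`Decimal` is used instead of `float` so amounts round-trip exactly into a PostgreSQL `numeric` column; the input list would come from `json.load` on the file.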