3

A user uploads a CSV to the server to be inserted into the database and I would like to return a meaningful response to this user.

My thoughts :

{
   inserted : x records
   discarded : y records
}

I believe also that I should specify what records were discarded because of the inconsistent format / missing values. Problem is that it could get messy let's say for 100k records (a possible scenario in my case). What are your thoughts?

Or should I have a radical policy ? Either all records can be inserted or none ?

Oleg
  • 181
  • 5
  • 1
    You could return the # of the row with wrong data, instead of the whole row. But returning the "delta" with all the wrong rows is valuable. The user will have the rows that left to be stored. He just need to fix the errors. – Laiv Nov 23 '17 at 22:21
  • @Laiv what do you mean by "delta" :) ? – Oleg Nov 23 '17 at 22:23
  • The subset of rows that could not be stored – Laiv Nov 23 '17 at 22:24

1 Answers1

7

This depends on the way the CSV is created, maintained or fixed in case of errors, and on how the imported data will be processed after.

Let's start with the question of "all or nothing" vs. "reject only bad ones": to discard some rows and accept only a part of them makes only sense

  • if that imported subset can be processed without the discarded rows in a meaningful way

  • if the user can fix the rejected rows easily and import them afterwards separately again

If however there is no meaningful processing possible without the full set of rows, you should reject it completely. Same is a good idea if you know for sure the users always create the CSV "as a whole" (maybe using some automatic tool), and in case of errors, they can easily recreate the complete fixed CSV again.

To the question how the failure information should look like: a specification of the rejected records (and the reason why they were rejected) makes sense if the user can actually process this error list in a sensible manner. If that's the case, but the error log can become very long, consider to return it in form of a log file. If you know for sure the users need only some examples of what went wrong, to fix the CSV creation process (and not to fix the individual CSV rows one-by-one), then it makes sense to restrict the failure information to, for example, something like the first 10 or 50 error messages.

TLDR; there is no "one size fits all", you need to understand the use cases in the chain before and after the import, only then you can make a sensible decision.

Doc Brown
  • 199,015
  • 33
  • 367
  • 565