Validating string length separately vs. handling database error for max length

Question

If my service is inserting into a varchar column with a certain max length, I can either

validate its length prior to inserting to avoid getting an error from the database if the string is too long
try inserting it as is and then handle the error if the database say it's too long, and giving the consumer a validation error in that case

The first seems to have the advantage of not causing to be returned by the database, but I would have to define the max length both in my application code and database schema (so that it could become inconsistent due to programmer oversight).

Please see this answer: [clean architecture validation in domain vs. data persistence layer](https://softwareengineering.stackexchange.com/a/351662/115084). It's okay, and often very necessary, for validation to be implemented in more than one layer. — John Wu, Sep 23 '21 at 13:59

score 2 · Accepted Answer · answered Sep 23 '21 at 16:22

Your code should be always prepared for getting an error from the database during an insert operation, and to display a useful error message in this case. Hence "either - or" is the wrong question: you should ask if it makes sense to validate the string length prior to inserting additionally.

However, when you insert a lot of columns in one transaction, checking the string length beforehand could make it simpler to find out which of the many columns might violate a length constraint. If that's what you are trying to accomplish, I would recommend to make sure your application retrives the string length either directly from the database schema (for example, once when the program starts, querying the related schema tables), or the schema as well as the application code try to use a "single source of truth" for the maximum string length.

Which of these alternatives you pick depends heavily on how much effort it is in your system to retrieve schema information, how the schema is created or generated, how the database layer is created or generated, and how this looks in relationship to the alternative of getting the affected column name and allowed string length out of a "constraint-violated message" from the DB.

Another thing to consider is how your database schema is maintained:

Does your team have a development process which makes sure your DB schema stays backwards compatible? And do you have a quick update process for your application in case a user notices an error?

If the answers are yes and yes, you may simply live with the risk of getting an inconsistency here.

Multiple stages of testing/filtering is especially important when there may be a mismatch between the various stages of dealing with the data. For instance, a web UI may present strings of characters in UTF-8, your programming language may store handle character strings as UCS-16, and your database may store character strings as UCS-32. In each case, the limit may be different, and not in programmer-friendly way. — BobDalgleish, Sep 24 '21 at 15:52

score 1 · Answer 2 · answered Sep 24 '21 at 14:05

You should do both (and, if there's a web-based User Interface involved, you should be limiting the length of the input field there as well!)

Yes, it means you have the length of this field defined in multiple places but that's no different to having field names in SQL in the Application code. If those become "inconsistent due to programmer oversight" then the code breaks then as well.
That's what Testing is for.

The database can throw all sorts of things back at you (depending on how much constraint you place on the arriving data) and you should be prepared for this but there's nothing wrong with a bit of Defensive Programming to reduce the number of times it has to get all "upset" with you.

Also, the database will probably baulk at the first errant field - your code can be more "intelligent" and check every field at once (within reason) and present a single response containing multiple errors.

score 0 · Answer 3 · answered Sep 24 '21 at 12:12

This is not an answer to the literal question you posed, but based on the (supposedly) underlying requirement.

You typically declare a varchar length because you know that longer strings are invalid for that field, because of some business logic / domain analysis.

In my experience, all cases with a limited string length also had some syntax requirements as well. So, rejecting only strings that exceed some length limit still allows for invalid data. What you really need is a powerful syntax checker.

The preferred language for expressing syntax requirements is Regular Expressions. And as databases typically don't support regex-based constraints, this validation has to be done in the application anyway. Then, the length limitation comes for free, as part of your regex. You have to decide yourself then whether the length constraint still makes sense in the database (e.g. see the next paragraph).

One more argument for not limiting length in database: Validity constraints can change over time (e.g. nowadays we have internet top-level domains with more than 4 characters), and installing a new application software version is much easier and faster than migrating a big database table from varchar(4) to varchar(32).

Validating string length separately vs. handling database error for max length

3 Answers3