Should we delegate user input sanitization/validation?

Question

Consider this code:

new FrameworkClass( [ 'query' => $_POST[ 'input' ] ] );

FrameworkClass is supposed to do input sanitization and validation. Should we just trust 3rd party code to do its job?

My argument would be that I would rather have my input data sanitized/validated twice than trust an external library.

If we simply trusted FrameworkClass, then by the same logic we could also skip escaping data before output whenever it comes from a "trusted source".

This depends 100% on the kind of system, the 3rd party code, what it is supposed to do, what you know about how well it does its job, how mature it is, etc. If you don't trust 3rd party code at all, where do you stop? Are you going to reimplement all your standard libs? Your compiler? Your operating system, just because you trust your own code more than 3rd party code? And what makes you believe a specialized sanitizing library cannot do its job better than anything you can hack together in a few hours? – Doc Brown, Sep 26 '18 at 13:45

score 3 · Answer 1 · answered Sep 26 '18 at 14:52

Whether data is “safe” or not depends on how that data is used. Therefore, no framework can exist that magically sanitizes your data. Instead:

Determine how the data will be used.
For each use, determine whether the data has to be processed for safety.
Implement those safety measures.

Example, where the data is a plain text chat message:

Usage: the message is displayed as part of an HTML page.
Safety requirement: HTML special characters have to be escaped to avoid XSS.
Solution: use the existing escaping function of your template engine.

Usage: the message is stored in a SQL database.
Safety requirement: the message contents must not be interpolated verbatim into a SQL statement.
Solution: execute a prepared statement with the message contents as a parameter.

Usage: the message is displayed in a public chat system.
Requirement: the message should not contain offensive language.
Solution: reject messages when they include obvious swearwords.

In this example, the same data is used in several contexts with completely different requirements. No single step that somehow sanitizes the input first can satisfy all of them.
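As a rough sketch of the three uses above in PHP: the raw message is kept unchanged, and each use applies its own measure at the point of use. The function names, the `messages` table, and the one-word block list are made up for illustration, and the in-memory SQLite database assumes the `pdo_sqlite` extension is available. A real project would more likely rely on its template engine and database layer for the first two.

```php
<?php
declare(strict_types=1);

// Hypothetical chat message, treated as untrusted plain text.
$message = "<script>alert('hi')</script> O'Brien says hello";

// Use 1: HTML output. Escape for the HTML context at the point of output.
function renderMessage(string $message): string
{
    return '<p>' . htmlspecialchars($message, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</p>';
}

// Use 2: SQL storage. The raw message is passed as a bound parameter,
// so it is never interpolated into the SQL text.
function storeMessage(PDO $db, string $message): void
{
    $stmt = $db->prepare('INSERT INTO messages (body) VALUES (:body)');
    $stmt->execute([':body' => $message]);
}

// Use 3: public chat policy. Reject (rather than rewrite) messages that
// contain a word from an illustrative block list.
function isAcceptable(string $message, array $blockedWords): bool
{
    foreach ($blockedWords as $word) {
        if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $message) === 1) {
            return false;
        }
    }
    return true;
}

// Demo with an in-memory database (assumes pdo_sqlite is installed).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE messages (body TEXT NOT NULL)');

if (isAcceptable($message, ['darn'])) {
    storeMessage($db, $message);
}

// The stored value is byte-for-byte the original message; escaping
// happens only when it is rendered into HTML.
$stored = $db->query('SELECT body FROM messages')->fetchColumn();
echo renderMessage($stored), PHP_EOL;
```

Note that nothing is "sanitized" on the way in. The database holds the original text, and the escaping lives next to the output that needs it.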

Whenever up-front input sanitization of this kind is attempted, it is usually either ineffective (e.g. by adding backslashes before some characters), or even destructive (e.g. by stripping out “dangerous” characters). I once used a messaging system that stripped out special characters such as umlauts or most punctuation, rendering messages incomprehensible.
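To make both failure modes concrete, here is a small PHP sketch. The sample message and the two "sanitizers" are invented for illustration and are not taken from any particular framework.

```php
<?php
declare(strict_types=1);

// Hypothetical user message (UTF-8 source file assumed).
$message = "Grüße aus Köln! It's 5 < 7, isn't it?";

// Ineffective: adding backslashes does nothing about HTML.
// The "<" is still there, and readers now see stray backslashes.
$slashed = addslashes($message);
// "Grüße aus Köln! It\'s 5 < 7, isn\'t it?"

// Destructive: stripping everything except ASCII letters, digits and spaces
// removes umlauts and punctuation and mangles the text.
$stripped = preg_replace('/[^A-Za-z0-9 ]/', '', $message);
// "Gre aus Kln Its 5  7 isnt it"

echo $slashed, PHP_EOL, $stripped, PHP_EOL;
```

Neither result is safe to drop into HTML or SQL as-is, and the second one has lost information that cannot be recovered.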

But when implementing proper security measures (which may include steps such as stripping out content or escaping content), using existing implementations is a good idea. The framework is likely better tested than the code that you would write. It's also going to save you time. For example, “offensive language” is really, really hard to determine. It may be desirable to use some existing library with various heuristics, or at least to start with a publicly available list of swearwords.

In any case the key is to not throw an existing solution at your problem to see if it “works”, but to start with your actual problem and then work out a solution – where that solution might already be part of your framework.
