Why are special characters deemed risky in URLs and query strings?

Question

From a security perspective, special characters like '&' or <b> are a big no-no in URLs and query strings. I could find articles that explained ways to bypass this restriction, but I couldn't find anything that explained, with an example, how these characters can be a security risk.

Please explain the risks of these special characters. The question is simple, and I could probably google a bit more to find the answer, but I am asking it here because it will help someone in the future. Besides, the level of detail and specificity one gets here cannot be found anywhere else.

Where are they not allowed? Can you post your source? I have certainly used `&` on URLs before and have passed in `` as a parameter. — Oded, Jan 07 '12 at 19:44
That's a _very_ different thing than "not allowed on URL". You mean that ASP.NET does not pass these constructs through. That's what you should ask. — Oded, Jan 07 '12 at 20:10

score 4 · Accepted Answer · answered Jan 07 '12 at 20:11

In ASP.NET, those characters are blocked by default to prevent a beginner developer from introducing a security hole in the application.

Consider the following scenario:

On a home-made blog, users can post comments.
Those comments are stored in the database.
The comments are loaded later and displayed to visitors.

When a user submits something containing HTML reserved characters, chances are it will be saved as is in the database (and in fact, it should be; escaping HTML characters at this level is rather bad practice).

Later, the same comments are displayed on a page, as is. The developer forgot to escape HTML characters.

Now anybody can post arbitrary HTML code on the page, including malicious JavaScript.

Note that although this is the default behavior in ASP.NET, you can still disable it, either because users are expected to submit HTML content, because you will never display the submitted content, or because you:

are sure the content is escaped properly each time it is displayed,
have made the proper tests to ensure that the content is really escaped,
have audit and security checks of your code.

The question's title mentions `URL` and `query strings`, but I didn't see any mention in your answer of the risks associated with not encoding these values. Instead, your answer seems to focus on HTML and ASP.NET. Was the question changed or something? — Sam, Dec 05 '16 at 23:18
@Sam: I suppose that there were details in the comments, now removed, which weren't added to the question. For instance, a comment from Oded answering a now removed comment mentions ASP.NET, while the question itself, tags included, makes no mention of ASP.NET whatsoever. Given no downvotes and the fact that my answer was accepted, I suppose that it actually answered the question in 2012, even if in its current form, it doesn't make much sense. If you have a similar question specifically about URIs, post one. — Arseni Mourzenko, Dec 06 '16 at 00:01

score 3 · Answer 2 · answered Jan 07 '12 at 20:31

The ampersand (&) character isn't really forbidden, but because it is the separator between query string parameters, using it unescaped in a parameter name or value causes undesired behavior. For example, if you have a parameter foo with the value bar&baz and another parameter quux with the value 1, a naive attempt at building a URL might produce http://www.example.com/home?foo=bar&baz&quux=1; the query string will then be parsed as three parameters (excuse the pseudo-JSON): { "foo": "bar", "baz", "quux": 1 }. Other special characters (=, @, +, %) can cause similar effects if they are concatenated into a query string unescaped.
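The mis-parse described above can be reproduced with a minimal Python sketch. The parameter names and the URL come from the example in this answer; the choice of Python and its standard `urllib.parse` module is only an assumption for illustration, and other stacks will behave similarly.

```python
from urllib.parse import parse_qs

# Parameter values from the example above; the first one contains a literal '&'.
params = {"foo": "bar&baz", "quux": "1"}

# Naive construction: join names and values without any escaping.
naive_query = "&".join(f"{name}={value}" for name, value in params.items())
naive_url = "http://www.example.com/home?" + naive_query

# keep_blank_values=True keeps the stray 'baz' parameter visible in the result.
parsed = parse_qs(naive_query, keep_blank_values=True)

print(naive_url)  # http://www.example.com/home?foo=bar&baz&quux=1
print(parsed)     # {'foo': ['bar'], 'baz': [''], 'quux': ['1']}
```

The value of foo has been truncated to bar, and a parameter named baz that nobody intended to send has appeared.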

Mis-parsing (or rather, wrongly constructing) query string parameters like this is often just a nuisance (values not ending up where you expect them), but it can sometimes be abused to trick a web application into doing something it shouldn't, which would be a security vulnerability.

All web programming platforms provide a way of escaping query string parameters, and many have other ways of constructing query strings for you that take care of escaping. The other way around, BTW, is taken care of automatically: pretty much every web stack decodes query string parameters before handing them to your code, so when you read them back, they are already back in their original form.
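As one illustration of such an escaping helper, here is a short sketch using Python's standard library (the language choice is an assumption; the parameter names are taken from the earlier example). `urlencode` escapes the values while building the query string, and `parse_qs` decodes them again on the reading side.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Same example parameters as before; the value of 'foo' contains a literal '&'.
params = {"foo": "bar&baz", "quux": "1"}

# urlencode percent-encodes reserved characters in names and values.
query = urlencode(params)
url = "http://www.example.com/home?" + query

# On the receiving side, parsing the query string decodes the values again.
decoded = parse_qs(urlsplit(url).query)

print(url)      # http://www.example.com/home?foo=bar%26baz&quux=1
print(decoded)  # {'foo': ['bar&baz'], 'quux': ['1']}
```

The ampersand travels as %26, so it is no longer mistaken for a separator, and the receiving code sees the original value bar&baz.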

The <b> thing is a different story. By itself, putting HTML (or things that look like HTML) into a query string parameter is not a problem at all, and there may even be legitimate uses for it. However, query string parameters are often inserted into the response HTML, and when that happens, you do have a problem. An example I've seen in the wild used a query string parameter to pass an error message, which was then inserted into a nice red error box. The error message was inserted without further checking or processing, so an attacker could insert <script> tags and run malicious JavaScript, such as reading the current session cookie and posting it back to the attacker's dropbox, which would then allow the attacker to take over the victim's session. This attack vector is commonly referred to as cross-site scripting (XSS), and it's both a common and a serious vulnerability.

However, culling potentially dangerous code from query string parameters is a rather weak and costly defense. The proper way of preventing XSS is to escape ('html-encode'; again, every web programming stack that I know of provides a function to do this) untrusted data when outputting HTML, not when reading it. So in my example above, the correct fix would have been to html-encode the error message before writing it into the HTML page.
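A minimal sketch of that fix, again assuming Python: the function name `render_error_box` and the surrounding markup are hypothetical, and `html.escape` from the standard library stands in for whatever html-encoding function your stack provides.

```python
from html import escape


def render_error_box(message: str) -> str:
    """Hypothetical helper: build the error box, html-encoding the untrusted text."""
    # Escaping happens at output time, right where the value enters the HTML.
    return f'<div class="error">{escape(message)}</div>'


# An attacker-controlled value, e.g. read from a query string parameter.
malicious = "<script>alert(1)</script>"

print(render_error_box(malicious))
# <div class="error">&lt;script&gt;alert(1)&lt;/script&gt;</div>
```

The browser now displays the script tag as text instead of executing it. This kind of escaping is appropriate for element content and quoted attribute values; data placed in other contexts, such as inside a script block or a URL, needs the encoding that matches that context.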
