The ampersand (&) character isn't really forbidden, but because it is the separator for query string parameters, using it unescaped in a parameter name or value will cause undesired behavior - for example, if you have a parameter foo with the value bar&baz and another parameter quux with the value 1, a naive attempt at baking a URL might result in http://www.example.com/home?foo=bar&baz&quux=1; the query string parameters will be parsed as (excuse the pseudo-JSON) { "foo": "bar", "baz", "quux": 1 }, that is, three parameters instead of two. A similar effect can be caused with other special characters (=, @, +, %) if they are concatenated into a query string unescaped.
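To make the mis-parse concrete, here is a minimal sketch in Python (my choice of language for illustration; any web stack behaves similarly). It builds the query string by plain concatenation and then parses it with the standard library's parse_qs. The variable names are made up for this example.

```python
from urllib.parse import parse_qs

# Parameter values from the example above: foo is meant to carry "bar&baz".
params = {"foo": "bar&baz", "quux": "1"}

# Naive construction: plain string concatenation, no escaping at all.
naive_query = "&".join(f"{name}={value}" for name, value in params.items())
naive_url = "http://www.example.com/home?" + naive_query

# keep_blank_values=True keeps the stray "baz" key visible in the result
# instead of silently dropping it.
parsed = parse_qs(naive_query, keep_blank_values=True)

print(naive_url)  # http://www.example.com/home?foo=bar&baz&quux=1
print(parsed)     # {'foo': ['bar'], 'baz': [''], 'quux': ['1']}
```

The value of foo has been cut short at the ampersand, and a third parameter baz has appeared out of nowhere.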
Mis-parsing (or rather, wrongly constructing) query string parameters like this is often just a nuisance (values not ending up where you expect them), but it can also be abused to trick a web application into doing something it shouldn't - which would be a security vulnerability.
Practically all web programming platforms provide a way of escaping query string parameters, and many also offer higher-level ways of constructing query strings that take care of the escaping for you. The other way around, BTW, is taken care of automatically: pretty much every web stack decodes query string parameters before handing them to your code, so when you read them back, they are already in their original form.
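As one possible illustration, Python's standard library offers urlencode for building an escaped query string and parse_qs for decoding it again; other platforms have their own equivalents. This is only a sketch, and the variable names are invented for the example.

```python
from urllib.parse import urlencode, parse_qs

# Same parameters as before: foo should carry the literal value "bar&baz".
params = {"foo": "bar&baz", "quux": 1}

# urlencode percent-encodes each name and value before joining them with "&".
query = urlencode(params)
url = "http://www.example.com/home?" + query

# Decoding restores the original values (as strings, one list per name).
decoded = parse_qs(query)

print(url)      # http://www.example.com/home?foo=bar%26baz&quux=1
print(decoded)  # {'foo': ['bar&baz'], 'quux': ['1']}
```

Note that the ampersand inside the value is sent as %26, so it is no longer confused with the separator, and it comes back as a plain & after decoding.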
The <b> thing is a different story - by itself, putting HTML (or things that look like HTML) into a query string parameter is not a problem at all, and there may even be legitimate uses for this. However, query string parameters are often inserted into the response HTML, and when that happens, you do have a problem. An example I've seen in the wild used a query string parameter to pass an error message, which was then displayed in a nice red error box; the error message was inserted without further checking or processing, so an attacker could insert <script> tags and run malicious JavaScript, such as reading the current session cookie and posting it back to the attacker's drop box, which would then allow the attacker to take over the victim's session. This attack vector is commonly referred to as cross-site scripting (XSS), and it's both a common and a serious vulnerability.
However, culling potentially dangerous code from query string parameters is a rather weak and costly defense. The proper way of preventing XSS is to escape ('html-encode'; again, every web programming stack that I know of provides a function to do this) untrusted data at the point where you write it into HTML, not at the point where you read it from the request. So in my example above, the correct fix would have been to html-encode the error message before writing it into the HTML page.
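A minimal sketch of that fix, again in Python and using the standard library's html.escape. The render_error_box function and the "error" class name are hypothetical stand-ins for whatever the real application used; they are not taken from the site I saw.

```python
from html import escape

def render_error_box(message: str) -> str:
    """Hypothetical helper: wrap an untrusted message in an error box."""
    # Escape at output time, right where the data enters the HTML.
    return f'<div class="error">{escape(message)}</div>'

# What an attacker might pass in the error-message query parameter.
payload = "<script>alert(1)</script>"

print(render_error_box(payload))
# <div class="error">&lt;script&gt;alert(1)&lt;/script&gt;</div>
```

The browser now displays the payload as literal text instead of executing it, and a legitimate message containing < or & still shows up correctly.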