-1

I'm guessing the answer is no, because there are quite a few examples of it going wrong, including right here. If I let users submit their own HTML and then render it side by side with the rest of my page, what could a user do to break the rest of the page, and how can I guard against it?

I imagine escaping the input as soon as they enter it would be a good idea, then un-escaping it as I'm rendering, as well as stripping out `<script>` tags. Other than that, what precautions should I be taking?

leylandski
  • 1
    As was already said, `script` and `iframe` would be dangerous. Beyond that, a malicious user could submit any amount of invalid HTML: opening tags without closing them, inserting CSS that changes how your site's HTML is displayed, or linking to malicious sites (and to some degree hiding the link). That is just what comes to mind immediately. Putting it inside an iframe, as @Karan says, could possibly avoid some of these problems. – thorsten müller Nov 20 '15 at 10:25
  • See http://lcamtuf.coredump.cx/postxss/ for lots of very specific examples of how malicious users could try to break your site with HTML. – Ixrec Nov 21 '15 at 16:22

2 Answers

4

You want to use a whitelisting approach, where you accept only known-good HTML syntax, rather than stripping out known-malicious content. There is no reliable way to strip out malicious content other than rejecting everything you do not know for sure is safe. Do not use regexes and do not try to roll your own code; use a good HTML sanitization library. Libraries that do this typically parse the HTML into a DOM, remove all non-whitelisted tags and attributes, and then serialize the DOM back to HTML.

Examples of HTML sanitization libraries that use a whitelisting policy are the OWASP Java HTML Sanitizer and PHP HTMLPurifier.
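To make the parse, filter, and re-serialize idea concrete, here is a minimal sketch in Python using only the standard library. It is an illustration of the shape of the technique, not a substitute for a maintained sanitization library. The names `AllowlistSanitizer` and `sanitize` and the particular tag, attribute, and URL lists are assumptions made up for this example, and it uses a streaming parser rather than building a full DOM.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy for illustration only; a real policy needs careful review.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")
DROP_CONTENT_TAGS = {"script", "style"}  # drop these tags and their contents


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []         # pieces of the sanitized output
        self.open_tags = []   # allowed tags we emitted and have not closed yet
        self.skip_depth = 0   # > 0 while inside a dropped script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # anything not on the allowlist is discarded
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()) or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # rejects javascript: and other unexpected schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # ignore stray or disallowed closing tags
        # Close this tag and anything still open inside it.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # text is always re-escaped

    def result(self):
        # Close whatever the user left open so it cannot swallow the rest of the page.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return parser.result()


if __name__ == "__main__":
    print(sanitize('<p onclick="evil()">Hi <b>there</b><script>alert(1)</script></p>'))
```

The point of the design is that the output is built only from tags and attributes that appear on the allowlist, so an element the policy has never heard of is dropped by default instead of needing its own rule.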

Joeri Sebrechts
1

You should sanitize the HTML submitted by the user before rendering it on the webpage.

  • Strip out the `script` tags, or HTML-encode them so that the user can see that they entered undesirable tags.

  • Strip out or HTML-encode the `iframe` tags as well. They can be used for mischief if the end user is trying to damage your webpage.

  • Strip out or HTML-encode the `input` tags.

  • You could probably use a regex on the server side to weed out these special tags, or use the HTML Agility Pack to walk the HTML elements and cleanse them on the server side.

  • Render the HTML in a new page rather than in an existing page of your application. At the very least this protects your existing web pages. Alternatively, render the custom HTML in an iframe within your existing webpage. This helps contain any unknown side effects within a confined area.
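As a rough sketch of the iframe suggestion, the snippet below builds an `<iframe>` that carries the user's markup in its `srcdoc` attribute. The `sandbox` attribute is an addition of this example rather than something the answer specifies: left empty, it asks the browser to apply all sandbox restrictions, including disabling scripts in the framed content. The function name `sandboxed_iframe` and the default title are hypothetical.

```python
from html import escape


def sandboxed_iframe(user_html, title="User-submitted content"):
    """Return markup for an iframe that isolates user_html from the host page.

    Assumption of this sketch: an empty `sandbox` attribute plus `srcdoc`
    is used for isolation; the framed content is still untrusted.
    """
    # Escape for use inside a double-quoted attribute value, so the user's
    # markup cannot terminate the attribute and leak into the host page.
    srcdoc = escape(user_html, quote=True)
    safe_title = escape(title, quote=True)
    return f'<iframe sandbox srcdoc="{srcdoc}" title="{safe_title}"></iframe>'


if __name__ == "__main__":
    print(sandboxed_iframe('<script>alert("x")</script>'))
```

Containing the content this way limits what it can do to the surrounding page, but it does not make the content itself trustworthy, so it complements sanitization rather than replacing it.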

Glorfindel
Karan