In previous web applications I've built, I've had issues when users enter exotic characters into forms: the characters get stored strangely in the database, and sometimes appear different or double-encoded when they are retrieved from the database and displayed back in the browser. I'm starting a new project now, and I want to prevent these issues right from the start.

What I'm looking for is a checklist of things I can do to prevent character encoding issues such as these, no matter what users enter into forms. If I set my database tables to UTF-8, and set all of my web pages to assume content is UTF-8, is this enough? Will some characters still appear differently than the user entered them? Should I do some validation on the client side that doesn't let users enter certain characters?

CFL_Jeff

2 Answers

If I set my database tables to UTF-8, and set all of my web pages to assume content is UTF-8, is this enough?

You need to ensure that the connection between the web application and the database doesn't mangle the encoding (I believe you need to explicitly set this on the connection string for MySQL, for instance).

Basically you need to ensure that every step in the chain is using the same encoding.
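To make the "every step in the chain" point concrete, here is a minimal Python sketch of what happens when just one step disagrees about the encoding. It is not tied to any database driver or web framework: plain `encode()`/`decode()` calls stand in for the hops between browser, application and database, and `round_trip` is a hypothetical helper used only for illustration.

```python
# Minimal sketch: how one mismatched step in the chain produces mojibake
# (the "double-encoded" look from the question). No database or framework
# is involved; encode()/decode() stand in for the hops between layers.

def round_trip(text: str, write_enc: str, read_enc: str) -> str:
    """Encode text with one charset, then decode it with another,
    as happens when two layers disagree about the encoding."""
    return text.encode(write_enc).decode(read_enc)

original = "café"

# Every step agrees on UTF-8: the text survives unchanged.
consistent = round_trip(original, "utf-8", "utf-8")

# One step assumes Latin-1: the two UTF-8 bytes of "é" become two characters.
mangled = round_trip(original, "utf-8", "latin-1")

# The damage can be undone only if you know exactly which mismatch happened.
repaired = mangled.encode("latin-1").decode("utf-8")

print(consistent, mangled, repaired)
```

Because Latin-1 can decode any byte sequence, the mismatch raises no error; the text is silently corrupted, which is why every layer has to be configured explicitly rather than left to its default.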

Oded

You need to make sure you are specifying the correct data type for columns as well. In SQL Server, nchar/nvarchar/ntext are required to hold Unicode values; char/varchar/text only hold characters from the column collation's code page, not arbitrary Unicode. Also, depending on how you transfer data (for example into a SQL statement or into HTML), you may need to escape certain characters to avoid unexpected results; lists of these characters are easy enough to find with a Google search.
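As a rough illustration of storing Unicode and escaping only where the data is output, here is a minimal Python sketch. It swaps in SQLite (Python's built-in `sqlite3` module) as a stand-in database, since SQLite TEXT columns hold Unicode without an nvarchar/varchar split; the `comments` table and the sample input are hypothetical. For the SQL step it uses parameter binding rather than hand-escaping, and for the HTML step it uses `html.escape`.

```python
import html
import sqlite3

# Minimal sketch using SQLite as a stand-in database (assumption: not the
# SQL Server setup the answer describes). TEXT columns here store Unicode.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")  # hypothetical table

user_input = 'Zoë said <b>"héllo"</b> & left 👋'

# Bind the value as a parameter instead of escaping quotes by hand;
# the driver passes the string through without altering it.
conn.execute("INSERT INTO comments (body) VALUES (?)", (user_input,))

stored = conn.execute("SELECT body FROM comments").fetchone()[0]

# Escape only when writing into a specific context (here HTML), so the
# stored value stays exactly what the user typed.
safe_html = html.escape(stored)
print(safe_html)
conn.close()
```

Keeping the stored value raw and escaping per output context avoids the trap of escaping twice, which produces the same kind of "double-encoded" text described in the question.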

Ryathal
  • Please note that there are SQL implementations other than MS SQL Server, most of which allow storing UTF-8 in a varchar column. – dan04 Apr 18 '12 at 21:07
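One practical detail when storing UTF-8 in a varchar-style column: a character and a UTF-8 byte are not the same size. Whether a column's declared length counts bytes or characters depends on the database (SQL Server's varchar(n), for instance, counts bytes), so check your own system's documentation. The small Python sketch below compares the two counts; `utf8_length` is a hypothetical helper.

```python
# Minimal sketch: the same text measured in characters vs. UTF-8 bytes.
# Where a column's declared length counts bytes, a value that "fits" by
# character count can still be too long once encoded as UTF-8.

def utf8_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

samples = ["hello", "café", "日本語", "👋"]
for s in samples:
    print(f"{s!r}: {len(s)} chars, {utf8_length(s)} UTF-8 bytes")
```

ASCII text is one byte per character, but accented Latin letters take two bytes, most CJK characters three, and emoji four.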