Edit: Microsoft finally introduced UTF-8 support in SQL Server 2019 (15.x). So this answer, originally posted in 2012, is outdated.
Because MS SQL Server has poor support for UTF-8 compared to other RDBMS.
MS SQL Server follows the convention, used within Windows itself, that "narrow" strings (char
in C++, CHAR
or VARCHAR
in SQL) are encoded in a legacy "code page". The problem with code pages is that they have a limited number of characters (most are single-byte encodings, which limits the reportoire to 256 characters) and are designed around a single language (or group of languages with similar alphabets). This makes it hard to store multilingual data. For example, you can't store both Russian and Hebrew data because Russian uses code page 1251 and Hebrew uses code page 1255.
Unicode solves this problem by using a single giant coded character set with room for more than a million characters, enough to represent every language in the world. There are several Unicode encoding schemes; Microsoft prefers to use UTF-16, for historical reasons. Because UTF-16 represents strings as a sequence of 16-bit code units instead of the traditional 8-bit, a separate character type is needed. In MSVC++, this is wchar_t
. And in MS SQL, it's NCHAR
or NVARCHAR
. The N
stands for "national", which seems backwards to me because Unicode is about inter-nationalization, but that's the ISO terminology.
Other SQL implementations let you store UTF-8 text in a VARCHAR
column. UTF-8 is a variable-length (1-4 bytes per character) encoding that's optimized for the case when your data is mostly in the Basic Latin range (which are represented as the same 1 byte per character as ASCII), but can represent any Unicode character. Thus, you'd avoid the "twice as much space" problem mentioned by bwalk2895.
Unfortunately, MS SQL Server does not support UTF-8 VARCHAR
, so instead you have to either use UTF-16 instead (and waste space for ASCII text), use a non-Unicode code page (and lose the ability to represent foreign characters), or store UTF-8 in a BINARY
column (and deal with inconveniences like SQL string functions not working properly, or having to view the data as a hex dump in your GUI DB manager).