40

I'm learning T-SQL. From the examples I've seen, to insert text in a varchar() cell I can write just the string to insert, but for nvarchar() cells, every example prefixes the strings with the letter N.

I tried the following query on a table which has nvarchar() columns, and it works fine, so the N prefix doesn't seem to be required:

insert into [TableName] values ('Hello', 'World')

Why are the strings prefixed with N in every example I've seen?

What are the pros or cons of using this prefix?

Arseni Mourzenko
  • 134,780
  • 31
  • 343
  • 513
qinking126
  • 541
  • 1
  • 5
  • 6
  • Isn't the N only needed for literal strings? – Wayne In Yak Jul 06 '12 at 14:56
  • Polish is a non-Latin based language???? – Heckflosse_230 Jul 06 '12 at 16:58
  • 2
    `N` means National, as in "National Varying Character", see [Equivalent ANSI SQL Data Types](http://office.microsoft.com/en-us/access-help/equivalent-ansi-sql-data-types-HP001032229.aspx). – ErikE Jun 17 '14 at 20:16
  • I agree with this question and nobody has answered it so far, AFAICT. Maybe it could be restated as “why is it bad to let SQL implicitly convert my `VARCHAR` to `NVARCHAR` when my string literal is ASCII?”. – binki Oct 14 '14 at 14:11
  • This question was already asked and answered here: [What is the difference between varchar and nvarchar?](http://stackoverflow.com/q/144283/439793) –  Oct 14 '14 at 17:29

3 Answers

29

NVarchar is used for Unicode. If your database is not storing multilingual data, you can keep using Varchar. As an example, N'abc' simply makes your string a Unicode literal.
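A minimal sketch of the difference, reusing the (hypothetical) two-column table from the question:

-- 'Hello' is a varchar literal; SQL Server implicitly converts it to nvarchar on insert
insert into [TableName] values ('Hello', 'World')

-- N'Hello' is already an nvarchar (Unicode) literal, so no conversion is needed
insert into [TableName] values (N'Hello', N'World')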

Lightness Races in Orbit
  • 8,755
  • 3
  • 41
  • 45
Pieter B
  • 12,867
  • 1
  • 40
  • 65
25

By default SQL Server uses the Windows-1252 character codes for varchar. That code page contains most of the characters for Latin-based languages (English, German, French, etc.), but it does not contain characters for non-Latin-based languages (Polish, Russian, etc.). As stated by @Pieter B, nvarchar is used to get around that issue because it is for Unicode, which contains those missing characters. This comes at a cost: it takes twice as much space to store nvarchar as varchar.
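A quick way to see the size difference (a small sketch; DATALENGTH returns the number of bytes a value occupies):

select datalength('Hello')   -- 5 bytes: varchar stores 1 byte per character here
select datalength(N'Hello')  -- 10 bytes: nvarchar stores 2 bytes per character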

Putting N in front of your string ensures the characters are converted to Unicode before being placed into an nvarchar column. Most of the time you will be okay leaving the N off, but I wouldn't recommend it. It is a lot better to be safe than sorry.
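A sketch of what can go wrong when the N is left off (assuming a database whose default collation is Latin-based, e.g. SQL_Latin1_General_CP1_CI_AS, and a throwaway table #t):

create table #t (msg nvarchar(20))

insert into #t values ('Привет')   -- no N: the literal is varchar, the Cyrillic is lost before it reaches the column
insert into #t values (N'Привет')  -- with N: the literal is nvarchar, the characters survive

select msg from #t
-- first row comes back as '??????', second row as 'Привет'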

bwalk2895
  • 1,988
  • 1
  • 13
  • 15
  • 4
    Just a clarification: "By default" SQL server uses the encoding corresponding to the Varchar field's collation, which is overridable at the time of creation of the field, generally based on the default collation for your instance. The default collation for your instance can be set at install time, but generally corresponds to the system default locale's CP_ACP. That will be Windows 1252 on a US-English machine, but 932 on a machine with a Japanese system locale, 1251 on a Russian machine, etc. The moral of the story? Use NVarchar :) – JasonTrue Jul 14 '12 at 04:12
  • 1
    So far this is the only answer that addresses the question as asked "Why use the N prefix on literal strings since SQL will implicitly transcode?". The other answers are all for a different question "What's the difference between nvarchar vs. varchar?" – Tim Sparkles Jul 31 '18 at 21:30
22

Edit: Microsoft finally introduced UTF-8 support in SQL Server 2019 (15.x). So this answer, originally posted in 2012, is outdated.


Because MS SQL Server has poor support for UTF-8 compared to other RDBMSes.

MS SQL Server follows the convention, used within Windows itself, that "narrow" strings (char in C++, CHAR or VARCHAR in SQL) are encoded in a legacy "code page". The problem with code pages is that they have a limited number of characters (most are single-byte encodings, which limits the repertoire to 256 characters) and are designed around a single language (or a group of languages with similar alphabets). This makes it hard to store multilingual data. For example, you can't store both Russian and Hebrew data in the same varchar column, because Russian uses code page 1251 and Hebrew uses code page 1255.
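The effect is easy to see; a sketch using two collations that should exist on a default install (Cyrillic_General_CI_AS maps to code page 1251, Hebrew_CI_AS to 1255):

-- Converting Unicode text to varchar uses the code page of the expression's collation
select cast(N'Привет' collate Cyrillic_General_CI_AS as varchar(20))  -- 'Привет'
select cast(N'Привет' collate Hebrew_CI_AS as varchar(20))            -- '??????': code page 1255 has no Cyrillic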

Unicode solves this problem by using a single giant coded character set with room for more than a million characters, enough to represent every language in the world. There are several Unicode encoding schemes; Microsoft prefers to use UTF-16, for historical reasons. Because UTF-16 represents strings as a sequence of 16-bit code units instead of the traditional 8-bit, a separate character type is needed. In MSVC++, this is wchar_t. And in MS SQL, it's NCHAR or NVARCHAR. The N stands for "national", which seems backwards to me because Unicode is about inter-nationalization, but that's the ISO terminology.

Other SQL implementations let you store UTF-8 text in a VARCHAR column. UTF-8 is a variable-length (1-4 bytes per character) encoding that's optimized for the case when your data is mostly in the Basic Latin range (which are represented as the same 1 byte per character as ASCII), but can represent any Unicode character. Thus, you'd avoid the "twice as much space" problem mentioned by bwalk2895.

Unfortunately, MS SQL Server does not support UTF-8 VARCHAR, so you have to either use UTF-16 instead (and waste space for ASCII text), use a non-Unicode code page (and lose the ability to represent foreign characters), or store UTF-8 in a BINARY column (and deal with inconveniences like SQL string functions not working properly, or having to view the data as a hex dump in your GUI DB manager).
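Per the edit note above, SQL Server 2019 and later lift this restriction: a varchar column can hold UTF-8 if it is given one of the _UTF8 collations. A sketch, assuming the collation below is available on a 2019+ instance and using a throwaway table:

create table #utf8_demo
(
    msg varchar(100) collate Latin1_General_100_CI_AS_SC_UTF8  -- varchar stored as UTF-8
)

insert into #utf8_demo values (N'Привет')  -- Cyrillic is kept; in UTF-8 it takes 2 bytes per character

select msg, datalength(msg) from #utf8_demo  -- 'Привет', 12 bytes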

dan04
  • 3,748
  • 1
  • 24
  • 26
  • 2
    In versions earlier than SQL Server 2012, they were using the UCS-2 encoding, which is strictly 2 bytes per character. In newer versions, they are using UTF-16, which is a variable-length encoding of up to 4 bytes per character (similar to UTF-8, but starting at 2 bytes). – j123b567 Jan 19 '15 at 13:05