Questions tagged [character-encoding]
57 questions
580 votes, 1 answer
Is the use of "utf8=✓" preferable to "utf8=true"?
I have recently seen a few URIs containing the query parameter "utf8=✓". My first impression (after thinking "mmm, looks cool") was that this could be used to detect a broken character encoding.
So, is this a better way to resolve potential…

Gary
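A minimal sketch of why a non-ASCII marker can reveal a broken decoding while "true" cannot. It assumes the wrong encoding on the server is Windows-1252; `encoding_survived` is a hypothetical helper, not part of any framework.

```python
from urllib.parse import quote, unquote

CHECK = "\u2713"  # U+2713 CHECK MARK, "✓"

def encoding_survived(value: str) -> bool:
    """Hypothetical server-side check: did the utf8 parameter decode back to ✓?"""
    return value == CHECK

# A browser submitting the form as UTF-8 percent-encodes ✓ as its three UTF-8 bytes.
wire = "utf8=" + quote(CHECK)               # 'utf8=%E2%9C%93'
escaped = wire.split("=", 1)[1]

good = unquote(escaped, encoding="utf-8")   # server decodes as UTF-8
bad = unquote(escaped, encoding="cp1252")   # server wrongly assumes Windows-1252

print(encoding_survived(good), encoding_survived(bad), bad)

# By contrast "true" is pure ASCII, so it decodes identically either way
# and cannot reveal the mistake.
assert unquote("true", encoding="utf-8") == unquote("true", encoding="cp1252")
```

The mis-decoded value comes out as the familiar mojibake "âœ“", which is easy to test for on the server.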
163 votes, 5 answers
How to detect the encoding of a file?
On my filesystem (Windows 7) I have some text files (these are SQL script files, if that matters).
When opened with Notepad++, in the "Encoding" menu some of them are reported to have an encoding of "UCS-2 Little Endian" and some of "UTF-8 without…

Marcel
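A heuristic sketch of the kind of check an editor can perform: look for a byte order mark first, then try a strict UTF-8 decode. A file starting with FF FE is presumably what Notepad++ reports as "UCS-2 Little Endian". `guess_encoding` is a hypothetical helper; legacy 8-bit code pages cannot be identified this way and come back as `None`.

```python
import codecs
from typing import Optional, Tuple

# Longest BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so UTF-32 has to be tested before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def guess_encoding(data: bytes) -> Tuple[Optional[str], int]:
    """Heuristic guess returning (encoding, BOM length).

    Decode with data[bom_length:].decode(encoding). None means "unknown",
    e.g. a legacy 8-bit code page that no byte pattern can identify."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    try:
        data.decode("utf-8")      # strict: fails on any invalid sequence
        return "utf-8", 0         # BOM-less UTF-8 (pure ASCII also lands here)
    except UnicodeDecodeError:
        return None, 0

sample = codecs.BOM_UTF16_LE + "SELECT 1;".encode("utf-16-le")
enc, skip = guess_encoding(sample)
print(enc, sample[skip:].decode(enc))
```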
119 votes, 5 answers
What is the advantage of choosing ASCII encoding over UTF-8?
All characters in ASCII can be encoded using UTF-8 without an increase in storage (both require one byte per character).
UTF-8 has the added benefit of supporting characters beyond ASCII. If that's the case, why would we ever choose ASCII…

Pacerier
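A short sketch of the premise and of one practical difference: for ASCII-only text the two encodings are byte-identical, but the ASCII codec rejects anything else, which can double as validation. `is_ascii_only` is a hypothetical helper (Python 3.7+ also offers `str.isascii()`).

```python
text = "SELECT 1"

# For pure-ASCII text the two encodings produce identical bytes.
assert text.encode("ascii") == text.encode("utf-8")

# Outside ASCII, UTF-8 needs 2 to 4 bytes per character.
print(len("é".encode("utf-8")), len("€".encode("utf-8")), len("😀".encode("utf-8")))  # 2 3 4

def is_ascii_only(s: str) -> bool:
    """Hypothetical validator: the ASCII codec refuses any non-ASCII character."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False
```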
77 votes, 2 answers
Why do so many hashed and encrypted strings end in an equals sign?
I work in C# and MSSQL and, as you'd expect, I store my passwords salted and hashed.
When I look at the hash stored in an nvarchar column (for example, the one used by the out-of-the-box ASP.NET membership provider), I've always been curious why the generated Salt and…

Liath
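Assuming the stored salts and hashes are Base64 text (common for binary values kept in string columns, but an assumption here), the trailing "=" is Base64 padding: every 3 input bytes become 4 characters, and "=" fills the last group when the length is not a multiple of 3. The helper `base64_padding` is hypothetical.

```python
import base64
import hashlib

def base64_padding(n_bytes: int) -> int:
    """Number of '=' characters Base64 appends to n_bytes of input."""
    return (3 - n_bytes % 3) % 3

samples = {
    "16-byte value (e.g. a random salt)": bytes(16),
    "SHA-1 digest": hashlib.sha1(b"password").digest(),      # 20 bytes
    "SHA-256 digest": hashlib.sha256(b"password").digest(),  # 32 bytes
}

for label, raw in samples.items():
    encoded = base64.b64encode(raw).decode("ascii")
    # '=' is not in the Base64 alphabet, so every '=' here is padding.
    assert encoded.count("=") == base64_padding(len(raw))
    print(f"{label}: {len(raw)} bytes -> {len(encoded)} chars, ends with {encoded[-2:]!r}")
```

Common digest sizes (16, 20, 32 bytes) are not multiples of 3, which is why so many encoded values end in "=" or "==".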
40 votes, 3 answers
Why do we need to put N before strings in Microsoft SQL Server?
I'm learning T-SQL. From the examples I've seen, to insert text into a varchar() cell, I can write just the string to insert, but for nvarchar() cells, every example prefixes the string with the letter N.
I tried the following query on a table which…

qinking126
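A Python analogy, not SQL Server itself: without N, a literal is a varchar value, which (as far as I understand) is converted to the database's code page, so characters that code page lacks become "?". Windows-1252 is assumed as that code page here; nvarchar stores UTF-16, so N'...' text survives. Both helpers are hypothetical.

```python
def store_in_code_page(text: str, code_page: str = "cp1252") -> str:
    """Analogy for a varchar literal: round-trip through a single-byte
    code page, replacing anything it cannot represent with '?'."""
    return text.encode(code_page, errors="replace").decode(code_page)

def store_as_unicode(text: str) -> str:
    """Analogy for an N'...' literal in an nvarchar (UTF-16) column."""
    return text.encode("utf-16-le").decode("utf-16-le")

for s in ["Grüße", "Привет", "日本語"]:
    print(repr(store_in_code_page(s)), repr(store_as_unicode(s)))
```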
34 votes, 8 answers
Should character encodings besides UTF-8 (and maybe UTF-16/UTF-32) be deprecated?
A pet peeve of mine is looking at so many software projects that have mountains of code for character set support. Don't get me wrong, I'm all for compatibility, and I'm happy that text editors let you open and save files in multiple character…

Joey Adams
28 votes, 5 answers
What issues lead people to use Japanese-specific encodings rather than Unicode?
At work I come across a lot of Japanese text files in Shift-JIS and other encodings. It causes many mojibake (unreadable character) problems for all computer users. Unicode was intended to solve this sort of problem by defining a single character…

Nicolas Raoul
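A small illustration of two points that tend to come up here, not a full answer: Shift_JIS stores common kanji in two bytes where UTF-8 needs three, and the same bytes read with the wrong decoder produce mojibake. Windows-1252 is an assumed "wrong" decoder.

```python
text = "日本語"  # "Japanese language", three kanji

sjis = text.encode("shift_jis")
utf8 = text.encode("utf-8")
print(len(sjis), len(utf8))  # 6 9: two bytes per kanji in Shift_JIS, three in UTF-8

# Mojibake: the same bytes read with the wrong decoder.
try:
    sjis.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

garbled = sjis.decode("cp1252", errors="replace")
print(garbled)
```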
27 votes, 7 answers
Is the carriage-return char considered obsolete?
I wrote an open source library that parses structured data but intentionally left out carriage-return detection because I don't see the point. It adds complexity and overhead for little or no benefit.
To my surprise, a user submitted a bug…

Evan Plaice
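A minimal sketch of line splitting that tolerates all three conventions: CRLF (Windows; also mandated by some formats, such as RFC 4180 CSV and HTTP headers), bare LF (Unix) and bare CR (classic Mac OS). `split_lines` is hypothetical and, unlike a real CSV parser, ignores line breaks inside quoted fields.

```python
from typing import List

def split_lines(text: str) -> List[str]:
    """Split on CRLF, LF and CR. CRLF is normalized first so that it counts
    as one line break rather than two."""
    return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")

print(split_lines("a,b\r\nc,d\ne,f\rg,h"))  # ['a,b', 'c,d', 'e,f', 'g,h']
```

`str.splitlines()` would also work, but it additionally splits on characters such as form feed and U+2028, which may not be wanted in a data format.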
19 votes, 4 answers
Why does UTF-8 waste several bits in its encoding?
According to the Wikipedia article, UTF-8 has this format:
First code point | Last code point | Bytes used | Byte 1   | Byte 2   | Byte 3 | Byte 4
U+0000           | U+007F          | 1          | 0xxxxxxx |          |        |
U+0080           | U+07FF          | 2          | 110xxxxx | 10xxxxxx |        |
U+0800           | U+FFFF…

qbt937
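A sketch of what the "wasted" prefix bits buy: every byte announces whether it is a single-byte character, a lead byte, or a continuation byte (10xxxxxx), so a decoder can find a character boundary from any offset. `utf8_encode`, `is_continuation` and `char_start` are hypothetical helpers written for illustration; the encoder follows the bit layout in the question's table.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by filling in the x bits of the UTF-8 layout."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:                                   # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                                  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                                 # 1110xxxx + 2 continuation bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                 # 11110xxx + 3 continuation bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def is_continuation(byte: int) -> bool:
    """True for 10xxxxxx, a byte that never starts a character."""
    return (byte & 0xC0) == 0x80

def char_start(data: bytes, i: int) -> int:
    """Back up from any byte offset to the first byte of its character."""
    while i > 0 and is_continuation(data[i]):
        i -= 1
    return i

data = "a€b".encode("utf-8")       # b'a\xe2\x82\xacb'
print(char_start(data, 3))         # 1: offset 3 is inside the 3-byte '€'
```

The prefixes also make sequences self-delimiting: a lead byte's count of leading 1 bits is the sequence length, so no separate length field is needed.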
16 votes, 2 answers
Is UTF-16 fixed-width or variable-width? Why doesn't UTF-8 have byte-order problem?
Is UTF-16 fixed-width or variable-width? I got different results from different sources:
From http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF:
UTF-16 stores Unicode characters in sixteen-bit chunks.
From…

Tim
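A short sketch of both halves of the question: UTF-16 is variable-width, because code points above U+FFFF take a surrogate pair of two 16-bit units, and each 16-bit unit can be stored high byte first or low byte first. UTF-8's unit is a single byte and its multi-byte sequences always start with the lead byte, so there is no byte order to choose. The helper names are hypothetical.

```python
import struct
from typing import Tuple

def utf16_code_units(s: str) -> int:
    """Number of 16-bit code units UTF-16 needs for s."""
    return len(s.encode("utf-16-le")) // 2

def surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into two surrogates."""
    v = cp - 0x10000                      # 20 bits remain
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

# Variable width: BMP characters take one unit, others take two.
print(utf16_code_units("A"), utf16_code_units("\U0001F600"))   # 1 2

# The built-in encoder produces exactly the computed pair.
assert struct.unpack(">2H", "\U0001F600".encode("utf-16-be")) == surrogate_pair(0x1F600)

# Byte order: the same 16-bit unit, stored two ways.
print("A".encode("utf-16-be"), "A".encode("utf-16-le"))        # b'\x00A' b'A\x00'
```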
12 votes, 3 answers
Should my source code be in UTF-8?
I feel that often you don't really choose what format your code is in. I mean, most of my tools in the past have decided for me, or I haven't really even thought about it. I was using TextPad on Windows the other day and as I was saving a file, it…

Parris
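If a project does standardize on UTF-8 (Python 3, for instance, already assumes UTF-8 source by default, per PEP 3120), a quick audit can flag files that are not valid UTF-8. A minimal sketch; `non_utf8_files` and the `*.py` pattern are hypothetical choices.

```python
import tempfile
from pathlib import Path
from typing import List

def non_utf8_files(root: Path, pattern: str = "*.py") -> List[Path]:
    """Return files under root matching pattern that are not valid UTF-8."""
    bad = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")   # strict decode
        except UnicodeDecodeError:
            bad.append(path)
    return bad

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good.py").write_bytes("name = 'café'\n".encode("utf-8"))
    (root / "legacy.py").write_bytes("name = 'café'\n".encode("cp1252"))
    found = [p.name for p in non_utf8_files(root)]

print(found)  # ['legacy.py']
```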
8 votes, 1 answer
Is the BOM optional for UTF-16 and UTF-32?
I used to think that the BOM is optional for UTF-8, but mandatory for UTF-16 and UTF-32.
But then I read the following (in this article):
Let's look just at the ones that Notepad supports.
8-bit ANSI (of which 7-bit ASCII is a subset). These…

user9002947
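A sketch of how the BOM relates to labeling, shown with Python's codecs. Roughly, the Unicode Standard's rule is that text labeled UTF-16BE or UTF-16LE carries no BOM because the label fixes the byte order, while unlabeled "UTF-16" may start with one (and, absent a BOM or other protocol, is treated as big-endian). The BOM is therefore optional whenever the byte order is known by other means.

```python
import codecs

# The generic "utf-16" codec writes a BOM so a reader can tell the byte order.
generic = "hi".encode("utf-16")
print(generic[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# The endian-specific codecs write no BOM: the label already fixes the order.
print("hi".encode("utf-16-le"))  # b'h\x00i\x00'
print("hi".encode("utf-16-be"))  # b'\x00h\x00i'

# Without a BOM the reader must be told the order; guessing wrong garbles text.
no_bom = b"\x00h\x00i"
print(no_bom.decode("utf-16-be"), no_bom.decode("utf-16-le"))
```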
8 votes, 2 answers
How relevant is UTF-7 when it comes to parsing emails?
I recently implemented incoming emails for an application and boy, did I open the gates of hell! Since then, every other day an email arrives that makes the app fail in a different way.
One of those things is emails encoded as UTF-7. Most emails come…

Pablo Fernandez
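A minimal sketch, assuming a single-part text message, of decoding a body by its declared charset with Python's standard `email` package. Python ships a `utf-7` codec, so UTF-7 bodies need no special casing; the fallback for unknown charset names is a hypothetical policy choice. The sample string is the example from RFC 2152.

```python
from email import message_from_bytes

def decode_text_body(raw: bytes) -> str:
    """Decode a single-part text message using its declared charset."""
    msg = message_from_bytes(raw)
    payload = msg.get_payload(decode=True) or b""      # undoes any transfer encoding
    charset = msg.get_content_charset() or "us-ascii"  # lower-cased, e.g. 'utf-7'
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:                                # charset name Python does not know
        return payload.decode("utf-8", errors="replace")

raw = (b"Content-Type: text/plain; charset=UTF-7\r\n"
       b"\r\n"
       b"Hi Mom -+Jjo--!\r\n")
print(decode_text_body(raw))
```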
8 votes, 2 answers
Should I HTML encode all output from my API?
I am creating a RESTful JSON API to access data from our website where the content is in German.
A handful of the fields will return formatted HTML, while most are single lines of text, although they are highly likely to include special characters.
To…

John
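A sketch of one common approach, not the only valid answer: JSON carries Unicode natively, so German characters need no HTML entities in the payload, and HTML escaping is applied by whichever client inserts plain text into HTML. The record and its field names are hypothetical.

```python
import html
import json

record = {"title": "Größe & Maße", "body_html": "<p>Grüße</p>"}

# json.dumps escapes non-ASCII as \uXXXX by default...
print(json.dumps(record))
# ...or emits it as-is (to be sent as UTF-8) with ensure_ascii=False.
print(json.dumps(record, ensure_ascii=False))

# HTML escaping belongs where plain text is inserted into HTML:
print(html.escape(record["title"]))  # Größe &amp; Maße
```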
7 votes, 2 answers
How is encoding handled correctly during copy-paste between programs?
Suppose
a program A opens a text file A using encoding A to decode the file, and
a program B opens a text file B using encoding B.
When we copy some text from file B in program B to file A in program A using mouse selection, ctrl+c and then…

Tim
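A hypothetical model of the scenario, assuming (as is typical) that the clipboard holds text as characters rather than as program B's raw bytes; on Windows, for example, the Unicode clipboard format is UTF-16. Program B decodes with encoding B, and program A re-encodes with encoding A when saving. `copy_paste` is illustrative, and `errors="replace"` models an editor that substitutes "?" (a real one might warn instead).

```python
def copy_paste(file_b: bytes, encoding_b: str, encoding_a: str) -> bytes:
    """Program B decodes its file into characters; the clipboard carries
    characters; program A encodes them with its own encoding on save."""
    clipboard: str = file_b.decode(encoding_b)              # B: bytes -> text
    return clipboard.encode(encoding_a, errors="replace")   # A: text -> bytes

print(copy_paste("café".encode("latin-1"), "latin-1", "utf-8"))      # b'caf\xc3\xa9'
print(copy_paste("日本".encode("shift_jis"), "shift_jis", "cp1252"))  # b'??'
```

Characters survive whenever encoding A can represent them; only characters missing from A are lost.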