Questions tagged [text-encoding]
25 questions
42
votes
8 answers
Why are there multiple Unicode encodings?
I thought Unicode was designed to get around the whole issue of having lots of different encodings due to a small address space (8 bits) in most of the prior attempts (ASCII, etc.).
Why then are there so many Unicode encodings? Even multiple versions…

Matthew Scharley
- 1,627
- 13
- 17
20
votes
4 answers
What type of encoding can I use to make a string shorter?
I am interested in encoding a string I have and I am curious if there is a type of encoding that can be used that will only include alpha and numeric characters and would preferably shorten the number of characters needed to represent the string.
So…

Abe Miessler
- 635
- 2
- 5
- 13
19
votes
4 answers
Why does UTF-8 waste several bits in its encoding?
According to the Wikipedia article, UTF-8 has this format:
First code point  Last code point  Bytes used  Byte 1    Byte 2    Byte 3  Byte 4
U+0000            U+007F           1           0xxxxxxx
U+0080            U+07FF           2           110xxxxx  10xxxxxx
U+0800            U+FFFF…

qbt937
- 301
- 2
- 6
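The one-byte and two-byte rows quoted in the UTF-8 question above can be reproduced by hand. The following is a minimal sketch, not a full encoder: the function name `encode_up_to_two_bytes` is hypothetical, and it deliberately covers only the two rows visible in the excerpt (U+0000 to U+07FF).

```python
def encode_up_to_two_bytes(code_point):
    """Hand-encode a code point in U+0000..U+07FF as UTF-8 (illustrative only)."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point <= 0x7F:
        # Row 1: 0xxxxxxx, all 7 payload bits in one byte.
        return bytes([code_point])
    if code_point <= 0x7FF:
        # Row 2: 110xxxxx 10xxxxxx, 5 high bits then 6 low bits.
        byte1 = 0b11000000 | (code_point >> 6)
        byte2 = 0b10000000 | (code_point & 0b00111111)
        return bytes([byte1, byte2])
    raise ValueError("this sketch only covers U+0000..U+07FF")


# Compare against Python's built-in encoder for one character from each row.
for ch in ("A", "\u00e9"):
    print(ch, encode_up_to_two_bytes(ord(ch)) == ch.encode("utf-8"))
```

The fixed prefix bits (`0`, `110`, `10`) are the bits the question title describes as "wasted": they carry no payload, but they let a decoder tell lead bytes from continuation bytes.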
9
votes
1 answer
What is the encoding used in Git's binary patches?
Git can generate patches/diffs for binary files as well as for text files.
I'm trying to figure out what encoding it uses for its binary patches.
Here is an example:
diff --git a/www/images/openconnect.png b/www/images/openconnect.png
new file mode…

Dan Lenski
- 427
- 2
- 9
8
votes
2 answers
Does FFmpeg support GPU acceleration of media encoding/decoding?
I was wondering if FFmpeg supports GPU acceleration. I was reading their website and came across contradictory information.
http://www.ffmpeg.org/general.html#Video-Codecs
-H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (VDPAU acceleration)…

Jason123
- 133
- 2
- 2
- 6
8
votes
2 answers
How relevant is UTF-7 when it comes to parsing emails?
I recently implemented incoming emails for an application and boy, did I open the gates of hell. Since then, every other day an email arrives that makes the app fail in a different way.
One of those things is emails encoded as UTF-7. Most emails come…

Pablo Fernandez
- 313
- 1
- 9
7
votes
2 answers
How is encoding handled correctly during copy-paste between programs?
Suppose
a program A opens a text file A using encoding A to decode the file, and
a program B opens a text file B using encoding B.
When we copy some text from file B in program B to file A in program A using mouse selection, Ctrl+C and then…

Tim
- 5,405
- 7
- 48
- 84
5
votes
1 answer
UTF-8 questions
When you encode a code point to code units in UTF-8, if the code point fits in 7 bits, the most significant bit is set to zero, which tells you that the character is stored in 1 byte (or more precisely 7 bits).
If the codepoint…

codepersonnel49
- 69
- 3
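The single-byte rule described in the question above is easy to check directly. This is a small sketch using Python's built-in encoder; the helper name `utf8_bits` is hypothetical and exists only for illustration.

```python
def utf8_bits(ch):
    """Return each UTF-8 byte of a single character as an 8-bit binary string."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


# U+0041 fits in 7 bits, so it is stored in one byte whose top bit is 0.
print(utf8_bits("A"))
# U+00E9 needs more than 7 bits, so the first byte no longer starts with 0.
print(utf8_bits("\u00e9"))
```

Reading the leading bits of the first byte is how a decoder learns how many bytes the character occupies.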
4
votes
2 answers
Why does ISO 8859-1 contain letter-free diacritics?
ISO 8859-1 contains a few letter-free diacritics: The diaeresis (¨), the acute accent (´), the cedilla (¸) and the macron (¯).¹
Why were they included? As far as I know (please correct me if I am wrong), the ISO 8859 encodings do not support…

Heinzi
- 9,646
- 3
- 46
- 59
4
votes
4 answers
What encoding is used by javax.xml.transform.Transformer?
Please can you answer a couple of questions based on the code below (excludes the try/catch blocks), which transforms input XML and XSL files into an output XSL-FO file:
File xslFile = new File("inXslFile.xsl");
File xmlFile = new…

Helen Reeves
- 141
- 1
- 1
- 4
4
votes
3 answers
URL Encryption vs. Encoding
At the moment, non-sensitive or semi-sensitive information, such as a user ID or the requested page number, is sent from one page to another via GET in our web application. Sometimes slightly more sensitive information is passed, such as account type, user privileges…

hozza
- 293
- 1
- 3
- 12
3
votes
4 answers
Windows compatibility with Unix/Linux newline "\n"
A follow-up to Difference between '\n' and '\r\n'.
It's been a few decades since the schism was introduced. Nowadays, when documents are being exchanged over the internet, typically with no prior knowledge of the client's preference of line endings,…

Ondra Žižka
- 267
- 3
- 6
3
votes
2 answers
Unable to debug encoded JavaScript?
I’m having some problems debugging an encoded JavaScript script. The script I’m referring to is given in this link over here.
The encoding here is simple: it works by shifting the Unicode values by whatever Codekey was used during encoding. The code…

miles away
- 31
- 2
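The script from the question above is not shown in the excerpt, so the following is only a hypothetical illustration of the general idea it describes: shifting each character's code point by a key and shifting back to decode. The names `shift_encode` and `shift_decode` and the key value are assumptions, and the sketch assumes the shifted values stay within the valid code point range.

```python
def shift_encode(text, key):
    """Shift every code point up by `key` (assumes results stay in range)."""
    return "".join(chr(ord(c) + key) for c in text)


def shift_decode(text, key):
    """Undo shift_encode by shifting every code point down by `key`."""
    return "".join(chr(ord(c) - key) for c in text)


secret = shift_encode("alert(1)", 3)   # key 3 is an arbitrary example value
print(secret)
print(shift_decode(secret, 3))
```

Decoding an obfuscated script into a readable string like this, and debugging that output instead, is one way to step through code that was encoded with a scheme of this kind.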
2
votes
2 answers
Compressing EBCDIC file vs UTF8
Today I came across a weird case for which I have no explanation, so here I am.
I have two files with identical content, but one is encoded in UTF-8 and the other one is in IBM EBCDIC. Both of them have approximately the same size (336MB and…

rodripf
- 137
- 2
2
votes
4 answers
Reduce number of digits by converting to alphanumeric data
We have an app that receives a web service request, processes it and sends it back to our client by another web service call. There is a unique field in the request, a tracking ID, which currently follows the pattern…

Suraj Muraleedharan
- 143
- 8