4

ISO 8859-1 contains a few letter-free diacritics: The diaeresis (¨), the acute accent (´), the cedilla (¸) and the macron (¯).¹

Why were they included? As far as I know (please correct me if I am wrong), the ISO 8859 encodings do not support combining diacritical marks the way Unicode does, so you cannot even use them to create fancy new letters like Ÿ, ś, ŗ and ī; you can only use them stand-alone, like this: a¨b. What's the point of that? Surely the designers of ISO 8859-1 were very smart people and had very good reasons. What were they?
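To make the distinction concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It checks that the four ISO 8859-1 characters are spacing symbols rather than combining marks, and that only the Unicode combining diaeresis (U+0308) composes with Y. The helper names `describe` and `fits_latin1` are made up for this illustration.

```python
import unicodedata

# The four spacing diacritics from the question, keyed by ISO 8859-1 byte value.
SPACING = {0xA8: "diaeresis", 0xB4: "acute accent", 0xB8: "cedilla", 0xAF: "macron"}


def describe(byte_value):
    """Decode one Latin-1 byte and report its Unicode category and combining class."""
    ch = bytes([byte_value]).decode("latin-1")
    return ch, unicodedata.category(ch), unicodedata.combining(ch)


def fits_latin1(text):
    """Return True if every character of text can be encoded in ISO 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for value, name in SPACING.items():
    ch, category, combining_class = describe(value)
    # Category "Sk" is "Symbol, modifier"; combining class 0 means "does not combine".
    print(f"0x{value:02X} {name}: {ch!r} category={category} class={combining_class}")

spaced = "Y" + "\u00a8"    # Y followed by the spacing diaeresis from Latin-1
combined = "Y" + "\u0308"  # Y followed by the Unicode combining diaeresis

nfc_spaced = unicodedata.normalize("NFC", spaced)      # stays two characters
nfc_combined = unicodedata.normalize("NFC", combined)  # composes to U+0178
print(len(nfc_spaced), len(nfc_combined))
print(fits_latin1(spaced), fits_latin1(combined))
```

The combining mark itself has no ISO 8859-1 code point, which is why `fits_latin1(combined)` is false.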


¹ The backtick/grave accent ` and the circumflex ^ should probably be in this list as well, but the reason for including them in the ISO 8859 encodings seems fairly obvious to me: backwards compatibility with 7-bit ASCII.

Heinzi
  • Why do you care? Use [UTF8 everywhere](http://utf8everywhere.org/) today! ISO8859-1 is so previous century! – Basile Starynkevitch Sep 02 '15 at 19:36
  • @BasileStarynkevitch: Curiosity -- I like to learn from history. Of course I use UTF-8 everywhere today! – Heinzi Sep 02 '15 at 19:39
  • 4
    You ironically answered your own question, at least in part. Sometimes people want to write about the diacritics themselves. – Karl Bielefeldt Sep 02 '15 at 20:07
  • @KarlBielefeldt: Good point. :-) – Heinzi Sep 02 '15 at 20:20
  • @KarlBielefeldt: Although in this case it turned out, they didn't. The diacritics were removed in ISO8859-15 to make room for the €-sign and some accented characters needed for French, Estonian and Finnish, because they were practically unused. And apparently, nobody missed them. I certainly didn't. – Jörg W Mittag Sep 02 '15 at 23:32

2 Answers

7

Note: when some important missing characters (such as the Euro symbol €) were added to the character set to create ISO8859-15, some mostly unused characters had to go, and this included most of the letter-free diacritics (the diaeresis, the acute accent and the cedilla; the macron kept its place). So, the designers of ISO8859-1 may have been very smart people and may have had good reasons, but apparently nobody understood them!
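If you want to check exactly which positions changed, a small sketch like the following compares the two code pages byte by byte. It relies on Python's built-in `latin-1` and `iso8859_15` codecs; the function name `changed_positions` is just an illustrative choice.

```python
def changed_positions(old="latin-1", new="iso8859_15"):
    """Map each byte value in 0xA0..0xFF whose meaning differs to (old char, new char)."""
    changes = {}
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        before, after = raw.decode(old), raw.decode(new)
        if before != after:
            changes[value] = (before, after)
    return changes


changes = changed_positions()
for value, (before, after) in changes.items():
    print(f"0x{value:02X}: {before} -> {after}")

# The macron at 0xAF is not among the replaced positions.
print(0xAF in changes)
```

With the standard Python codec tables this reports eight replaced positions, among them ¨ (0xA8), ´ (0xB4) and ¸ (0xB8).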

However, your claim that you can't create combined characters is not exactly true: if you have a terminal and/or printer that supports control characters, you can print Y, BACKSPACE, ¨ to get Ÿ. (That is, of course, different from how combining characters work in Unicode.)

Unlike what backspace does today, its original meaning was to move the cursor (or print head) back one position, so that whatever was printed next landed on top of what was already there. That's how you would get boldface, strikethrough, or underlined text, for example:

  • HEY BACKSPACE BACKSPACE BACKSPACE HEY = HEY in boldface
  • HEY BACKSPACE BACKSPACE BACKSPACE --- = HEY struck through
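
The overstriking described above can be modelled with a rough Python sketch. This is a toy model of a hardcopy device, not the behaviour of any particular terminal: each column keeps every glyph struck there, and backspace only moves the head. The names `overstrike` and `classify` are invented for the example.

```python
def overstrike(stream):
    """Return one list per column containing every glyph struck at that column."""
    cells = []
    column = 0
    for ch in stream:
        if ch == "\b":
            # Backspace moves the head left; nothing is erased.
            column = max(0, column - 1)
            continue
        while len(cells) <= column:
            cells.append([])
        cells[column].append(ch)
        column += 1
    return cells


def classify(cell):
    """Describe a column: plain glyph, bold (same glyph repeated), or a composite."""
    if len(cell) == 1:
        return cell[0]
    if len(set(cell)) == 1:
        return f"bold {cell[0]}"
    return " over ".join(reversed(cell))


for text in ("HEY\b\b\bHEY", "HEY\b\b\b---", "Y\b\u00a8"):
    print([classify(cell) for cell in overstrike(text)])
```

A CRT that replaces rather than overlays would instead keep only the last glyph in each cell.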
Jörg W Mittag
  • Good point about the printers, but are you sure that there were *terminals* which supported showing two characters on top of each other (rather than replacing the first one with the second one)? – Heinzi Sep 03 '15 at 06:41
  • I believe so. This was specifically retrofitted from how teleprinters/teletypes worked, when CRT terminals were introduced. It still works in Linux today, for example, either GNU Texinfo or manpages still work this way, if I remember correctly. – Jörg W Mittag Sep 03 '15 at 09:13
  • I have seen CRT terminals that do (this may be due to font hackery internal to the terminal, but the effect was quite startling). – Vatine Sep 03 '15 at 12:46
  • If it's a simple bitmap terminal, all it takes is to replace `WRITE` with `OR` in the character drawing subroutine, I guess. That would at least take care of the compositing. For the boldface stuff, you need something more sophisticated. – Jörg W Mittag Sep 03 '15 at 12:50
  • @JörgWMittag: The man utility deliberately prints the underline before a character to be underlined, so as to accommodate the fact that on most CRT-based terminals, backspacing and printing a new character will obliterate the old one [on some terminals the act of backspacing alone will do that]. Some TTY interfaces respond to the backspace key with backspace-space-backspace to indicate that the old character is being eliminated. – supercat Sep 14 '15 at 22:40
  • 1
    @Heinzi: I've worked with an ASR-33 terminal where a backspace followed by another character would cause both to be printed on top of each other. What was interesting is that if one was punching a tape and made a mistake, the way to correct it was to hit the "back up" button on the punch and then push the "rub out" button. The "rub-out" key generates all-bits-one (DEL) which is ignored just like all-bits-zero (NUL). The purpose of the "delete" character code wasn't actually to make the recipient delete anything, but rather to be a code which, if overpunched on anything, would obliterate it. – supercat Sep 14 '15 at 22:46
  • @Heinzi: Also, while it's been ages since I've used one, some early CRT terminals used storage tubes which weren't scanned continuously but instead had a regeneration grid which would cause any information drawn to them to remain visible for many minutes (there was a touchy adjustment which, if set too low, would cause the screen to fade more quickly, and if set too high would cause the whole screen to start glowing). There wasn't any way to partially-erase the screen--the only operation was to erase everything (which would make the screen flash brightly). – supercat Sep 14 '15 at 22:49
4

ISO based Latin-1 on ECMA-94, which was in turn based on the DEC Multinational Character Set, so that Europeans could use the DEC VT220. The first 128 code points of every 8-bit character set had to be the same as ASCII for backward compatibility. Indeed, back in the bad old days, misconfigured network hardware often interpreted the high bit as an error-checking bit and stripped it, turning extended characters into 7-bit ASCII, so character sets had to be able to fall back to ASCII if this happened. This is why Russians adopted KOI8-R, which produced readable fallback transliterations, over the ISO standard for Cyrillic.
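
The KOI8-R fallback is easy to reproduce. The sketch below, using Python's built-in `koi8_r` and `iso8859_5` codecs, clears the high bit of every byte the way a 7-bit channel would; the helper names are illustrative only.

```python
def strip_high_bit(data):
    """Simulate a 7-bit channel by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


def seven_bit_fallback(text, encoding):
    """Encode text, push it through the 7-bit channel, and read the result as ASCII."""
    return strip_high_bit(text.encode(encoding)).decode("ascii", errors="replace")


sample = "Русский Текст"  # "Russian Text"
koi8 = seven_bit_fallback(sample, "koi8_r")
iso = seven_bit_fallback(sample, "iso8859_5")

print(koi8)  # rUSSKIJ tEKST
print(iso)   # not a transliteration, just unrelated ASCII characters
```

The letter case comes out inverted, but the KOI8-R result is still readable as a Latin transliteration, while the ISO 8859-5 result is not.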

ASCII had its stand-alone accents because the keys existed on teletype terminals. The keys existed on teletypes because, as Jörg mentioned, people would write à on an old-fashioned manual typewriter by typing a, BACKSPACE, `. (I typed it on my Linux box just now as: a right-alt `.) IBM based the keyboard of its PC on its typewriters, so it had those keys too, and since they exist but have no meaning in any natural language, people started using them for markup. Here, for example, they denote code fragments.

Davislor