
I've read "Where are octals useful?" and it seems like octals were useful once upon a time.

Many languages treat numbers with a leading 0 as octal, so the literal 010 is actually 8. Among these are JavaScript, Python (2.7), and Ruby.
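For comparison, Python 3 took the route this question suggests: it dropped the bare leading-zero form that Python 2.7 accepted and kept only the `0o` prefix. A minimal sketch; the commented results assume Python 3 (under 2.7 the last line would print `True`):

```python
# Python 3: the explicit 0o prefix is the only octal literal spelling.
octal_literal = 0o10
print(octal_literal)  # 8

# int() on a string defaults to base 10, so a leading zero is harmless there.
print(int("010"))  # 10

# The Python 2.7 spelling 010 is a syntax error in Python 3.
# compile() is used so the rejection can be observed without crashing.
try:
    compile("x = 010", "<example>", "exec")
    legacy_octal_allowed = True
except SyntaxError:
    legacy_octal_allowed = False
print(legacy_octal_allowed)  # False
```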

But I don't really see why these languages need octal, especially when the more likely use of the notation is to denote a decimal number with a superfluous 0.

JavaScript is a client-side language, so octal seems pretty useless there. All three are fairly modern in other senses, and I don't think much code using octal notation would be broken by removing this "feature".

So, my questions are:

  • Is there any point of these languages supporting octal literals?
  • If octal literals are necessary, why not use something like 0o10? Why copy an old notation that overrides a more useful use case?
Manishearth
  • Will you accept "to confuse youngsters during an interview" as an answer? – yannis Jan 14 '14 at 14:02
  • conforming to C syntax + blind copying – ratchet freak Jan 14 '14 at 14:03
  • It's all about backwards compatibility. – Ingo Jan 14 '14 at 14:10
  • @Ingo With what? I can't think of any legacy JS program that would need octal literals. The same goes for ruby. And they could have designed it in a less confusing manner in the first place, both are not that old. Backward compatibility with a different language is rather stupid for things like this :/ – Manishearth Jan 14 '14 at 14:14
  • @Manishearth C inherited it from its ancestors. Java has it because C has it. Most others have it because C and Java have it. – Ingo Jan 14 '14 at 14:18
  • possible duplicate of [Reasoning behind the syntax of octal notation in Java?](http://programmers.stackexchange.com/questions/221797/reasoning-behind-the-syntax-of-octal-notation-in-java) – gnat Jan 14 '14 at 14:22
  • One still sees octal in Unix file permission changes: `chmod` with 0666 or 0777, where each digit is a group of 3 bits (read, write, execute) for user, group, and others. – Joop Eggen Jan 14 '14 at 15:17
  • Out of curiosity, in what situation (in javascript at least) would you find 010 as anything other than a string (unless you hardcoded it)? `parseFloat('010')`, `parseInt('010')`, and `+'010'` all return 10 in javascript. It's only when you use the `010` literally that it converts to octal. – Llepwryd Jan 14 '14 at 18:39
  • @Llepwryd In older browsers, `parseInt('010')` did indeed return 8, hence all the advice to always use `parseInt(foo, 10)` (and it's still a habit for me) – Izkata Jan 14 '14 at 19:10
  • @Llepwryd When typing numbers aligned with leading zeros, or converting human-entered numbers to JS or something. Also what Izkata said. – Manishearth Jan 14 '14 at 20:13
  • In JS strict mode, 010 is a syntax error. In sloppy mode it's still 8 though. – Domenic Jan 15 '14 at 20:38
  • Bitwise operations are one of the good reasons for it. BTW: in C, `0` is an octal constant by itself. Check the standard. – Pryftan Apr 19 '23 at 16:03
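Joop Eggen's `chmod` point can be made concrete. A minimal Python sketch, assuming POSIX permission bits: each octal digit of the mode is exactly one read/write/execute group, while the same value in decimal hides that structure. `stat.filemode` is Python's standard helper for rendering a mode as an `ls`-style string.

```python
import stat

mode = 0o644  # what `chmod 644 file` sets: rw- for owner, r-- for group and others

# Each octal digit is one 3-bit rwx group: owner, group, others.
owner, group, others = (mode >> 6) & 0o7, (mode >> 3) & 0o7, mode & 0o7
print(owner, group, others)  # 6 4 4

# stat.filemode expects a full st_mode, so add the regular-file type bits.
listing = stat.filemode(stat.S_IFREG | mode)
print(listing)  # -rw-r--r--

# The same permissions written in decimal carry no visible structure.
print(mode)  # 420
```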

3 Answers


Blind copying of C, just like ratchet freak said in his comment.

The vast majority of "language designers" these days have never seen anything but C and its copies (C++, Java, JavaScript, PHP, and probably a few dozen others I've never heard of). They have never touched FORTRAN, COBOL, LISP, PASCAL, Oberon, FORTH, APL, BLISS, SNOBOL, to name a few.

Once upon a time, exposure to multiple programming languages was MANDATORY in the computer science curriculum, and that didn't include counting C, C++, and Java as three separate languages.

Octal was used in the earlier days because it made reading binary instruction values easier. The PDP-11, for example, BASICALLY had a 4-bit opcode, two 3-bit register numbers, and two 3-bit access mechanism (addressing mode) fields. Expressing the word in octal made everything obvious.
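A minimal Python sketch of that point, assuming the double-operand layout just described: a 4-bit opcode followed by source mode, source register, destination mode and destination register, 3 bits each. With MOV's opcode of 01, `MOV (R2)+, R3` (source mode 2, register 2; destination mode 0, register 3) encodes as 012203 in octal, and every 3-bit field is one octal digit. The function name `decode_double_operand` is illustrative, not from any real toolchain.

```python
def decode_double_operand(word: int) -> dict:
    """Split a 16-bit PDP-11 double-operand instruction into its fields.

    Assumed layout, high to low bits: 4-bit opcode, then source mode,
    source register, destination mode, destination register (3 bits each).
    """
    return {
        "opcode": (word >> 12) & 0o17,
        "src_mode": (word >> 9) & 0o7,
        "src_reg": (word >> 6) & 0o7,
        "dst_mode": (word >> 3) & 0o7,
        "dst_reg": word & 0o7,
    }

word = 0o012203  # MOV (R2)+, R3 under the layout above

print(f"{word:06o}")  # 012203: the fields can be read off digit by digit
print(f"{word:04x}")  # 1483: hex digits straddle the 3-bit fields
print(decode_double_operand(word))
# {'opcode': 1, 'src_mode': 2, 'src_reg': 2, 'dst_mode': 0, 'dst_reg': 3}
```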

Because of C's early association with the PDP-11, octal notation was included, since it was very common on PDP-11s at the time.

Other machines had instruction sets that didn't map well to hex. The CDC 6600 had a 60-bit word, with each word containing typically 2 to 4 instructions. Each instruction was 15 or 30 bits.
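The arithmetic behind that is easy to check: a 60-bit word is exactly 20 octal digits and a 15-bit parcel is exactly 5, so instruction boundaries line up with octal digit boundaries, whereas 15 bits is 3.75 hex digits. A sketch in Python; the word value is arbitrary, not a real 6600 instruction sequence, and `parcels` is an illustrative name.

```python
WORD_BITS = 60
PARCEL_BITS = 15  # a 30-bit instruction would occupy two adjacent parcels

def parcels(word: int) -> list:
    """Split a 60-bit word into four 15-bit parcels, most significant first."""
    count = WORD_BITS // PARCEL_BITS  # 4 parcels per word
    mask = (1 << PARCEL_BITS) - 1
    return [(word >> (PARCEL_BITS * (count - 1 - i))) & mask for i in range(count)]

# Arbitrary 60-bit value written in octal: 20 digits = 4 groups of 5.
word = 0o12345_67012_34567_01234
octal_text = f"{word:020o}"
print(octal_text)                           # 12345670123456701234
print([f"{p:05o}" for p in parcels(word)])  # ['12345', '67012', '34567', '01234']

# In hex the same word is 15 digits, and parcel boundaries fall mid-digit.
print(f"{word:015x}")
```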

As for reading and writing values, this is a solved problem, with a well-known industry best practice, at least in the defense industry. You DOCUMENT your file formats. There is no ambiguity when the format is documented, because the document TELLS you whether you are looking at a decimal number, a hex number, or an octal number.

Also note: If your I/O system defaults to leading 0 meaning octal, you have to use some other convention on your output to denote hexadecimal values. This is not necessarily a win.

In my personal opinion, Ada did it best: 2#10010010#, 8#222#, 16#92#, and 146 all represent the same value. (That will probably get me at least three downvotes right there, just for mentioning Ada.)
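Python has no Ada-style `base#digits#` literals, but the idea is easy to sketch: the radix is always stated explicitly, and an undecorated number is simply decimal. The helper `parse_ada_based` below is hypothetical and handles only the plain integer form with bases 2 to 16, not Ada's underscores, exponents, or based reals.

```python
import re

def parse_ada_based(text: str) -> int:
    """Hypothetical helper: parse a simple Ada based literal like '16#92#'.

    Only the plain integer form base#digits# is handled (base 2..16);
    anything without '#' is treated as decimal, so leading zeros are harmless.
    """
    match = re.fullmatch(r"(\d+)#([0-9A-Fa-f]+)#", text)
    if match is None:
        return int(text, 10)
    base = int(match.group(1))
    if not 2 <= base <= 16:
        raise ValueError(f"base {base} out of range")
    return int(match.group(2), base)  # int() rejects digits invalid for the base

values = [parse_ada_based(s) for s in ("2#10010010#", "8#222#", "16#92#", "146")]
print(values)                        # [146, 146, 146, 146]
print(0b10010010, 0o222, 0x92, 146)  # Python 3's prefixed spellings: 146 146 146 146
```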

John R. Strohm
  • Downvoted you for mentioning Ada ... just kidding – kufi Jan 14 '14 at 15:05
  • I am curious to know how you came by the figure that *vastly more than 50%* of "language designers" -- why the scare quotes? -- have no experience with anything except C descendants. I spent a good sixteen years of my life talking with professional language designers every day and not one of them matches your description. – Eric Lippert Jan 14 '14 at 17:36
  • Javascript was described as "we want to do lisp in the browser" to get its designer interested, if I remember that interview correctly... – Izkata Jan 14 '14 at 19:14
  • @Izkata: Indeed, Waldemar Horwat once told me that he viewed JavaScript as essentially Common Lisp with a C-like syntax. In fact Waldemar defined a metalanguage, wrote an interpreter for his metalanguage in Common Lisp, and then wrote the JavaScript spec in his metalanguage, thereby enabling him to actually *run* the specification. It was a clever technique. – Eric Lippert Jan 14 '14 at 21:59
  • Why was 0 used though? Wouldn't it make sense to use a non-numeric character? Did the ANSI standard just not have the foresight of bugs caused by numbers intended to be base ten? – Old Badman Grey Jan 13 '15 at 17:33
  • @OldBadmanGrey, the "leading zero == octal" design decision for C happened LONG before the ANSI standard. Why the original designers chose it, I don't know. The ANSI standard merely codified widely existing practice. – John R. Strohm Jan 13 '15 at 20:07

They get it from C. Why copy? Because the base implementation of all three is in C. Python's default implementation is CPython. Ruby was originally built in C as well. JavaScript is the most interesting case here. It's run in the browser. Care to guess what the first web browser was written in?

So why would all three of these languages be implemented in C? Because they all originate on UNIX systems. So it's a case of convention being driven by ecosystem. Perl does this as well. Lua likely would if Lua used integers rather than doubles.

So these languages take their conventions from C because the environments they are written in are C. A good contrasting example is Visual Basic, which uses &O instead. As far as needing it, it seems to be more of a leaky abstraction turned convention than anything else.


There is a value to consistency. If you can't reliably determine how a number is going to be translated you will have real problems using a value in different contexts.

It also means you don't have to write your own parser. There is great value in using well-tested library routines.
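As an illustration of leaning on a well-tested library routine: in Python 3, `int(text, 0)` parses an integer string using the language's own literal rules, prefixes included. Note that Python 3's rules reject a bare leading zero instead of treating it as octal, which sidesteps the ambiguity raised in the question.

```python
# int(text, 0) reads the radix from the prefix, as Python 3 source code does.
parsed = {text: int(text, 0) for text in ("0o10", "0x10", "0b10", "10")}
print(parsed)  # {'0o10': 8, '0x10': 16, '0b10': 2, '10': 10}

# A bare leading zero is rejected, not silently read as octal.
try:
    int("010", 0)
    leading_zero_accepted = True
except ValueError:
    leading_zero_accepted = False
print(leading_zero_accepted)  # False
```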

Also if you don't support the leading 0 syntax you don't have a simple way to write octal values.

While we don't depend as much on octal numbers as we once did, they are still of value. While the same results can be obtained with hexadecimal numbers, in some contexts octal is easier to understand.

So far I have only seen one use for leading zeros in decimal numbers: the display and entry of fixed-length decimal fields like identification numbers. It has been years since I have seen such fields allow a leading zero. While disallowing it reduces the available values by 10%, it eliminates the problem that users often leave off the leading zeros when entering them.
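Where such fields do allow leading zeros, the dropped-zero problem can also be handled on input. A minimal Python sketch, assuming a hypothetical 6-digit field and an illustrative `normalize_id` helper: the ID stays a string, and parsing uses an explicit base 10 so a leading zero can never select octal.

```python
FIELD_WIDTH = 6  # assumed width of the hypothetical identification-number field

def normalize_id(entered: str) -> str:
    """Accept an ID typed with or without its leading zeros; return it zero-padded.

    The result is text; int(..., 10) only validates and normalizes the digits,
    and the explicit base 10 means a leading zero can never select octal.
    """
    digits = entered.strip()
    if not digits.isdecimal() or len(digits) > FIELD_WIDTH:
        raise ValueError(f"not a {FIELD_WIDTH}-digit ID: {entered!r}")
    return str(int(digits, 10)).zfill(FIELD_WIDTH)

print(normalize_id("001234"))  # 001234
print(normalize_id("1234"))    # 001234 (user left off the leading zeros)
```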

BillThor
  • The fields with leading zeroes are strings, visual representations with their own meaning, not numbers. – Pieter B Jan 14 '14 at 14:27
  • +1 consider `chmod 438 ./myfile` Terrible! – Ingo Jan 14 '14 at 17:41
  • Why not allow a `0o10` syntax? I believe Python supports that. You can always provide a simple way to write octal values that doesn't change the meaning of ordinary decimal numbers. I have seen people try to use leading zeroes in code for alignment and easy manipulation, and get bitten by octal notation. – Manishearth Jan 15 '14 at 20:50
  • I can understand a desire to have a means of writing octal values, in addition to binary and hex. I can also understand that having a compiler interpret `031` as Halloween (Oct 31) rather than Christmas (Dec 25) could pose some risks if programmers copy in code written in a language which uses the latter implementation. I see no reason, however, why a language could not achieve the best of both worlds by supporting `0q31` as notation when octal is desired, or `0t025` for base-ten to allow macro-pasting values with leading zeroes, and simply forbidding leading zeroes without base specifiers. – supercat Mar 11 '14 at 23:32
  • @supercat Use of the leading 0 to represent octal dates back at least to your youth. The cases where it is frequently used are cases where the bits have significance, and interpretation of the number as a decimal number is only correct for values less than 8. Adding additional characters (including leading zeros on decimal numbers) may be more confusing than helpful. I would certainly query why a date was written 012 031, 010 031, or 012 025. – BillThor Mar 12 '14 at 02:41
  • @BillThor: The `0t` prefix would allow for situations where numbers are assembled from preprocessor macros. In embedded systems programming, it may be necessary to have a build date in a variety of formats; this could be handled nicely by having the build utility predefine macros for _BUILD_MONTH, _BUILD_YEAR, etc. but the behavior of base-10 numbers makes it necessary to have the build utility define separate macros for two-digit month and non-padded month (and likewise hour, etc.); a `0t` macro could ease such hassles when using token pasting. – supercat Mar 12 '14 at 06:24
  • @BillThor: Otherwise I'm well aware of what octal notation is useful for, and indeed have on rare occasions employed it myself. On those occasions, however, if the compiler would have accepted a prefix like `0q`, even if it was optional, I would have used it. My choice of particular constants for illustration was an attempt at humor ("why do programmers confuse Christmas and Halloween? Because 31oct is 25dec"), but my point was that the cost of bugs resulting from assigning *any* meaning for numbers with leading zeroes and no radix specifier is apt to exceed the cost of...
  • ...requiring a radix specifier for octal, especially in a new language. – supercat Mar 12 '14 at 06:30
  • @supercat How does 0t20140t120t31 get parsed? I would not expect the macro processor to parse the numbers so unless the _BUILD_MONTH got assigned to a numeric variable, leading zeros shouldn't be an issue. As a date string leading zeros should not be relevant. Requiring a radix specifier makes assembling a number from sub-strings of the number difficult. – BillThor Mar 12 '14 at 21:34
  • @BillThor: The same way as 0x123x3452x5 would [with a syntax error]. The issue with BUILD_MONTH is that in some cases it may be desirable to use it as a decimal number, in some cases as a hex number (so that it will get converted to BCD), and in some cases as a string (using the macro-stringizing function). The behavior of a 0t prefix would be analogous to 0x, except that the base would be ten rather than 16; the purpose would be so that if 0t##_BUILD_MONTH##_BUILD_DAY turns into 0311, that should be a value of 311, not 201. – supercat Mar 13 '14 at 00:47
  • @Manishearth .. besides it looking ghastly it's a great way to break old code. That should be obvious that if you require such a dramatic change it will break code. What you're proposing amounts to style policing which is never good. It's `0` which is octal by itself. That's the way it is. And it's very useful for a lot of things like bitwise operations. – Pryftan Apr 19 '23 at 16:06
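supercat's proposal can be sketched as a literal parser, assuming the hypothetical prefixes from these comments: `0q` for octal and `0t` for explicit base ten, alongside the familiar `0x` and `0b`. An unprefixed leading zero is rejected, so pasting `0t` with a month of `03` and a day of `11` yields 311 rather than octal 201. This is not any existing language's syntax, and `parse_literal` is a hypothetical name.

```python
DIGITS = "0123456789abcdef"
PREFIXES = {"0x": 16, "0b": 2, "0q": 8, "0t": 10}  # "0q" and "0t" are hypothetical

def parse_literal(text: str) -> int:
    """Parse an integer literal under the hypothetical radix-prefix rules.

    - A two-character prefix from PREFIXES selects the base explicitly.
    - Without a prefix the literal is decimal, and a leading zero is an
      error (except for "0" itself) instead of silently meaning octal.
    """
    lowered = text.lower()
    base = PREFIXES.get(lowered[:2])
    if base is not None:
        digits = lowered[2:]
    else:
        base, digits = 10, lowered
        if len(digits) > 1 and digits.startswith("0"):
            raise ValueError(f"leading zero without a radix prefix: {text!r}")
    # Validate digits ourselves so int() never applies its own prefix rules.
    if not digits or any(ch not in DIGITS[:base] for ch in digits):
        raise ValueError(f"invalid base-{base} literal: {text!r}")
    return int(digits, base)

print(parse_literal("0t" + "03" + "11"))  # 311: token-pasted month and day
print(parse_literal("0q311"))             # 201: octal only when asked for
```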