Why are structs/arrays zero based?

Question

How was it decided that if you have an array/struct or anything similiar in a programming language it should be zero-based? Wouldn't it have been easier if it was 1-based. Afer all, when we are taught to count, we start with one.

+1 for asking a question most thinking developers wonder about but didnt bother to ask. — DPD, Apr 23 '11 at 02:37
*Afer all, when we are taught to count, we start with one.* - this is exactly the problem with basic math education, they are on a pre-Peano/Frege/Dedekind level still. — Ingo, Apr 23 '11 at 12:10
@rwong There's no thing like binary number. There are just numbers and their representations. Unsigned integers starts at 0. — maaartinus, Apr 24 '11 at 11:12
@maaartinus: yes you're right. In any base-n numeral system, a string of zeros (in base-n) will give you zero, the lowest non-negative number. Zero is also the additive identity, so it is required to be present for calculations. If, however, a programming language choose to start with one, it's just as simple - skip the element [0]. — rwong, Apr 24 '11 at 20:04

score 34 · Accepted Answer · answered Apr 23 '11 at 02:56

34

All good answers. A good part of my "career" was spent in Fortran, where all arrays are 1-based. It's OK if you're writing math algorithms over vectors and matrices, where indices naturally go 1 .. N.

But as soon as you start trying to do computer-science type algorithms, where you have a big array and you are working on pieces of it, as in binary search, or heap sort, or if it is a memory array and you are writing memory allocation and freeing algorithms, or starting to act like parts of it are actually multidimensional arrays that you have to calculate indices in, that 1-based stuff gets to be a real source of confusion.

For example, if you have a 1-dimensional array A, and you want to treat it as a 2-dimensional NxM array, where I and J are the index variables, in C you just say:

A[ I + N*J ]

but in Fortran you say

A( (I-1) + N*(J-1) + 1 )
       or
A( I + N*(J-1) )

If it was 3-dimensional, you had to do

A( I + N*(J-1) + N*M*(K-1) )

(That's if it was column-major order, as opposed to row-major order which is more common in C.)

What I learned to do in Fortran, when doing string manipulation algorithms, was never to think of an index I as being the position of an element in an array. Rather I would think of a "distance" N as being the number of elements coming before the element of interest. In other words, always think in terms of "number of elements" rather that "index of element". That enabled me to work within what was an unnatural indexing scheme.

answered Apr 23 '11 at 02:56

Mike Dunlavey

12,815
2
35
58

15

+1 - the bottom line is that indexing from zero results in simpler programs overall. Once programming language design stopped being overly influenced by mathematical convention, common sense prevailed. – Stephen C Apr 23 '11 at 03:19
2

@Mike On the other hand, with 0-based arrays you **always** have to do `a[pos-1]` to get what you want. The only case where 0-base feels natural is when looping, but that's only because the idiom `for(i=0;i – Martin Wickman Apr 23 '11 at 09:29
In my opinion, this is the "correct" answer, but there are trade-offs. One trade-off, as you mention, is its use in 1 dimensional vs. multidimensional arrays. Base 0 indexing if of huge benefit when calculating multidimensional indexes. In higher-level programming, however, the use of 1 dimensional arrays, however, vastly outweighs the usage of multidimensional arrays. Further, the complexities of multidimensional indexing can be written once and encapsulated within a method (or indexer). *[Continued...]* – Mike Rosenblum Apr 23 '11 at 14:37
1

In low level programming, however, say asm or C, where performance is at a premium and inlining might not be available, the use of encapsulation might not be desirable or possible; so the benefits of base 0 indexing really manifests itself. So base 0 indexing is superior at the low end, while base 1 indexing is superior at the higher, more abstracted level. *[Cont...]* – Mike Rosenblum Apr 23 '11 at 14:37
1

The problem is, where does one make the switch? The switchover from base 0 at a low level to base 1 at a higher level is extremely dangerous and fraught with innumerable places to forget to add or subtract 1 when making a conversion. So the bottom line is that it's not worth the risk of making the shift. Usually the "shift" occurs only in the output, where the user is shown a list that is (hopefully) numbered 1 to N and not numbered 0 to N-1. (But I've seen plenty of displays that mess this up!) – Mike Rosenblum Apr 23 '11 at 14:37
Addressing arithmetic is easier with zero based, the array pointer points to index zero. It also works naturally with the mod function, for example if a[n] is meant to be a periodic array I can address it as a[j%n] quite naturally. However I don't know how many errors have happened when people mistakingly address a[n]. I prefer the Fortran approach, where you can specify the index as a range, for example dimension a2d(10:20,-5:5). To do that in C requires some tricky address algebra. – Omega Centauri Apr 24 '11 at 15:52
@Omega: When you say a[j%n] just make sure j is non-negative. It's been an issue for a long time that if j is negative you get a negative mod. In graphics, this can be really aggravating. – Mike Dunlavey Apr 25 '11 at 14:36
Thanks Mike. I've mainly been a Fortran programmer, and thats how the Fortran mod operator works, so its not surprising C works the same way. – Omega Centauri May 01 '11 at 00:17

score 31 · Answer 2 · edited Apr 23 '11 at 03:15

31

Think of an array index just as an offset from the start.

edited Apr 23 '11 at 03:15

Adam Lear

31,939
8
101
125

answered Apr 23 '11 at 02:06

mouviciel

15,473
1
37
64

5

That's the real reason. In languages like C the index of an array is actually kind of an offset. E.g. the offset is the index times the data type length. With 1-based arrays this would be much more complicated. – EricSchaefer Apr 24 '11 at 06:56
Yes. I think C started with the mentality of an assembly programmer. And in assembly it was convenient to use a pointer to the adrress of the zeroth element, and add size times the index to that. – Omega Centauri Apr 24 '11 at 15:55
2

I think the main reason C was able to adopt and maintain this convention is that it did not have a marketing department. – Mike Dunlavey Apr 24 '11 at 22:06
This should be the accepted answer. – EJoshuaS - Stand with Ukraine Feb 03 '23 at 15:41

btilly · Answer 3 · 2011-04-24T05:58:53.437

23

Dijkstra answered this very clearly in http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html - though Pascal programmers didn't agree.

edited Apr 24 '11 at 05:58

answered Apr 23 '11 at 02:09

btilly

18,250
1
49
75

12

answered, yes. clearly, no. – Steven A. Lowe Apr 23 '11 at 02:11
16

I think Dijkstra's eplanation suits fine. The reason people get peeved at zero-based indices is that they count from 1 and feel that indexing should also start from 1. They fail to understand that indexing is not the same as counting. – DPD Apr 23 '11 at 02:41
tl,dr: `p, p+1, ..., p+q-1` is best represented by `p ≤ i < p+q` and is best indexed by `0, 1, ..., q-1`. These indexes are best represented by `0 ≤ i < q`. – Wok Apr 23 '11 at 10:41
4

Although it's rarely profitable to disagree with EWD, I reject his assumptions as well as his conclusions. In his example, I prefer C) much more than A). – red-dirt Apr 23 '11 at 12:50
Dijkstra's conclusions follows from the assumption that connoting the empty set with [0..0] is better than with [0..-1]. On this, I think, he is wrong: having the upper bound *lower* than the upper bound much more clearly denotes an empty set than having the same number twice. It's the old inclusive vs. exclusive between issue, and [0..-1] is 100% clear in either convention. Further, being inclusive or exclusive on both ends *consistently* makes more sense than being >=0 and 0 and =0 and <=N-1, or (c) >=1 and <=N. Choice 'c' is clearly the winner. – Mike Rosenblum Apr 23 '11 at 14:31
3

@Mike: I disagree that [0, -1] is 100% clear. It definitely wasn't intuitive to one of my co-workers who wrote an HTTP server that needed to deal with empty ranges. – dan04 Apr 23 '11 at 15:14
2

Furthermore, -1 as the lower bound of an empty array forces the use of signed rather than unsigned indices. – dan04 Apr 23 '11 at 15:16
There's a counterpart to dan's concerns at the positive end as well - your indexing type must be larger than your allocation type (in C++ terms, index_type larger than size_type; in C terms inexpressible because size_t is both) - or your index type must be some special type that cannot store 0 at all - neither of these are particularly desirable. – Apr 23 '11 at 16:31
@dan: "I disagree that [0, -1] is 100% clear." Well, my point was that a lot of the confusion, and much of Dijkstra's discussion, revolves around the use of inclusive or exclusive bounding at either or both of the end points. The example [0..0] is not an empty range if bounded on an inclusive basis; however, [0..-1] clearly denotes an empty set regardless of whether one is using inclusive or exclusive bounding on either or both ends. – Mike Rosenblum Apr 23 '11 at 17:06
1

@dan: "Furthermore, -1 as the lower bound of an empty array forces the use of signed rather than unsigned indices." Yes, exactly, this is why, instead of using base 0 indexing, e.g. [0..N-1] and using [0..-1] as the empty set, one should instead use base 1 indexing, e.g. [1..N], where [1..0] represents the empty set. Use of base 1 gives us a natural counting order 1..N and allows us to to use an unambiguous empty set [1..0] that avoids the use of signed numbers. – Mike Rosenblum Apr 23 '11 at 17:11
@Mike: At what level of abstraction does one integer type being required to be larger than another not matter? It has nothing to do with how the integers are stored, and everything to do with extremely high-level concepts of types. – Apr 23 '11 at 17:37
@Joe: "There's a counterpart to dan's concerns at the positive end as well - your indexing type must be larger than your allocation type". I agree with you here, but this is a low-level issue that does not matter at higher-levels of abstraction. Dijkstra's argument was at an abstract level, but, I think, hinges too tightly to a debatable assumption. Your argument is 100% correct, but has to do with low-level implementation details that fit much more in line with Mike Dunlavey's answer, and associated comments, above. – Mike Rosenblum Apr 23 '11 at 17:37
@Joe: "At what level of abstraction does one integer type being required to be larger than another not matter?" I was trying to convey that Dijkstra was arguing at a very high, very abstract level -- really more at the math level than anything having to do with implementation. I agree with your point 100%, demonstrating that implementation details cause base 0 indexing to be superior. But at the high-level, discussing whether having a collection be indexed base 0 vs. base 1, I don't think we care what type is contained in the collection or how it's size is allocated or indexed in memory. – Mike Rosenblum Apr 23 '11 at 17:45
@Mike: My point is independent of implementation details except to say that there must be a concrete implementation, i.e. we do not have all integers at our disposal for an alphabet. Even machines as abstract as Turing machines fit this requirement, without talking about "memory" or "indexing" or "collections". Relational closure is a thing that is important to have for _informal human reasoning_ as well as _efficient computation_ and _mathematical proof_. One-based indexing requires giving up this closure. – Apr 23 '11 at 18:52
Hey Joe, ok, ok, I believe you. :-) But you're operating at a level that is way over my head. I looked up "Relational Closure" and I can't say that I understood it. I'm sure that it's too tough to explain or discuss in a comment, so would you know of a source that would be relatively easy to digest? (Somehow, I doubt it exists!) – Mike Rosenblum Apr 23 '11 at 19:26
Funny, I read that article before never realized it was Djikstra. – Michael Brown May 16 '12 at 16:13

score 10 · Answer 4 · answered Apr 23 '11 at 02:08

10

The difference is that it's not a human counting, it's the computer. It is easier for computers to think of 0 as the first item, as it is usually just an offset from the memory location. It's logical to start at 0000, then 0001, then 0010. If you started at 1, you would either lose available index (acting like 0000 isn't valid) or have awkwardness to make sure that the compiler always knows it should decrement the index by one before it actually works.

Plus it isn't that hard to pick up after your first programming class and you are told this is the way things work.

answered Apr 23 '11 at 02:08

unholysampler

7,813
1
30
38

9

*It is easier for computers to think...* - really??? Programming languages are designed for human comprehension first. A compiler can easily handle the detail of 1-based vs 0-based arrays. – btilly Apr 23 '11 at 02:10
11

@bitlly: 0-based indexes date back to when the extra decrement required to implement 1-based array lookup was a significant performance hit. – Bevan Apr 23 '11 at 03:04
4

@Bevan: Why would there be an extra decrement with 1-based array lookups? The trivial solution is of course to take the indexing into account in defining your base pointer, thus removing altogether any requirements for extra decrementation. – Schedler Apr 23 '11 at 06:02
3

@Schedler that only works if you know in advance what the pointer is going to be used for. Unfortunately, this only works when you have strong typing, which you don't really have in C or assembly language. Throw in the fact that you don't normaly have the luxury of keeping the base pointer around (due to lack of registers and/or stack space), and you end up with having to recalculate the address every time. – Bevan Apr 23 '11 at 09:36
4

@Bevan: Citation needed. Also please explain why FORTRAN, invented decades before any currently used 0-based language, managed to get away with 1-based array indexes. – btilly Apr 24 '11 at 06:13
Citation? Comments on my own experience. Assembly language programming on the 6502 & Z80 chips in the late 1980s, some SPARC assembly and C programming in early 1990s. As for FORTRAN, that was a language (one of the first, IIRC) explicity designed for non-computer-specialists to use - in contrast to the C language where performance was part of the motivation (see [The C Programming Language, Brian Kernighan and Dennis Ritchie](http://en.wikipedia.org/wiki/The_C_Programming_Language). – Bevan Apr 24 '11 at 08:30
As to "decades before any currently used 0-based language" ... The C programming language book was published in 1978. C was based (indirectly) on the BCPL language (1966) which had 0 based arrays (IIRC). Fortran started in 1956, but didn't gain arrays until the Fortran-IV standard, also 1966. Even if I'm off by a year or two (writing this from memory), the two styles of indexing developed at the same time, not decades apart. – Bevan Apr 24 '11 at 08:44
1

If you're starting your index at `0001` instead of `0000`, and you've got, say, 8 values, aren't you going to need to pull an extra bit? `10000` would be your '8th' index, instead of `1111`. It's a really minor thing, but in old computers could make a difference. – KChaloux Oct 11 '12 at 13:08

score 10 · Answer 5 · answered Apr 23 '11 at 02:17

10

In mathematics for centuries the subscript of a series has been chosen for convenience and meaning. For example in a polynomial, the coefficients are usually labelled a0, a1, a2, a3, etc. because the zero represents the power of the corresponding term. In computer science, the subscript represents the offset relative to the beginning of the array.

answered Apr 23 '11 at 02:17

Rick Sladkey

671
5
9

Yes, but that means in C you have to remember to dimension your coefficient array by n+1, which is oh so easy to forget. – Omega Centauri Apr 24 '11 at 15:56
It helps to remember to index by the offset but allocate by the length. Alternatively the length is just the offset to the first unused element. – Rick Sladkey Apr 24 '11 at 20:29

score 6 · Answer 6 · answered Apr 23 '11 at 11:46

I see two reasons:

The low level one. If you start with 0, than the pointer element indicated by index is pointer of array + index (a[i] and a + i point to the same memory address). This is very convenient.

On a bit higher level of abstraction -- very often you will have to use modulo function for indices. Modulo n always returns values from 0 to n-1. So it's also more convenient when indices start with 0.

score 5 · Answer 7 · answered Apr 23 '11 at 02:10

From a modern standpoint (i.e. interpreted languages or recently developed languages such as C#) this is likely due to developers and thus language designers learned to develop since languages such as C make use of zero indexed arrays. As to why languages have made use of the zero index for arrays, the index is due to how the array is stored. In C, the use of the array index array[1] is the same as using the pointer reference, i.e. array + 1. Wikipedia also has a bit more on the subject, but that is one reason in a nutshell.

score 4 · Answer 8 · answered Apr 23 '11 at 06:10

4

As has been pointed out, not all languages use 0-based indexing. For example in Ada you can define your indexing basis freely. For String type for example 1-based indexing is used.

The benefit of this is that you can define the indexing to best match the intended usage. In some cases it might sense to for example define the array bounds as -1..+1.

For example a symmetric 5x5 kernel for convolution could be defined as

type KernelT is array (-2..+2, -2..+2) of Real;

Similary enumerated types can be used directly for array indexing.

answered Apr 23 '11 at 06:10

Schedler

519
3
6

1

I have seen this (ab)used to create some _very_ unclear code (admittedly, in Algol 68 but the principle is the same). What was worse, the code looked clear at first glance; it was only when trying to convert into a different language that it became clear that things were obscure… – Donal Fellows Apr 23 '11 at 14:32

score 3 · Answer 9 · answered Apr 23 '11 at 11:24

The simple answer is zero is simpler when one tries to calculate the actual address in memory. If an array starts at location 12345 and you wish to access element 678 then adding the two gives the location that needs to be read. Note the above assumes the array holds bytes and is of course bigger than 678. If you used 1 based indexing then an additional subtract 1 would be required. This quickly becomes a pain andall things said 1 does not really buy anything.

Why are structs/arrays zero based?

9 Answers9

Linked