8

I am currently learning about compiler construction and language design, and I am thinking about what native datatypes I want to support in my language. Now, there are a whole lot of languages that make a distinction between integer and real numbers. However, I remember watching a talk by Douglas Crockford in which he said:

Having a single number type in the system means that you can not make a bug by choosing the wrong number type

He also mentioned that he recommends a number representation different from the commonly used IEEE-754 (please correct me if I am wrong), namely DEC64. Hence my question: for a general-purpose language which has a primarily educational focus, what number representation should I use?

EDIT: By educational focus I mean my own progress in learning about compilers, not educating others.

Niklas Vest
  • 211
  • 1
  • 5
  • 4
    There is no right answer here. For educational purposes, I'd recommend using arbitrary precision decimals (e.g. Java's BigDecimal and BigInteger). It has all the flexibility one can ask for, avoids having to explain rounding errors right at the start, and behaves very much like the calculators pupils are used to. – marstato May 29 '18 at 06:46
  • 5
    The question starts asking about number *types* then veers to number *representations*. Which is kind of confusing, possibly also confused. Further, even opinion-based answers are likely to be off the mark if you can't clarify the educational focus you mention: is it to be a language for people learning to program (such as Pascal), a project for your own self-education, a language to be used by tots learning counting, ... ? – High Performance Mark May 29 '18 at 07:23
  • 1
    Thanks, I edited the question. I am mentioning both the number TYPES and REPRESENTATIONS because I figured maybe someone could point me in the right direction for both of my problems :) – Niklas Vest May 29 '18 at 12:15
  • Just remember, being able to eliminate one class of bugs does not necessarily mean your net number of bugs goes down. – whatsisname May 31 '18 at 20:30
  • 1
    "which has a primarily educational focus" Like computer science educational? Because one issue you'll run into from a language design standpoint is supporting binary operators like arithmetic and logical shifts. (This plays into operator precedence grammar rules also if you include binary operators to learn how languages handle things). If you don't care about those then you could use a single data type. (JS has a single "Number" type if you want to see a weird way to handle things). – Sirisian May 31 '18 at 23:23
  • The question is, who is being educated, for what purpose? For something similar to Logo or Scratch, then sure: a single numeric type, probably corresponding to "double" in a typical C implementation, is a sensible approach. For a CS undergraduate, learning to use different representations is part of what they're learning, so a variety of types would be best. – Jules May 31 '18 at 23:43
  • @Sirisian For some reason my first edit wasn't applied, so I edited my question again. I do not want to educate anyone else but me. I want to learn how to build features into languages etc., and I am currently reading about compiler generators. However, I am pretty curious how you would implement your own number representation, so I am still contemplating whether I should use the standard number distinctions that would come "built in" with the generator or use a non-standard representation just for the sake of learning. But I also don't want my language to be useless because of a design mistake. – Niklas Vest Jun 02 '18 at 05:41

2 Answers

11

We have different number representations in general because they have different strengths and weaknesses, be it speed, precision, or range. This also has to be the case because we cannot represent all real numbers in finite memory; we always have to choose some that we cannot represent exactly.

The Doug Crockford quote you have is borderline idiotic: if you can only pick one representation then, OK, you can't pick the wrong one, but you can't pick the right one either. That is, your only choice will work for some uses but not for all.

It is true that some representations are probably better as a first go-to choice, and DEC64 looks reasonable here. It's a decimal floating point representation, so it will be less surprising than IEEE-754 (which is binary floating point) in most situations, as people tend to think in decimal; e.g. it can represent 0.3 exactly. It will still have representation issues in some circumstances, e.g. adding really big and really small numbers together.
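
To make the difference concrete, here is a small sketch (in Java, which was already mentioned in the comments; BigDecimal stands in for decimal arithmetic here, since DEC64 has no standard Java implementation):

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class DecimalVsBinary {
    public static void main(String[] args) {
        // IEEE-754 binary double: 0.1 and 0.2 have no exact binary
        // representation, so the sum is not exactly 0.3.
        System.out.println(0.1 + 0.2);              // 0.30000000000000004

        // Decimal arithmetic represents 0.1, 0.2 and 0.3 exactly.
        BigDecimal a = new BigDecimal("0.1");
        BigDecimal b = new BigDecimal("0.2");
        System.out.println(a.add(b));               // 0.3

        // Any format with a fixed number of significant digits still loses
        // information when the magnitudes differ too much: with 16 digits
        // the small addend vanishes entirely.
        BigDecimal big  = new BigDecimal("1e20");
        BigDecimal tiny = BigDecimal.ONE;
        System.out.println(big.add(tiny, new MathContext(16))); // 1.000000000000000E+20
    }
}
```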

For further reading I'd suggest Richard Harris' series of articles, 'Why X won't fix your floating point blues'.

jk.
  • 10,216
  • 1
  • 33
  • 43
  • links to Harris' articles can be found in this answer to a related question https://softwareengineering.stackexchange.com/a/101197/10563 – jk. May 29 '18 at 11:35
3

For a general language, numbers should behave like those taught in math class. Only special-purpose languages, like those for device drivers, should have special mathematics.

I would recommend using arbitrary-precision numbers rather than fixed-precision ones. Yes, they're slower, but they behave the way people expect numbers to behave. Placing artificial limits on them will be reported as a bug.
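
As a rough illustration (in Java, since its arbitrary-precision classes were already mentioned in the comments; the class name here is my own, for the example only):

```java
import java.math.BigInteger;

public class MathClassNumbers {
    public static void main(String[] args) {
        // Fixed-precision 64-bit integers silently wrap around on overflow.
        long fixed = Long.MAX_VALUE;
        System.out.println(fixed + 1);                 // -9223372036854775808

        // Arbitrary-precision integers simply grow, the way numbers behave
        // in math class (at the cost of speed and memory).
        BigInteger n = BigInteger.valueOf(Long.MAX_VALUE);
        System.out.println(n.add(BigInteger.ONE));     // 9223372036854775808

        // 100! has 158 digits; no fixed-width machine type can hold it.
        BigInteger factorial = BigInteger.ONE;
        for (int i = 2; i <= 100; i++) {
            factorial = factorial.multiply(BigInteger.valueOf(i));
        }
        System.out.println(factorial);
    }
}
```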

shawnhcorey
  • 219
  • 1
  • 7
  • 6
    Unfortunately, numbers like those in math class cannot be represented in finite memory, which is the whole reason why we have so many different number representations in the first place. – Jörg W Mittag May 29 '18 at 15:53
  • 5
    And reporting "out of memory" when someone tries to calculate SQRT(2.0) will probably also be regarded as a bug. – Simon B May 29 '18 at 16:21
  • 1
    @JörgWMittag No. At one time, CPUs only did integer arithmetic. So compilers separated fast integers from slow floats. Given that CPUs were much slower then and memory was small, having a choice made sense. But it doesn't any more. – shawnhcorey May 30 '18 at 11:59
  • @SimonB Calculations of transcendental functions are stopped when the difference between the previous result and the current one is less than a small amount (see the sketch after these comments). This is true even with fixed-precision numbers. And arbitrary precision does not mean infinite precision. It means the programmer gets to decide, not the compiler. – shawnhcorey May 30 '18 at 12:04
  • 3
    So, your single number type is in reality an infinity of number types depending on a few parameters. – Deduplicator May 31 '18 at 20:34
  • @Deduplicator No, as far as the compiler is concerned, they're all the same data type. – shawnhcorey Jun 01 '18 at 00:57
  • 2
    @shawnhcorey: So, you want to require manually specifying the precision for every single operation instead? – Deduplicator Jun 01 '18 at 00:59
  • @Deduplicator There is a default that you can change. If you don't like the given default, you can set another default. And you can set the precision for each variable. It's not much harder than deciding to use int, long, float, or double. In fact, it's easier since you can just use the default precision. – shawnhcorey Jun 01 '18 at 09:50
  • 2
    @shawnhcorey Global state is evil. And now it interferes with all arithmetic? Also, how would two variables with different parameters interact? – Deduplicator Jun 01 '18 at 12:09
  • @Deduplicator Global constants are not evil. – shawnhcorey Jun 01 '18 at 17:24
  • @shawnhcorey they are when they silently also affect libraries – user253751 Jul 06 '20 at 12:51
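
To illustrate the stopping criterion described in the comments above, here is a rough sketch (in Java, using BigDecimal; the method name, seeding, and tolerance choices are assumptions made for this example, not taken from the thread) of computing SQRT(2.0) to a caller-chosen precision:

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class ArbitraryPrecisionSqrt {
    // Newton's method: iterate x <- (x + a/x) / 2 until successive
    // approximations differ by less than one unit in the last requested digit.
    // Assumes a > 0; error handling is omitted in this sketch.
    static BigDecimal sqrt(BigDecimal a, int digits) {
        MathContext mc = new MathContext(digits + 5);          // a few guard digits
        BigDecimal tolerance = BigDecimal.ONE.movePointLeft(digits);
        BigDecimal x = new BigDecimal(Math.sqrt(a.doubleValue()), mc); // double seed
        BigDecimal two = BigDecimal.valueOf(2);
        while (true) {
            BigDecimal next = x.add(a.divide(x, mc)).divide(two, mc);
            if (next.subtract(x).abs().compareTo(tolerance) < 0) {
                return next.round(new MathContext(digits));
            }
            x = next;
        }
    }

    public static void main(String[] args) {
        // The caller, not the compiler, decides the precision.
        System.out.println(sqrt(BigDecimal.valueOf(2), 50));
    }
}
```

The caller picks the number of digits, so "arbitrary precision" here means programmer-chosen precision rather than unbounded memory use.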