7

I understand that this might be a loaded question. For example, in SQL Server 2008, doing DECLARE @someNum NUMERIC(10); means that the range of the variable will be limited to 10 digits. In other languages such as C, Java, etc. the standards for representing whole and fractional numbers are different. The ranges of the values come in multiples of bytes, which is understandable given the way hardware works. A language could provide something that most hardware does not yet support, such as a 256-bit integer type, which would still work fast on a 64-bit computer if it were implemented correctly.

The numeric types in databases do not seem to be tied to the underlying hardware. I understand that the values might need to be nullable, which can cost a bit or two, but why such arbitrary precision that does not seem to optimize either storage or calculation speed?
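To make that concrete, here is roughly what I mean (SQL Server syntax; the variable is just for illustration):

    DECLARE @someNum NUMERIC(10);   -- precision 10, scale defaults to 0

    SET @someNum = 9999999999;      -- ten digits: fits
    SET @someNum = 1234.56;         -- fractional part is rounded away (scale 0) -> 1235
    -- SET @someNum = 10000000000;  -- eleven digits: arithmetic overflow error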

Perhaps databases do not need to perform numeric computations often. Perhaps the digit width of a column is a natural extension of character width. Perhaps SQL was supposed to be a non-programmer's language (because it sounds like English :) )? Perhaps history took a weird turn, like the one that allowed the inferior VHS format to prosper.

Please help me understand this.

Job

5 Answers

7

doing DECLARE @someNum NUMERIC(10); means that the range of the variable will be limited to 10 digits.

Actually, you're either very wrong in that assertion, or SQL Server is hugely incompatible with other DB engines. Typically, you have:

  • NUMERIC(precision, scale)
  • NUMERIC(precision)

In the second case, the scale is zero, i.e. you can only handle integers; the precision is the maximum total number of digits, of which the scale says how many fall after the decimal point (so integers only, in the second case).
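For example (SQL Server syntax; other engines behave much the same, though details vary):

    DECLARE @a NUMERIC(10, 2);    -- up to 10 digits total, 2 of them after the decimal point
    DECLARE @b NUMERIC(10);       -- scale 0: integers only, up to 10 digits

    SET @a = 12345678.99;         -- fits: 8 digits before the point + 2 after
    SET @b = 12345678.99;         -- rounded to 12345679, because the scale is 0
    -- SET @a = 123456789.99;     -- overflow: 11 total digits don't fit in NUMERIC(10,2)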

By contrast, you seem to be reading numeric(10) as meaning "any number, as long as it has 10 digits or fewer", for which there is no SQL construct (as best I'm aware, anyway) short of the single/double float types (whose names differ per DB engine).


The numeric types in databases do not seem to be tied to the underlying hardware.

Indeed, they're not. Numeric/Decimal is an arbitrary-precision number, stored very much like a varchar of huge length (1000+ digits). You can actually use it to store 1e100 + 1 as an exact value: you'll get a 1 followed by zeros, and finally a 1.

You sound like you need a float... A float would store 1e100 when given 1e100 + 1, and would yield 1e100.
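A rough sketch of the difference in SQL Server terms (its NUMERIC tops out at 38 digits, so this uses a 21-digit value rather than 1e100; engines such as PostgreSQL allow far more digits):

    DECLARE @exact  DECIMAL(38, 0) = 100000000000000000001;  -- 21 digits, stored exactly
    DECLARE @approx FLOAT          = 100000000000000000001;  -- rounded to ~15-17 significant digits

    SELECT @exact  + 1 AS exact_result,   -- 100000000000000000002
           @approx + 1 AS float_result;   -- 1E+20: the trailing 1 is gone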

Denis de Bernardy
  • Interesting, the Oracle [numeric](https://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#CNCPT1832) datatype seems to exhibit this "floating point" behaviour, apart from supporting the expected arbitrary precision behaviour when precision and scale are explicitly specified – iruvar May 01 '18 at 17:26
3

Fixed-point numeric types are not used for the same purpose as floating-point numeric types.

Sure, floating-point types (which every SQL database also has) are optimized for the machine hardware, but they are also imprecise. Every day we get to see another newbie who doesn't understand this (not to pick on anyone, that was just the most recent example I found).

You pay a hefty price for the performance and space benefits that a floating point gives you; in particular, you end up with something that can never ever be used in any system that requires exact quantities, such as a financial system. Go ahead and try using floats for your invoicing component; the accountants will kick you to the curb after the 10th time in a row that the control process fails.
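A sketch of that failure mode in T-SQL (the amounts are hypothetical invoice lines):

    -- Ten line items of $0.10 each should total exactly $1.00.
    DECLARE @float_total FLOAT          = 0,
            @dec_total   DECIMAL(10, 2) = 0,
            @i           INT            = 0;

    WHILE @i < 10
    BEGIN
        SET @float_total = @float_total + 0.1;
        SET @dec_total   = @dec_total   + 0.10;
        SET @i = @i + 1;
    END;

    SELECT @float_total AS float_total,    -- 0.9999999999999999, not 1
           @dec_total   AS decimal_total,  -- 1.00, exactly
           CASE WHEN @float_total = 1.0 THEN 'balanced'
                ELSE 'out of balance'      -- the branch the control check actually hits
           END AS control_check;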

Integers are great for precise storage of quantities, but pretty miserable when it comes to computation. We sold 5 units yesterday and 6 units today; you want the average? It's exactly 5, according to the system!
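For instance, with a hypothetical sales table in SQL Server:

    CREATE TABLE #sales (units_sold INT);
    INSERT INTO #sales VALUES (5), (6);

    SELECT AVG(units_sold)                         AS int_average,      -- 5 (integer division)
           AVG(CAST(units_sold AS DECIMAL(10, 2))) AS decimal_average   -- 5.5
    FROM #sales;

    DROP TABLE #sales;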

There are situations where you must use fixed-point precision, even if it's not perfectly optimized. Generally, those situations involve either money or precursors to money (like discount rates), although anything dealing with real-world quantities is usually better off as fixed point as well - weights and measures are another obvious example.

Oh, and by the way, that silly VHS comment at the end? Nothing but a myth. So let that be a double lesson to you: Things happen for a reason. Vapid platitudes such as "VHS won because of marketing" or "SQL was originally supposed to be for non-technical users" may sound quaint and charming but they are rarely the whole story and are often outright fabrications.

Aaronaught
  • If you just sorted numbers from lowest to highest before adding them up, would not that help in both cases? – Job Jun 10 '11 at 01:59
1

This is speculative, but I can think of a couple plausible reasons:

  1. Relational databases use numbers as identifiers more frequently than as the objects of arithmetical operations. Efficiency of storage and comparison would be more important than fast arithmetic. Going straight from 16-bit to 32-bit integers is a big jump and would result in needless file bloat (waste in a table = number of numeric fields × number of rows × number of unused bytes).
  2. Transact-SQL is a pretty old language, developed originally for 16-bit processors. 16-bit integers are pretty weak tea for any sort of commercial/business use, particularly in light of point (1). The original T-SQL would have needed more powerful and flexible integer types than those natively provided. This may be driven by backwards compatibility.
Charles E. Grant
1

I think your notion that a value in a column is like a variable is not accurate. SQL Server (and other databases) stores data in terms of pages (8 KB for SQL Server). Each page will have as many rows of data as will fit (plus some overhead). So SQL Server storage really doesn't care about each column type and stores raw bytes.
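As a rough illustration (dbo.SomeTable is a placeholder; the DMV needs the 'DETAILED' scan mode to report record counts and sizes):

    -- How many rows SQL Server actually packed into a table's 8 KB pages,
    -- and how big each row is on disk, regardless of the declared column types.
    SELECT index_level, page_count, record_count, avg_record_size_in_bytes
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.SomeTable'), NULL, NULL, 'DETAILED')
    WHERE index_level = 0;   -- leaf level, where the data rows live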

Check this link for more information
http://msdn.microsoft.com/en-us/library/cc280360.aspx

softveda
0

In other languages such as C, Java, etc. the standards for representing whole and fractional numbers are different.

In C and various other programming languages, integers (INT() in SQL) are usually "packed" integer bit strings, and floating point numbers (FLOAT()/DOUBLE() in SQL) are usually stored as IEEE-754-compliant bit strings.

DECIMAL() and NUMERIC() typically use Binary Coded Decimal (BCD).

A packed, signed, 32-bit integer will store roughly -2 billion to +2 billion. You can reliably use 9 digits, positive or negative, but not all 10. For ten or more digits, you need the next step up in integer size, which will likely be 64 bits.
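In SQL Server terms, for instance:

    DECLARE @small INT = 2147483647;      -- top of the signed 32-bit range: 10 digits, but not all 10-digit values fit
    -- DECLARE @toobig INT = 2147483648;  -- one more: arithmetic overflow converting to int
    DECLARE @big BIGINT = 9999999999;     -- the 64-bit step up holds any 10-digit value easily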

The IEEE-754 single-precision format stores a floating point number in a 32-bit sequence. It stores:

  • one bit for the sign
  • multiple bits for a relative exponent
  • multiple bits for a mantissa

In this fashion, it holds several significant digits and a range of exponents. The order of the bits is such that you can treat two of these values as 32-bit signed integers for comparison purposes and still get the right answer; this way, you don't need separate comparison logic for integers and floats. The 64-bit double-precision format takes the same idea to 64 bits, allocating more bits to the exponent and the mantissa, which allows more digits of precision and a greater range of exponents.
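One way to peek at this layout in SQL Server is to cast a REAL (32-bit float) to binary; the hex constants in the comments are simply what the single-precision encoding works out to:

    SELECT CAST(CAST(1.5 AS REAL) AS VARBINARY(4)) AS bits_1_5,  -- 0x3FC00000: sign 0, biased exponent 127, fraction .5
           CAST(CAST(2.0 AS REAL) AS VARBINARY(4)) AS bits_2_0;  -- 0x40000000: sign 0, biased exponent 128, fraction 0
    -- Read as 32-bit signed integers, 0x40000000 > 0x3FC00000, mirroring 2.0 > 1.5.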

The problem with these floating point values is that they are ALL approximations. And they (at least the older, more commonly supported formats) are all based on powers of 2, not powers of 10. Ergo, there is no EXACT representation, in either format, of 1.1 or 1.01; no finite combination of powers of 2 adds up to either of those values. If you deal with financial transactions, you don't want an approximation. And if you must round, you want to round in powers of 10, not powers of 2, because that is how we denominate money.
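For instance, in SQL Server:

    -- 1.1 has no exact binary representation, so the float arithmetic drifts;
    -- the DECIMAL arithmetic stays in base 10 and does not.
    SELECT CASE WHEN CAST(1.1 AS FLOAT) * 3 = 3.3
                THEN 'equal' ELSE 'not equal' END AS float_check,    -- 'not equal'
           CASE WHEN CAST(1.1 AS DECIMAL(10, 1)) * 3 = 3.3
                THEN 'equal' ELSE 'not equal' END AS decimal_check;  -- 'equal'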

DECIMAL() and NUMERIC() are different animals, using BCD. If you define a DECIMAL(7,2), it will store a total of 7 decimal digits, keeping track of the fact that 2 of them fall after the decimal point. That's 7 significant digits, with no approximation. It will likely fit in 32 bits of space (4 bits for each digit + 4 bits for the field as a whole, to store sign, null, etc.). If you multiply two DECIMAL(7,2) fields, you know PRECISELY, down to the penny, what you will get. No approximations.
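For example (the result type noted in the comment follows SQL Server's rules for decimal multiplication):

    DECLARE @price DECIMAL(7, 2) = 123.45,
            @rate  DECIMAL(7, 2) = 67.89;

    SELECT @price * @rate AS product;   -- 8381.0205, exact to the last digit
    -- SQL Server types the result as DECIMAL(15, 4): precisions add (plus one), scales add.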

And the aforementioned DECIMAL(10) will likely only need 48 bits of space to store. And you can use all 10 digits, positive and negative.

Perhaps databases do not need to perform numeric computations often. Perhaps the digit width of a column is a natural extension of character width. Perhaps SQL was supposed to be a non-programmer's language (because it sounds like English :) )? Perhaps history took a weird turn, like the one that allowed the inferior VHS format to prosper.

Afraid I have to disagree with you there. The databases I deal with do a LOT of numeric calculations. Most of it, though, is dollars and cents, which largely REQUIRES the precision of BCD and the associated DECIMAL(x,y) formats. The fact that BCD chews up 4 bits per digit, while ASCII uses 8, is mostly coincidental.

SQL grew out of E. F. Codd's relational work at IBM; the language itself was designed there by Donald Chamberlin and Raymond Boyce. No surprise, therefore, that the formatting of DECIMAL(x,y) is largely borrowed from FORTRAN.

Meower68