If you had to represent a large number concisely would you use base 36 or ZZ?

Question

According to Wikipedia:

In mathematics and computing, hexadecimal (also base 16, or hex) is a positional system that represents numbers using a base of 16. Unlike the common way of representing numbers with ten symbols, it uses sixteen distinct symbols, most often the symbols "0"–"9" to represent values zero to nine, and "A"–"F" (or alternatively "a"–"f") to represent values ten to fifteen.

Using two of those symbols you can represent 256 numbers (0 to 255). The largest would be written as FF.

If I wanted to represent a much larger number could I use base 36 or ZZ? If this is possible why hasn't this been used before?

More info:
I'm indexing items with unique IDs. I'd like to be able to represent around 1,000 to 10,000 items on average and, if I can, I'd like to use only two symbols.

Also, where do web colors fit into this?

White is represented as #FFFFFF. That is full red (FF), full green (FF), and full blue (FF). Is that easier to read than using a different base encoding?

You mean like [Base64 encoding](https://en.wikipedia.org/wiki/Base64)? — geocodezip, Nov 09 '19 at 22:49
Why bother using two symbols for this when you could use just one Unicode code point and easily cover your given range many times over. — Dave M, Nov 09 '19 at 22:50
*"If this is possible why hasn't this been used before"* - don't be surprised if a question built on such a wrong assumption gets downvoted for not showing enough prior research. — Doc Brown, Nov 09 '19 at 22:53
Because O and 0 as well as l and 1 are easily confused by people who aren't sure how they look in any particular font. Humans are more used to using context to resolve this. Something they don't have if every symbol is a number. — candied_orange, Nov 09 '19 at 23:06
@DaveM Can you explain more? You're saying instead of stopping at Z using ASCII characters use Unicode characters? — 1.21 gigawatts, Nov 09 '19 at 23:44

score 4 · Accepted Answer · answered Nov 09 '19 at 23:25

The goal of hexadecimal encoding is not to encode larger numbers in fewer digits, but to have an easy mapping between digits and the bits in a byte: 2 hex digits correspond to exactly one byte, and 1 hex digit is 4 bits, so half a byte (I use byte in the sense of an octet).
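To make the digit-to-bit mapping concrete, here is a minimal Python sketch. The helper names `byte_to_hex_and_bits` and `parse_web_color` are made up for illustration; the second one also shows how a web color such as #FFFFFF splits into three bytes.

```python
def byte_to_hex_and_bits(value: int) -> tuple:
    """Return the 2-digit hex form and the 8-bit binary form of one byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte (0..255)")
    # "02X" pads to 2 hex digits, "08b" pads to 8 binary digits.
    return format(value, "02X"), format(value, "08b")


def parse_web_color(color: str) -> tuple:
    """Split '#RRGGBB' into its red, green, and blue byte values."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a color of the form #RRGGBB")
    # Each pair of hex digits is exactly one byte.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


if __name__ == "__main__":
    print(byte_to_hex_and_bits(0xA5))
    print(parse_web_color("#FFFFFF"))
```

Because each hex digit covers exactly 4 bits, you can read the bit pattern of a byte straight off its two hex digits, which is not true for base 36.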

You can of course use base 36 digits (0..9, A..Z) to encode larger numbers in fewer digits. With two such digits you can encode 1296 values (36 to the power 2). With 3 digits you can encode 46656 values.

You could even decide to have case-sensitive digits and encode in base 62. With two such digits you can encode 3844 values. With three digits it's 238,328 values.

Using every printable ASCII character (94 of them, not counting the space) you can go to base 94. Two such digits give 8836 values, so for 10000 entries you'd still need 3 digits.

But if you simply stored the number as a plain binary integer, you'd need just 2 bytes (16 bits cover 65,536 values).
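The base 36 and base 62 encodings described above can be sketched in a few lines of Python. This is an illustration, not a standard API: the names `BASE36`, `BASE62`, `encode`, and `decode` are my own, and the digit order of the base 62 alphabet (digits, then uppercase, then lowercase) is an arbitrary choice.

```python
import string

# Hypothetical alphabets; the ordering is an assumption, not a standard.
BASE36 = string.digits + string.ascii_uppercase
BASE62 = BASE36 + string.ascii_lowercase


def encode(number: int, alphabet: str = BASE36, width: int = 0) -> str:
    """Write a non-negative integer using the given digit alphabet."""
    if number < 0:
        raise ValueError("number must be non-negative")
    base = len(alphabet)
    digits = []
    while True:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
        if number == 0:
            break
    # Digits were produced least significant first; pad with the zero digit.
    return "".join(reversed(digits)).rjust(width, alphabet[0])


def decode(text: str, alphabet: str = BASE36) -> int:
    """Inverse of encode(); raises ValueError on an unknown digit."""
    base = len(alphabet)
    value = 0
    for char in text:
        value = value * base + alphabet.index(char)
    return value


if __name__ == "__main__":
    print(encode(1295))           # largest 2-digit base 36 value
    print(encode(9999))           # needs a third digit
    print(encode(3843, BASE62))   # largest 2-digit base 62 value
```

For fixed-length IDs, pass `width` so that small numbers are left-padded with the zero digit.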

How are you getting these numbers? Base 36 to the second power? — 1.21 gigawatts, Nov 09 '19 at 23:46
@1.21gigawatts If you have B possible values for one digit, then for N such digits you get pow(B, N) possible combinations. So for B=36 and 2 digits, that's 36x36=1296. Conversely, if you want to achieve K possible values you'd need log(K)/log(B) digits, rounded up, since a fraction of a digit makes no sense. — Christophe, Nov 10 '19 at 00:06
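The digit count from the comment above can be checked with a small Python sketch. `digits_needed` is a hypothetical helper; it uses integer multiplication rather than `log(K)/log(B)` so that floating-point rounding cannot give an off-by-one answer at exact powers.

```python
def digits_needed(count: int, base: int) -> int:
    """Smallest number of base-`base` digits that can label `count` items."""
    if count < 1 or base < 2:
        raise ValueError("need count >= 1 and base >= 2")
    digits = 1
    capacity = base  # number of distinct values with `digits` digits
    while capacity < count:
        capacity *= base
        digits += 1
    return digits


if __name__ == "__main__":
    for base in (16, 36, 62, 94):
        print(base, digits_needed(10000, base))
```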

score 3 · Answer 2 · answered Nov 10 '19 at 00:59

Your main misunderstanding is about the purpose of BaseX formatting.

As others pointed out, BaseX (with X being a high number) does not increase information density. You cannot get higher density on an inherently binary machine by changing the representation of the bits. If you have a lot of data, though, you can apply compression, which will save you some space depending on the sort of data at hand, at the cost of processing speed.

The most commonly used BaseX format is Base64, and its purpose is to transfer binary data as printable characters, originally in email messages. This is not the densest way to encode data (every 3 bytes become 4 characters), but it fits nicely in the body of an email message without disturbing the mail processor with accidental control codes, because everything is just text.
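As a rough illustration of that trade-off, here is a sketch using Python's standard `base64` module. The wrapper names `to_text` and `from_text` are made up for this example.

```python
import base64


def to_text(data: bytes) -> str:
    """Encode arbitrary bytes as printable Base64 text."""
    return base64.b64encode(data).decode("ascii")


def from_text(text: str) -> bytes:
    """Recover the original bytes from Base64 text."""
    return base64.b64decode(text)


if __name__ == "__main__":
    payload = bytes(range(256))  # includes control codes and non-ASCII bytes
    text = to_text(payload)
    # The text form is longer than the binary form: 4 characters per 3 bytes.
    print(len(payload), len(text))
    print(from_text(text) == payload)
```

The output is safe to paste into a text body, but it takes about a third more space than the raw bytes.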

Martin Maat

Understood. In my case I'm using it for creating a set of unique IDs. If `0-9azAZ` make 58 what are the other characters that make it 64? – 1.21 gigawatts Nov 17 '19 at 23:13
@1.21 0-9a-zA-Z makes 62 actually. Anyone can pick 64 printable characters and call it Base64, but there is a standard alphabet (RFC 4648): A-Za-z0-9+/ plus = as a valueless character used for trailing padding. That is what .NET's Convert.ToBase64String uses, so you probably do not have to write the code yourself. Variants exist that use different characters, such as the URL-safe alphabet, which replaces + and / with - and _. – Martin Maat Nov 17 '19 at 23:43

score 1 · Answer 3 · answered Nov 09 '19 at 23:11

It isn't done on today's computers because merely working in a different (apparent) number base doesn't increase the information density or processing performance.

First, computers don't use decimal or hex — those are formats fit for human consumption, i.e. printing, input/output. Internally, and physically, computers use binary digits to store, represent, and manipulate values, whether they communicate them in decimal or hex, fixed or floating point. To store a bigger number, we use more bits!

So, printing numbers out in base 36 does not change their internal representation: they are still stored as binary digits, implemented in whatever logic process technology is current.
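A small Python sketch of the same point: the value below is one integer, and decimal, hex, and binary are just different ways of printing it. `renderings` is a hypothetical helper name; `int(text, 36)` is the built-in parser, which accepts bases up to 36.

```python
def renderings(value: int) -> dict:
    """Show one stored integer in several printed forms."""
    return {
        "decimal": str(value),
        "hex": format(value, "X"),
        "binary": format(value, "b"),
        "bits_required": value.bit_length(),
    }


if __name__ == "__main__":
    value = int("ZZ", 36)  # parse base 36 text into an ordinary integer
    print(renderings(value))
```

However it is printed, ZZ occupies the same 11 bits in memory as the decimal number 1295.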

Some NAND flash drives use MLC (multi-level cell) technology, which stores 2 bits in one physical cell. This does improve the physical density, allowing more storage in the same volume. However, there is a trade-off: these cells can be slower and wear out sooner.

In some sense, the answer has to do with information theory, which studies how information is quantified, encoded, and transmitted.

There were vacuum tube systems in the very old days that used base 10 directly — each tube could store 10 different values, corresponding to decimal digits.

However, several forms of simplification have helped drive miniaturization. Two of these are (1) using binary instead of multi-valued logic devices, and (2) using a single type of logic gate (like a NAND gate) instead of trying to combine AND, OR, and NOT logic gates in a single technology process.
