(This part of the question was associated with revision 2 and is still quite applicable to that problem)
The standard approach to doing this is to populate an array with the appropriate weights:
[x, x, x, x, x, x, y, y, y, y, z, z]
 1  2  3  4  5  6  1  2  3  4  1  2
and then pick one at random.
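A minimal sketch of that approach in modern C++ (the x/y/z values and the 6/4/2 weights are placeholders taken from the array above):

#include <iostream>
#include <random>
#include <vector>

int main() {
    // Populate the pool so each value appears as often as its weight:
    // x six times, y four times, z twice.
    std::vector<char> pool = {'x', 'x', 'x', 'x', 'x', 'x',
                              'y', 'y', 'y', 'y',
                              'z', 'z'};

    std::mt19937 mt(std::random_device{}());
    std::uniform_int_distribution<std::size_t> pick(0, pool.size() - 1);

    // Each draw is x with probability 6/12, y with 4/12, z with 2/12.
    for (int i = 0; i < 12; ++i) {
        std::cout << pool[pick(mt)] << ' ';
    }
    std::cout << '\n';
}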
Please note that random is random. It is very unlikely that you will ever get a perfect distribution.
If you want to guarantee the distribution, you need to (mostly) do away with the randomness.
Instead, shuffle the above array. For example, one shuffling would give you:
[x, y, z, y, x, x, y, x, z, y, x, x]
And then loop over the array again and again, returning the element at the current index. It isn't random anymore, but it guarantees the distribution always remains exactly what you are looking for. This is just a slightly more 'random' version of a round robin distribution.
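A sketch of that shuffle-and-cycle idea, reusing the weighted pool from above (std::shuffle does the shuffling for us):

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

int main() {
    std::vector<char> pool = {'x', 'x', 'x', 'x', 'x', 'x',
                              'y', 'y', 'y', 'y',
                              'z', 'z'};

    // Shuffle once: the order is random, but the counts are fixed.
    std::mt19937 mt(std::random_device{}());
    std::shuffle(pool.begin(), pool.end(), mt);

    // Cycle over the shuffled array; every 12 consecutive draws contain
    // exactly six x, four y, and two z.
    for (int i = 0; i < 24; ++i) {
        std::cout << pool[i % pool.size()] << ' ';
    }
    std::cout << '\n';
}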
If this is attempting to load balance something, round robin is a better approach, and you might want to consider an evenly distributed round robin:
[x, y, x, y, x, z]
This has the same distribution as above but tries to keep everything at an even spacing, so as not to saturate any one resource.
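One way to generate such an interleaving is a sketch of the 'smooth weighted round robin' technique (the algorithm nginx uses for load balancing; the 3/2/1 weights below match the example array above):

#include <iostream>
#include <vector>

int main() {
    // Each tick, every entry gains its weight as credit; the entry with
    // the most credit is chosen and pays back the total weight. This
    // spreads the choices out rather than clumping them together.
    struct Entry { char name; int weight; int credit; };
    std::vector<Entry> entries = {{'x', 3, 0}, {'y', 2, 0}, {'z', 1, 0}};

    int total = 0;
    for (const auto& e : entries) total += e.weight;

    for (int tick = 0; tick < 12; ++tick) {
        Entry* best = nullptr;
        for (auto& e : entries) {
            e.credit += e.weight;
            if (best == nullptr || e.credit > best->credit) best = &e;
        }
        best->credit -= total;
        std::cout << best->name << ' ';
    }
    std::cout << '\n';  // prints the evenly interleaved x y x z y x, twice
}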
As an aside, on the question of rand being poor: you may be dealing with an older standard library. From the rand(3) man page:
The versions of rand() and srand() in the Linux C Library use the
same random number generator as random(3) and srandom(3), so the
lower-order bits should be as random as the higher-order bits.
However, on older rand() implementations, and on current
implementations on different systems, the lower-order bits are much
less random than the higher-order bits. Do not use this function in
applications intended to be portable when good randomness is needed.
(Use random(3) instead.)
This can be demonstrated with the following C program, which writes out consecutive low bytes, low nybbles, and low bit-pairs from rand(). I happen to have a fairly modern system and don't have access to anything that would demonstrate this low-order weakness, but you can test it on your own system; the code looks at the low byte (8 bits), the low 4 bits, and the low 2 bits.
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int i, r;
    int a8, a4, a2;

    srand(0);

    /* One output stream per width: packed low bytes, low nybbles,
       and low bit-pairs from successive calls to rand(). */
    FILE *fp0xff = fopen("low8.out", "w");
    FILE *fp0x0f = fopen("low4.out", "w");
    FILE *fp0x03 = fopen("low2.out", "w");

    for (i = 0; i < 10000; i++) {
        /* 1 of 4: start accumulating the nybble and bit-pair bytes */
        r = rand();
        a8 = r & 0xff;
        a4 = r & 0x0f;
        a2 = r & 0x03;
        fprintf(fp0xff, "%c", a8);

        /* 2 of 4: two nybbles make a byte */
        r = rand();
        a8 = r & 0xff;
        a4 = (a4 << 4) | (r & 0x0f);
        a2 = (a2 << 2) | (r & 0x03);
        fprintf(fp0xff, "%c", a8);
        fprintf(fp0x0f, "%c", a4);

        /* 3 of 4: start a fresh nybble byte, keep packing bit-pairs */
        r = rand();
        a8 = r & 0xff;
        a4 = r & 0x0f;
        a2 = (a2 << 2) | (r & 0x03);
        fprintf(fp0xff, "%c", a8);

        /* 4 of 4: four bit-pairs make a byte */
        r = rand();
        a8 = r & 0xff;
        a4 = (a4 << 4) | (r & 0x0f);
        a2 = (a2 << 2) | (r & 0x03);
        fprintf(fp0xff, "%c", a8);
        fprintf(fp0x0f, "%c", a4);
        fprintf(fp0x03, "%c", a2);
    }

    fclose(fp0xff);
    fclose(fp0x0f);
    fclose(fp0x03);
    return 0;
}
The output of this program can then be examined using ent, which runs a number of randomness tests against a stream of bytes.
Running it against low2.out on my system produces:
Entropy = 7.982762 bits per byte.
Optimum compression would reduce the size
of this 10000 byte file by 0 percent.
Chi square distribution for 10000 samples is 240.46, and randomly
would exceed this value 73.46 percent of the times.
Arithmetic mean value of data bytes is 127.6318 (127.5 = random).
Monte Carlo value for Pi is 3.150060024 (error 0.27 percent).
Serial correlation coefficient is -0.002842 (totally uncorrelated = 0.0).
Which is, frankly, a reasonably good random stream.
The reason I am testing the low two bits is that you are doing things with % 9, which is affected mostly by these low bits. You may find that your random number generator is old, and you might need to implement your own (if that degree of randomness is something that you need).
One approach would be, well, to implement your own. The Mersenne twister is quite well regarded and you can find implementations for a wide range of languages quite easily.
The other thing to do would be to shift the random number you get down 8 bits to get rid of any low order less randomness and use those bits.
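A sketch of that bit-shifting idea. Note the assumption: on a system where RAND_MAX is only 32767, shifting off 8 bits leaves just 7 usable bits, so check RAND_MAX first:

#include <cstdlib>
#include <iostream>

int main() {
    std::srand(0);
    for (int i = 0; i < 16; ++i) {
        // Discard the 8 least significant bits and use what remains.
        int r = std::rand() >> 8;
        std::cout << r % 9 << ' ';  // modulo bias is a separate issue, below
    }
    std::cout << '\n';
}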
You can get an idea of the poor quality of rand() in some implementations from Random.org. For example, rand() called by PHP on Windows produces the following:

[bitmap visualization of PHP rand() on Windows, from Random.org]

Which is quite certainly not good randomness.
There's a bit more to this: the math itself can produce an uneven distribution even when the source is evenly distributed.
The code:
double r = rand() / (double)RAND_MAX;
rand() returns a number in the range [0, 32767]. This range isn't evenly divisible by 9 (or 3), so some values will be over-sampled. Assume we are dealing with % 100: since 32768 = 327 × 100 + 68, the remainders 0 through 67 each occur one extra time, so the values in [0, 67] will be found more frequently than the values in [68, 99]. This introduces a bias into the random numbers, which is problematic.
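A short sketch that makes the bias concrete by exhaustively counting remainders over the full [0, 32767] range (assuming, as above, that this is rand()'s range):

#include <iostream>

int main() {
    // Count how often each remainder of % 100 would occur if rand()
    // were a perfectly uniform source over [0, 32767].
    int counts[100] = {0};
    for (int v = 0; v <= 32767; ++v) {
        ++counts[v % 100];
    }
    std::cout << "remainder 0:  " << counts[0]  << '\n';  // 328
    std::cout << "remainder 67: " << counts[67] << '\n';  // 328
    std::cout << "remainder 68: " << counts[68] << '\n';  // 327
    std::cout << "remainder 99: " << counts[99] << '\n';  // 327
}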
Yep. The code isn't using % n; instead it uses floating point. Now you have 2.000001 problems.
You've got code that is approximately:

int dst = (rand() * 1.0 / RAND_MAX) * 99;

This can be described as hilariously non-uniform. Because of truncation, only one input value can produce the maximum:
(32765 * 1.0 / 32767) * 99 = 98
(32766 * 1.0 / 32767) * 99 = 98
(32767 * 1.0 / 32767) * 99 = 99
No matter how you try to fix this with rand(), you will run into the pigeonhole principle, which states that if you try to put n items into m containers where n > m, at least one container must hold more than one item. This seems obvious, but it's exactly what you are running into when trying to map rand()'s range onto a smaller one.
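The standard way out is rejection sampling: refuse to map the uneven tail at all and simply draw again. This sketch is mine rather than something from the original answer or the video, and uniform_rand is a hypothetical helper name:

#include <cstdlib>
#include <iostream>

// Hypothetical helper: returns a uniform value in [0, n-1] by redrawing
// whenever rand() lands in the uneven tail of its range, instead of
// folding the tail in with % (which is what causes the bias above).
int uniform_rand(int n) {
    long long span  = (long long)RAND_MAX + 1;  // count of possible rand() values
    long long limit = span - (span % n);        // largest multiple of n within span
    int r;
    do {
        r = std::rand();
    } while (r >= limit);
    return r % n;
}

int main() {
    std::srand(0);
    for (int i = 0; i < 16; ++i) {
        std::cout << uniform_rand(100) << ' ';
    }
    std::cout << '\n';
}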
If you really only want a range of [0, 2], there are appropriate calls that can give you the uniform distribution you are after.
The following code (from the video linked below) uses the Mersenne twister to produce a uniform distribution:
#include <iostream>
#include <random>

int main() {
    std::random_device rd;                           // nondeterministic seed
    std::mt19937 mt(rd());                           // Mersenne twister engine
    std::uniform_int_distribution<int> dist(0, 99);  // unbiased mapping onto [0, 99]

    for (int i = 0; i < 16; ++i) {
        std::cout << dist(mt) << " ";
    }
    std::cout << std::endl;
}
Note that this is using a deterministic engine with a random seed. It is not cryptographically secure, but it does produce high quality random data quickly. The key is that you are using the distribution correctly. You could use random_device for all your data, but that consumes entropy from your system that you probably don't need to spend there (running out of entropy is a recurring topic on ServerFault). The twister is good enough for random load balancing.
A really good watch on this is the video rand() Considered Harmful, which addresses many of the problems with rand. I've borrowed much of this section from that video.
Particular points from the video:
- Non-uniform distribution (3:56 .. 5:50)
- (src * 1.0 / RAND_MAX) * 99 is hilariously non-uniform (6:12 .. 7:29)
- 2.000001 problems (10:40)
- Hello, Random World (12:58 .. 15:30)
If you really want things to be perfectly balanced (exactly (333, 333, 334) events for three choices over 1000 draws), that isn't random. Use a round robin instead.