rand() gives same numbers again for a small range

Question

I'm trying to make sort of a game where I have a a grid of 20x20 and I display a player (P), a target (T) and three enemies (X). All these have an X and a Y coordinate which are assigned using rand(). The problem is that if I try to get more points in the game (refills for energy etc) they overlap with one or more of the other points because the range is small(1 to 20 inclusive).

These are my variables and how I'm assigning them values: (the COORD is a struct with just an X and a Y)

const int gridSize = 20;
COORD player;
COORD target;
COORD enemy1;
COORD enemy2;
COORD enemy3;

//generate player
srand ( time ( NULL ) );
spawn(&player);
//generate target
spawn(&target);
//generate enemies
spawn(&enemy1);
spawn(&enemy2);
spawn(&enemy3);

void spawn(COORD *point)
{
    //allot X and Y coordinate to a point
    point->X = randNum();
    point->Y = randNum();
}

int randNum()
{
    //generate a random number between 1 and gridSize
    return (rand() % gridSize) + 1;
}

I want to add more things to the game but the probability of overlap increases when I do that. Is there any way to fix this?

`rand()` is a pityfull RNG, and anyway with such a small range, you don't just have to *expect* collisions, they are nearly guaranteed. — Deduplicator, Jul 19 '15 at 23:40
Read these: [Boost random number generator](http://stackoverflow.com/q/2254909/439793) and [Boost.Random](http://www.boost.org/doc/libs/1_58_0/doc/html/boost_random.html). — , Jul 19 '15 at 23:43
While it is true that `rand()` is a lousy RNG, it is probably appropriate for a single player game, and RNG quality isn't the issue here. — Gort the Robot, Jul 20 '15 at 02:52
Speaking about the quality of `rand()` seems to be irrelevant here. There is no cryptography involved, and any RNG will very probably give collisions in such a small map. — Tom Cornebize, Jul 20 '15 at 07:59
What you're seeing is known as the Birthday Problem. If your random numbers are being converted to a range smaller than the natural range of the PRNG then the probability of geting two instances of the same number is much higher than you might think. Some time ago I wrote a blurb on this subject on Stackoverflow [here.](http://stackoverflow.com/a/2145551/15401) — ConcernedOfTunbridgeWells, Jul 20 '15 at 13:38

score 40 · Accepted Answer · 2015-07-20T00:08:30.250

40

While the users who complain about rand() and recommend better RNGs are right about the quality of the random numbers, they are also missing the bigger picture. Duplicates in streams of random numbers cannot be avoided, they are a fact of life. This is the lesson of the birthday problem.

On a grid of 20 * 20 = 400 possible spawn positions, a duplicate spawn point is to be expected (50% probability) even when spawning only 24 entities. With 50 entities (still only 12.5% of the whole grid), the probability of a duplicate is over 95%. You have to deal with collisions.

Sometimes you can draw all samples at once, then you can use a shuffle algorithm to draw n guaranteed-distinct items. You just need to generate the list of all possibilities. If the full list of possibilities is too large to store, you can generate spawn positions one at a time as you do now (just with a better RNG) and simply re-generate when a collision occurs. Even though having some collisions is likely, many collisions in a row are exponentially unlikely even if most of the grid is populated.

edited Jul 20 '15 at 00:08

answered Jul 20 '15 at 00:03

I thought about respawning in case of a collision but if I have more items, as I intend to have, then the lookup for a collision would get complicated. I would also have to edit the checks in case of a point being added or removed from the game. I'm fairly inexperienced so if there is a workaround to this, I couldn't see it. – Rabeez Riaz Jul 20 '15 at 00:12
7

If you have a 20x20 checkerboard, as opposed to a 20x20 continuous (real) XY plane, then what you have is a 400-cell lookup table to check for collisions. This is TRIVIAL. – John R. Strohm Jul 20 '15 at 01:44
@RabeezRiaz If you have a larger map, you will have some grid-based data structure (a grid consisting of some area of cells, and every item inside that cell is stored in a list). If your map is yet larger, you will implement rect-tree. – rwong Jul 20 '15 at 02:09
2

@RabeezRiaz: if lookup is too complicated, use his first suggestion: generate a list all 400 possible starting locations, shuffle them so that they are in a random order (lookup the algorithm), and then start using locations from the front when you need to generate stuff (keep track of how many you've used already). No collisions. – RemcoGerlich Jul 20 '15 at 07:39
2

@RabeezRiaz No need to shuffle the entire list, if you only need a small number of random values, just shuffle the part you need (as in, take a random value from the list of 1..400, remove it, and repeat until you have enough elements). In fact that's how a shuffling algorithm works anyway. – Dorus Jul 20 '15 at 08:32
@Dorus You still need to generate and store the entire list though. In most cases where shuffling the entire list is prohibitive, that's already a dealbreaker. – Jul 20 '15 at 08:35

Lyise · Answer 2 · 2015-07-20T13:48:36.953

If you always want to avoid playing a new entity in a location that has already been allocated to something else, you can change around your process slightly. This would guarantee unique locations, but it requires a bit more overhead. Here are the steps:

Setup a collection of references to all possible locations on the map (for the 20x20 map, this would be 400 locations)
Pick a location at random from this collection of 400 (rand() would work fine for this)
Remove this possibility from the possible locations collection (so it now has 399 possibilities)
Repeat until all entities have a specified location

So long as you're removing the location from the set you are picking from, there should be no chance of a second entity receiving the same location (unless you're picking the locations from more than one thread at once).

A real world analogue to this would be drawing a card from a deck of cards. Currently, you're shuffling the deck, drawing a card and marking it down, putting the drawn card back into the deck, re-shuffling and drawing again. The above approach skips putting the card back into the deck.

score 1 · Answer 3 · edited Apr 12 '17 at 07:31

Pertaining to `rand() % n` being less than ideal

Doing rand() % n has a non-uniform distribution. You will get a disproportionate number of certain values because the number of values isn't a multiple of 20

Next, rand() is typically a linear congruential generator (there are many others, just this is the most likely one implemented - and with less than ideal parameters (there are many ways to select the parameters)). The biggest problem with this is that often the low bits in it (the ones you get with a % 20 type expression) are not that random. I recall one rand() from years ago where the lowest bit alternated from 1 to 0 with each call to rand() -- it wasn't very random.

From rand(3) man page:

The  versions of rand() and srand() in the Linux C Library use the same
random number generator as random() and srandom(), so  the  lower-order
bits  should  be as random as the higher-order bits.  However, on older
rand() implementations, and on  current implementations  on  different
systems,  the  lower-order  bits  are much less random than the higher-
order bits.  Do not use this function in applications  intended to  be
portable when good randomness is needed.

This may be now relegated to history, but it is quite possible that you've still got a poor rand() implementation hiding down somewhere in the stack. In which case, its still quite applicable.

The thing to do is to actually use a good random number library (that gives good random numbers) and then ask for random numbers within the range you want.

An example of a good random number bit of code (from 13:00 in the linked video)

#include <iostream>
#include <random>
int main() {
    std::mt19937 mt(1729); // yes, this is a fixed seed
    std::uniform_int_distribution<int> dist(0, 99);
    for (int i = 0; i < 10000; i++) {
        std::cout << dist(mt) << " ";
    }
    std::cout << std::endl;
}

Compare this to:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main() {
    srand(time(NULL));
    for (int i = 0; i < 10000; i++) {
        printf("%d ", rand() % 100);
    }
    printf("\n");
}

Run both of these programs and compare how often certain numbers come up (or don't come up) in that output.

Related video: rand() considered harmful

Some historical aspects of rand() causing bugs in Nethack that one should watch and consider in one's own implementations:

Nethack RNG Problem

Rand() is a very fundamental function for Nethack's random number generation. The way Nethack uses it is buggy or it may be argued that lrand48() produces crappy pseudo-random numbers. (However, lrand48() is a library function using a defined PRNG method and any program that uses it should take into account the weaknesses of that method.)

The bug is that Nethack relies (sometimes exclusively as is the case in rn(2)) on the lower bits of the results from lrand48(). Because of this, RNG in the whole game works bad. This is especially noticable before user actions introduce further randomness, i.e. in character generation and first level creation.

While the above was from 2003, it should still be kept in mind as it may not be the case that all of the systems running your intended game will be an up to date Linux system with a good rand() function.

If you are just doing this for yourself, you can test how good your random number generator is by writing some code and testing the output with ent.

On the properties of random numbers

There are other interpretations of 'random' that aren't exactly random. In a random stream of data, it is quite possible to get the same number twice. If you flip a coin (random), it is quite possible to get two heads in a row. Or throw a dice twice and get the same number twice in a row. Or spinning a roulette wheel and getting the same number twice there.

The distribution of numbers

When playing a song list, people expect 'random' to mean that the same song or artist won't be played a second time in a row. Having a playlist play The Beatles twice in a row is considered 'not random' (though it is random). The perception that for a play list of four songs played a total of eight times:

1 3 2 4 1 2 4 3

is more 'random' than:

1 3 3 2 1 4 4 2

More on this for the 'shuffling' of songs: How to shuffle songs?

On repeated values

If you don't want to repeat values, there is a different approach that should be considered. Generate all the possible values, and shuffle them.

If you are calling rand() (or any other random number generator), you are calling it with replacement. You can always get the same number twice. One option is to toss out the values again and again until you select one that meets your requirements. I will point out that this has a non-determistic runtime and it is possible that you could find yourself in a situation where there is an infinite loop unless you start doing a more complex back tracing.

List and Pick

Another option is to generate a list of all possible valid states and then select a random element from that list. Find all the empty spots (that meet some rules) in the room and then pick a random one from that list. And then do it again and again until you're done.

Shuffle

The other approach is to shuffle as if it was a deck of cards. Start out with all the empty spots in the room and then start assigning them by dealing out the empty spots, one at a time, to each rule/process asking for an empty spot. You are done when you run out of cards or things stop asking for them.

`Next, rand() is typically a linear congruential generator` This isn't true on many platforms now. From the rand(3) man page of linux:" The versions of rand() and srand() in the Linux C Library use the same random number generator as random(3) and srandom(3), so the lower-order bits should be as random as the higher-order bits." Also, as @delnan points out, the quality of the PRNG isn't the real problem here. — Charles E. Grant, Jul 20 '15 at 00:09
I'm downvoting this because it doesn't solve the actual problem. — user253751, Jul 20 '15 at 02:52
@immibis Then neither does the other answer "solve" the actual problem and should be downvoted. I think the question isn't "fix my code", it's "why am I getting duplicate random numbers?" To the second question, I believe the question is answered. — Neil, Jul 20 '15 at 06:07
Even with the smallest value of `RAND_MAX` 32767 the difference is 1638 possible ways of getting some numbers vs 1639 for others. Seems unlikely to make much practical difference to the OP. — Martin Smith, Jul 20 '15 at 06:24
@CharlesE.Grant not everyone is running Linux. An example from Random.org [shows the 'random' distribution from Windows](https://www.random.org/analysis/randbitmap-wamp.png). Nethack famously had trouble with its implementation of `rn2()` which allowed a player familiar with the game to get every random number to their liking. — , Jul 20 '15 at 14:43
Over a decade ago, I wrote a weighted playlist generator. I didn't use `rand` but I got complaints from people who were sure I was because they'd get two plays in a row, or something similar. I ended up making it deliberately less random in certain ways so people perceived it as random. (I couldn't shuffle because I was generating "infinite" playlists.) — Gort the Robot, Jul 20 '15 at 15:28
@StevenBurnap you might enjoy dabbling in the question [I'd like to write an “ultimate shuffle” algorithm to sort my mp3 collection](http://programmers.stackexchange.com/q/194480/40980) - and yes, the perception of random and what is actually random are different things that sometimes makes understanding what is *really* being asked about being random difficult. — , Jul 20 '15 at 15:34

KwentRell · Answer 4 · 2015-07-22T06:26:55.427

The simplest solution to this problem has been quoted in previous answers: it is to make a list of random values alongside every one of your 400 cells, and then, to sort this random list. Your list of cells will be sorted as the random list, and this way, will be shuffled.

This method has the advantage of totally avoiding the overlapping of randomly selected cells.

The disadvantage is that you have to compute a random value on a separate list for each of your cells. So, you'd rather not doing it while the game has started.

Here is an example of how you can do it:

#include <algorithm>
#include <iostream>
#include <vector>

#define NUMBER_OF_SPAWNS 20
#define WIDTH 20
#define HEIGHT 20

typedef struct _COORD
{
  int x;
  int y;
  _COORD() : x(0), y(0) {}
  _COORD(int xp, int yp) : x(xp), y(yp) {}
} COORD;

typedef struct _spawnCOORD
{
  float rndValue;
  COORD*coord;
  _spawnCOORD() : rndValue(0.) {}
} spawnCOORD;

struct byRndValue {
  bool operator()(spawnCOORD const &a, spawnCOORD const &b) {
    return a.rndValue < b.rndValue;
  }
};

int main(int argc, char** argv)
{
  COORD map[WIDTH][HEIGHT];
  std::vector<spawnCOORD>       rndSpawns(WIDTH * HEIGHT);

  for (int x = 0; x < WIDTH; ++x)
    for (int y = 0; y < HEIGHT; ++y)
      {
        map[x][y].x = x;
        map[x][y].y = y;
        rndSpawns[x + y * WIDTH].coord = &(map[x][y]);
        rndSpawns[x + y * WIDTH].rndValue = rand();
      }

  std::sort(rndSpawns.begin(), rndSpawns.end(), byRndValue());

  for (int i = 0; i < NUMBER_OF_SPAWNS; ++i)
    std::cout << "Case selected for spawn : " << rndSpawns[i].coord->x << "x"
              << rndSpawns[i].coord->y << " (rnd=" << rndSpawns[i].rndValue << ")\n";
  return 0;
}

Result:

root@debian6:/home/eh/testa# ./exe 
Case selected for spawn : 11x15 (rnd=6.93951e+06)
Case selected for spawn : 14x1 (rnd=7.68493e+06)
Case selected for spawn : 8x12 (rnd=8.93699e+06)
Case selected for spawn : 18x13 (rnd=1.16148e+07)
Case selected for spawn : 1x0 (rnd=3.50052e+07)
Case selected for spawn : 2x17 (rnd=4.29992e+07)
Case selected for spawn : 9x14 (rnd=7.60658e+07)
Case selected for spawn : 3x11 (rnd=8.43539e+07)
Case selected for spawn : 12x7 (rnd=8.77554e+07)
Case selected for spawn : 19x0 (rnd=1.05576e+08)
Case selected for spawn : 19x14 (rnd=1.10613e+08)
Case selected for spawn : 8x2 (rnd=1.11538e+08)
Case selected for spawn : 7x2 (rnd=1.12806e+08)
Case selected for spawn : 19x15 (rnd=1.14724e+08)
Case selected for spawn : 8x9 (rnd=1.16088e+08)
Case selected for spawn : 2x19 (rnd=1.35497e+08)
Case selected for spawn : 2x16 (rnd=1.37807e+08)
Case selected for spawn : 2x8 (rnd=1.49798e+08)
Case selected for spawn : 7x16 (rnd=1.50123e+08)
Case selected for spawn : 8x11 (rnd=1.55325e+08)

Just change NUMBER_OF_SPAWNS to get more or less random cells, this will not change the computation time required for the task.

"and then, to sort all of them" - I believe you mean "shuffle" — , Jul 21 '15 at 16:41
I have completed my explanation a bit. It should be clearer now. — KwentRell, Jul 22 '15 at 06:28

rand() gives same numbers again for a small range

4 Answers4

Pertaining to rand() % n being less than ideal

On the properties of random numbers

The distribution of numbers

On repeated values

Pertaining to `rand() % n` being less than ideal