# A random selection algorithm

Suppose you want to select k things randomly from a list of n items, with no duplicates. Of course, if k > n, then it's hopeless, and if k = n it's trivial, so the interesting case is when k < n. A naive approach might pick an item randomly, and try again if it's a duplicate.

`` ``

var selected = new List(); while( len(selected) < k ) { var newIndex = floor(random() * n); if ( items[newIndex] not in selected ) { selected.append( items[newIndex] ); } }

This may work fine if k is much smaller than n, but when k is large and close to n, this can be disastrously slow. Near the end, most of the items are already selected, so we keep picking items and having to "throw them back". The alternative of selecting which items to omit (instead of selecting the ones to keep) fails in the same way for small values of k.

Instead, there's a simple trick that is guaranteed to never require more than n random numbers. As we get to each item, we select it or not with the "correct" probability. Here is the pseudocode:

`` ``

var selected = new List(); var needed = k; var available = n; var position = 0; while( len(selected) < k ) { if( random() < needed / available ) { selected.append( items[position] ) needed -= 1; } available -= 1; }

Handy trick... I've needed this algorithm quite a few times.

Posted Fri 27 June 2008 by mcherm in Math, Programming