## A Random Pick In O(n)?

A Random Pick In O(n)?

As far as I can attain the following implementation of Random Pick is of O(n).
```  <HideModuleName>
Public Module Exts

Private Class RandCMP(Of T)
Implements IComparer(Of T)
Public Function Compare(x As T, y As T) As Integer Implements IComparer(Of T).Compare
Return RNG.Next.CompareTo(RNG.Next)
End Function
End Class

<Extension>
Public  Function RandomPick(Of T)(source As IEnumerable(Of T), Pick As Integer) As IEnumerable(of T)
If (Pick<0) OrElse (Pick > source.Count) THen Return  Enumerable.Empty(Of T)
Return (New SortedSet(Of T)( source , New RandCMP(Of T))).Take(Pick).AsEnumerable
End Function

End Module

```

Is my analysis correct?

### 7 Comments On This Entry

Page 1 of 1

#### dawmail333

14 October 2013 - 12:36 AM
I don't see the point of sorting. I haven't written VB.Net for quite a few years now, so here's my version.

```Imports System.Runtime.CompilerServices

<HideModuleName>
Module Extensions

<Extension>
Public Function RandomPick(Of T)(source As IList(Of T), numPicked As Integer) As List(Of T)
If numPicked <= 0 Then Return New List(Of T)
If numPicked > source.Count Then numPicked = source.Count

Dim sourceList As New List(Of T)(source)
Dim values As New List(Of T)

Dim RNG As New Random()

Do Until numPicked = 0
Dim val As Integer = RNG.Next(0, sourceList.Count)

sourceList.RemoveAt(val)

numPicked -= 1
Loop

Return values
End Function

End Module
```

Reasons I think this is a better candidate for O(n):

• I have no idea how good LINQ is with .Take(n) - it could be O(1), or it could be O(n). The point of big O is to take the worst case scenario, in which case your example is O(2n)
• Your version of n is the length of the collection. My version of n is the number of picks taken.

I tested this version on an array, it randomly selected from all elements of that array, and by cloning the array, no side effects are incurred.

(P.S. You didn't declare RNG, and when the number of picks is greater than the collection size, returning an empty collection is illogical - either throw an ArgumentOutOfRangeException, or (as I did), assume the largest possible value (a habit I've developed from other languages))
1

14 October 2013 - 11:38 AM
There are hidden costs in your implementation.
New List( ) is O(n)
.Add and .RemoveAt are both O(n)

But over the full iteration of a taking number of items in source.

Overall it's ends up be O( 3(t log t) + n)

Also your function isn't lazily evaluated.

Mine is O(n+t) do the fact Sets of O(1) for inserting items.
I'm using SortedSet so I can tell it how to do the comparison of items.
0

#### cfoley

15 October 2013 - 05:31 AM
Your shuffling algorithm doesn't really do the job. I don't know what the underlying structure of the sortedSet is but using a comparer to randomly place items will probably bias the distribution.

You can illustrate the concept with a list implementation. The first item, goes in and you have:

[1]

The next item gives you [1, 2] or [2, 1] in equal probability.

The next item goes in and the comparer attempts to put it at index 0 with a 50% chace of success. Then position 1 with a 50% chance of success (overall chance of position2 is 50% * 50% = 25%). Finally it has a 25% chance of being placed at position 2.

Imagine the last element in a very long list.
Index 0: 50%
Index 1: 25%
Index 3: 12.5%
Index 4: 6.25%
Index 5: 3.125%

Index 100: Very low chance indeed!

Now, the data structure for sorted SortedSet will probably be a tree or a heap depending on the common usage but looking at the simple list data structure gives an idea of how a bias can creep in.

Let's test the idea:

```    Dim RNG As New Random()

Sub Main()
Dim source As New List(Of Integer)
For n = 0 To 1000000
Next
For n = 1 To 10
Dim result = RandomPick(source, 10)
Debug.Print(String.Join(", ", result))
Next
End Sub

```

Sometimes I run this and all 10 sets look reasonable but sometimes I run it and 2 or 3 sets are composed of numbers near 1 and numbers near 1000000. I could do more runs and use a statistical test but I think the pattern is obvious enough to be sure it's not random.

```924848, 923106, 191145, 545869, 221432, 495037, 201539, 794436, 37309, 438397
1000000, 1, 2, 3, 4, 5, 6, 7, 8, 9
108639, 34472, 336537, 792098, 240594, 949221, 614849, 299355, 243134, 114845
303798, 450206, 834564, 917867, 61258, 630534, 307223, 439982, 246140, 451940
1331, 659883, 426183, 549050, 318910, 670006, 40608, 409918, 271241, 761878
560864, 445593, 325914, 210075, 818675, 914185, 107689, 281861, 370348, 381063
1000000, 999999, 2, 3, 999998, 5, 999997, 999996, 8, 999994
859953, 547339, 839754, 182165, 401429, 795923, 81587, 237846, 456337, 765225
860196, 837606, 804655, 988661, 116865, 312786, 839484, 738181, 232249, 49715
85345, 674154, 499925, 240076, 273110, 758087, 20436, 931743, 572934, 250471
```
0

#### cfoley

15 October 2013 - 05:33 AM
I made an error in the indexes for my example. They should go 0, 1, 2, 3, 4. DIC won't let me edit the comment.
0

#### cfoley

15 October 2013 - 06:18 AM
Here is my attempt at the random pick. It is similar to dawmail333's except I attempt to get around the O(n) removal by swapping with the last list item then removing.

On testing, however, I didn't notice a difference in runtime on choosing 10 from sets of 1000000. Both completed very quickly. In choosing 100 from sets of 1000000, mine is only slightly faster. I think the simplicity in dawmail333's solution makes it better than mine for most use cases. I was solving a performance problem that doesn't really exist.

Adam, your solution is significantly slower. I think it's populating the sorted set that kills it.

```Public Class MyRandomPicker(Of T)

Private remaining As List(Of T)
Private Shared rand As New Random

Public Sub New(ByVal source As IEnumerable(Of T))
remaining = New List(Of T)(source)
End Sub

Public Function nextItem() As T
Dim index = rand.Next(remaining.Count)
Dim result = remaining(index)
remove(index)
Return result
End Function

Private Sub remove(ByVal index As Integer)
Dim maxIndex = remaining.Count - 1
remaining(index) = remaining(maxIndex)
remaining.RemoveAt(maxIndex)
End Sub

Public Shared Function randomPick(ByVal source As IEnumerable(Of T), ByVal count As Integer) As IEnumerable(Of T)
Dim result As New List(Of T)
Dim picker As New MyRandomPicker(Of T)(source)
For n = 1 To count
Next
Return result
End Function

End Class

```
0

15 October 2013 - 04:19 PM
The bias will still appear in yours cfoley as System.Random's distribution isn't perfect.
0

#### cfoley

15 October 2013 - 11:09 PM
The pseudorandomness won't cause the same massive bias. The sortedset and comparer method won't shuffle fairly even with a fair random number generator.
0
Page 1 of 1

### 1 user(s) viewing

1 Guests
0 member(s)
0 anonymous member(s)

S M T W T F S
123 4 567
891011121314
15161718192021
22232425262728
293031