A Random Pick In O(n)?
As far as I can attain the following implementation of Random Pick is of O(n).
Is my analysis correct?
As far as I can attain the following implementation of Random Pick is of O(n).
<HideModuleName> Public Module Exts Private Class RandCMP(Of T) Implements IComparer(Of T) Public Function Compare(x As T, y As T) As Integer Implements IComparer(Of T).Compare Return RNG.Next.CompareTo(RNG.Next) End Function End Class <Extension> Public Function RandomPick(Of T)(source As IEnumerable(Of T), Pick As Integer) As IEnumerable(of T) If (Pick<0) OrElse (Pick > source.Count) THen Return Enumerable.Empty(Of T) Return (New SortedSet(Of T)( source , New RandCMP(Of T))).Take(Pick).AsEnumerable End Function End Module
Is my analysis correct?
7 Comments On This Entry
Page 1 of 1
dawmail333
14 October 2013  12:36 AM
I don't see the point of sorting. I haven't written VB.Net for quite a few years now, so here's my version.
Reasons I think this is a better candidate for O(n):
I tested this version on an array, it randomly selected from all elements of that array, and by cloning the array, no side effects are incurred.
(P.S. You didn't declare RNG, and when the number of picks is greater than the collection size, returning an empty collection is illogical  either throw an ArgumentOutOfRangeException, or (as I did), assume the largest possible value (a habit I've developed from other languages))
Imports System.Runtime.CompilerServices <HideModuleName> Module Extensions <Extension> Public Function RandomPick(Of T)(source As IList(Of T), numPicked As Integer) As List(Of T) If numPicked <= 0 Then Return New List(Of T) If numPicked > source.Count Then numPicked = source.Count Dim sourceList As New List(Of T)(source) Dim values As New List(Of T) Dim RNG As New Random() Do Until numPicked = 0 Dim val As Integer = RNG.Next(0, sourceList.Count) values.Add(sourceList.Item(val)) sourceList.RemoveAt(val) numPicked = 1 Loop Return values End Function End Module
Reasons I think this is a better candidate for O(n):
 I have no idea how good LINQ is with .Take(n)  it could be O(1), or it could be O(n). The point of big O is to take the worst case scenario, in which case your example is O(2n)
 Your version of n is the length of the collection. My version of n is the number of picks taken.
I tested this version on an array, it randomly selected from all elements of that array, and by cloning the array, no side effects are incurred.
(P.S. You didn't declare RNG, and when the number of picks is greater than the collection size, returning an empty collection is illogical  either throw an ArgumentOutOfRangeException, or (as I did), assume the largest possible value (a habit I've developed from other languages))
cfoley
15 October 2013  05:31 AM
Your shuffling algorithm doesn't really do the job. I don't know what the underlying structure of the sortedSet is but using a comparer to randomly place items will probably bias the distribution.
You can illustrate the concept with a list implementation. The first item, goes in and you have:
[1]
The next item gives you [1, 2] or [2, 1] in equal probability.
The next item goes in and the comparer attempts to put it at index 0 with a 50% chace of success. Then position 1 with a 50% chance of success (overall chance of position2 is 50% * 50% = 25%). Finally it has a 25% chance of being placed at position 2.
Imagine the last element in a very long list.
Index 0: 50%
Index 1: 25%
Index 3: 12.5%
Index 4: 6.25%
Index 5: 3.125%
Index 100: Very low chance indeed!
Now, the data structure for sorted SortedSet will probably be a tree or a heap depending on the common usage but looking at the simple list data structure gives an idea of how a bias can creep in.
Let's test the idea:
Sometimes I run this and all 10 sets look reasonable but sometimes I run it and 2 or 3 sets are composed of numbers near 1 and numbers near 1000000. I could do more runs and use a statistical test but I think the pattern is obvious enough to be sure it's not random.
You can illustrate the concept with a list implementation. The first item, goes in and you have:
[1]
The next item gives you [1, 2] or [2, 1] in equal probability.
The next item goes in and the comparer attempts to put it at index 0 with a 50% chace of success. Then position 1 with a 50% chance of success (overall chance of position2 is 50% * 50% = 25%). Finally it has a 25% chance of being placed at position 2.
Imagine the last element in a very long list.
Index 0: 50%
Index 1: 25%
Index 3: 12.5%
Index 4: 6.25%
Index 5: 3.125%
Index 100: Very low chance indeed!
Now, the data structure for sorted SortedSet will probably be a tree or a heap depending on the common usage but looking at the simple list data structure gives an idea of how a bias can creep in.
Let's test the idea:
Dim RNG As New Random() Sub Main() Dim source As New List(Of Integer) For n = 0 To 1000000 source.Add(n) Next For n = 1 To 10 Dim result = RandomPick(source, 10) Debug.Print(String.Join(", ", result)) Next End Sub
Sometimes I run this and all 10 sets look reasonable but sometimes I run it and 2 or 3 sets are composed of numbers near 1 and numbers near 1000000. I could do more runs and use a statistical test but I think the pattern is obvious enough to be sure it's not random.
924848, 923106, 191145, 545869, 221432, 495037, 201539, 794436, 37309, 438397 1000000, 1, 2, 3, 4, 5, 6, 7, 8, 9 108639, 34472, 336537, 792098, 240594, 949221, 614849, 299355, 243134, 114845 303798, 450206, 834564, 917867, 61258, 630534, 307223, 439982, 246140, 451940 1331, 659883, 426183, 549050, 318910, 670006, 40608, 409918, 271241, 761878 560864, 445593, 325914, 210075, 818675, 914185, 107689, 281861, 370348, 381063 1000000, 999999, 2, 3, 999998, 5, 999997, 999996, 8, 999994 859953, 547339, 839754, 182165, 401429, 795923, 81587, 237846, 456337, 765225 860196, 837606, 804655, 988661, 116865, 312786, 839484, 738181, 232249, 49715 85345, 674154, 499925, 240076, 273110, 758087, 20436, 931743, 572934, 250471
cfoley
15 October 2013  05:33 AM
I made an error in the indexes for my example. They should go 0, 1, 2, 3, 4. DIC won't let me edit the comment.
cfoley
15 October 2013  06:18 AM
Here is my attempt at the random pick. It is similar to dawmail333's except I attempt to get around the O(n) removal by swapping with the last list item then removing.
On testing, however, I didn't notice a difference in runtime on choosing 10 from sets of 1000000. Both completed very quickly. In choosing 100 from sets of 1000000, mine is only slightly faster. I think the simplicity in dawmail333's solution makes it better than mine for most use cases. I was solving a performance problem that doesn't really exist.
Adam, your solution is significantly slower. I think it's populating the sorted set that kills it.
On testing, however, I didn't notice a difference in runtime on choosing 10 from sets of 1000000. Both completed very quickly. In choosing 100 from sets of 1000000, mine is only slightly faster. I think the simplicity in dawmail333's solution makes it better than mine for most use cases. I was solving a performance problem that doesn't really exist.
Adam, your solution is significantly slower. I think it's populating the sorted set that kills it.
Public Class MyRandomPicker(Of T) Private remaining As List(Of T) Private Shared rand As New Random Public Sub New(ByVal source As IEnumerable(Of T)) remaining = New List(Of T)(source) End Sub Public Function nextItem() As T Dim index = rand.Next(remaining.Count) Dim result = remaining(index) remove(index) Return result End Function Private Sub remove(ByVal index As Integer) Dim maxIndex = remaining.Count  1 remaining(index) = remaining(maxIndex) remaining.RemoveAt(maxIndex) End Sub Public Shared Function randomPick(ByVal source As IEnumerable(Of T), ByVal count As Integer) As IEnumerable(Of T) Dim result As New List(Of T) Dim picker As New MyRandomPicker(Of T)(source) For n = 1 To count result.Add(picker.nextItem) Next Return result End Function End Class
cfoley
15 October 2013  11:09 PM
The pseudorandomness won't cause the same massive bias. The sortedset and comparer method won't shuffle fairly even with a fair random number generator.
Page 1 of 1
