Subscribe to The Madman Scribblings        RSS Feed
-----

A Random Pick In O(n)?

Icon 7 Comments
A Random Pick In O(n)?

As far as I can attain the following implementation of Random Pick is of O(n).
  <HideModuleName>
  Public Module Exts

  Private Class RandCMP(Of T)
    Implements IComparer(Of T)
    Public Function Compare(x As T, y As T) As Integer Implements IComparer(Of T).Compare
       Return RNG.Next.CompareTo(RNG.Next)
    End Function
  End Class

    <Extension>
  Public  Function RandomPick(Of T)(source As IEnumerable(Of T), Pick As Integer) As IEnumerable(of T)
    If (Pick<0) OrElse (Pick > source.Count) THen Return  Enumerable.Empty(Of T)
    Return (New SortedSet(Of T)( source , New RandCMP(Of T))).Take(Pick).AsEnumerable
  End Function 

End Module



Is my analysis correct?

7 Comments On This Entry

Page 1 of 1

dawmail333 Icon

14 October 2013 - 12:36 AM
I don't see the point of sorting. I haven't written VB.Net for quite a few years now, so here's my version.

Imports System.Runtime.CompilerServices

<HideModuleName>
Module Extensions

    <Extension>
    Public Function RandomPick(Of T)(source As IList(Of T), numPicked As Integer) As List(Of T)
        If numPicked <= 0 Then Return New List(Of T)
        If numPicked > source.Count Then numPicked = source.Count

        Dim sourceList As New List(Of T)(source)
        Dim values As New List(Of T)

        Dim RNG As New Random()

        Do Until numPicked = 0
            Dim val As Integer = RNG.Next(0, sourceList.Count)

            values.Add(sourceList.Item(val))
            sourceList.RemoveAt(val)

            numPicked -= 1
        Loop

        Return values
    End Function

End Module


Reasons I think this is a better candidate for O(n):

  • I have no idea how good LINQ is with .Take(n) - it could be O(1), or it could be O(n). The point of big O is to take the worst case scenario, in which case your example is O(2n)
  • Your version of n is the length of the collection. My version of n is the number of picks taken.


I tested this version on an array, it randomly selected from all elements of that array, and by cloning the array, no side effects are incurred.

(P.S. You didn't declare RNG, and when the number of picks is greater than the collection size, returning an empty collection is illogical - either throw an ArgumentOutOfRangeException, or (as I did), assume the largest possible value (a habit I've developed from other languages))
1

AdamSpeight2008 Icon

14 October 2013 - 11:38 AM
There are hidden costs in your implementation.
New List( ) is O(n)
.Add and .RemoveAt are both O(n)

But over the full iteration of a taking number of items in source.

Overall it's ends up be O( 3(t log t) + n)

Also your function isn't lazily evaluated.


Mine is O(n+t) do the fact Sets of O(1) for inserting items.
I'm using SortedSet so I can tell it how to do the comparison of items.
0

cfoley Icon

15 October 2013 - 05:31 AM
Your shuffling algorithm doesn't really do the job. I don't know what the underlying structure of the sortedSet is but using a comparer to randomly place items will probably bias the distribution.

You can illustrate the concept with a list implementation. The first item, goes in and you have:

[1]

The next item gives you [1, 2] or [2, 1] in equal probability.

The next item goes in and the comparer attempts to put it at index 0 with a 50% chace of success. Then position 1 with a 50% chance of success (overall chance of position2 is 50% * 50% = 25%). Finally it has a 25% chance of being placed at position 2.

Imagine the last element in a very long list.
Index 0: 50%
Index 1: 25%
Index 3: 12.5%
Index 4: 6.25%
Index 5: 3.125%

Index 100: Very low chance indeed!

Now, the data structure for sorted SortedSet will probably be a tree or a heap depending on the common usage but looking at the simple list data structure gives an idea of how a bias can creep in.

Let's test the idea:

    Dim RNG As New Random()

    Sub Main()
        Dim source As New List(Of Integer)
        For n = 0 To 1000000
            source.Add(n)
        Next
        For n = 1 To 10
            Dim result = RandomPick(source, 10)
            Debug.Print(String.Join(", ", result))
        Next
    End Sub



Sometimes I run this and all 10 sets look reasonable but sometimes I run it and 2 or 3 sets are composed of numbers near 1 and numbers near 1000000. I could do more runs and use a statistical test but I think the pattern is obvious enough to be sure it's not random.

924848, 923106, 191145, 545869, 221432, 495037, 201539, 794436, 37309, 438397
1000000, 1, 2, 3, 4, 5, 6, 7, 8, 9
108639, 34472, 336537, 792098, 240594, 949221, 614849, 299355, 243134, 114845
303798, 450206, 834564, 917867, 61258, 630534, 307223, 439982, 246140, 451940
1331, 659883, 426183, 549050, 318910, 670006, 40608, 409918, 271241, 761878
560864, 445593, 325914, 210075, 818675, 914185, 107689, 281861, 370348, 381063
1000000, 999999, 2, 3, 999998, 5, 999997, 999996, 8, 999994
859953, 547339, 839754, 182165, 401429, 795923, 81587, 237846, 456337, 765225
860196, 837606, 804655, 988661, 116865, 312786, 839484, 738181, 232249, 49715
85345, 674154, 499925, 240076, 273110, 758087, 20436, 931743, 572934, 250471
0

cfoley Icon

15 October 2013 - 05:33 AM
I made an error in the indexes for my example. They should go 0, 1, 2, 3, 4. DIC won't let me edit the comment.
0

cfoley Icon

15 October 2013 - 06:18 AM
Here is my attempt at the random pick. It is similar to dawmail333's except I attempt to get around the O(n) removal by swapping with the last list item then removing.

On testing, however, I didn't notice a difference in runtime on choosing 10 from sets of 1000000. Both completed very quickly. In choosing 100 from sets of 1000000, mine is only slightly faster. I think the simplicity in dawmail333's solution makes it better than mine for most use cases. I was solving a performance problem that doesn't really exist.

Adam, your solution is significantly slower. I think it's populating the sorted set that kills it.

Public Class MyRandomPicker(Of T)

    Private remaining As List(Of T)
    Private Shared rand As New Random

    Public Sub New(ByVal source As IEnumerable(Of T))
        remaining = New List(Of T)(source)
    End Sub

    Public Function nextItem() As T
        Dim index = rand.Next(remaining.Count)
        Dim result = remaining(index)
        remove(index)
        Return result
    End Function

    Private Sub remove(ByVal index As Integer)
        Dim maxIndex = remaining.Count - 1
        remaining(index) = remaining(maxIndex)
        remaining.RemoveAt(maxIndex)
    End Sub

    Public Shared Function randomPick(ByVal source As IEnumerable(Of T), ByVal count As Integer) As IEnumerable(Of T)
        Dim result As New List(Of T)
        Dim picker As New MyRandomPicker(Of T)(source)
        For n = 1 To count
            result.Add(picker.nextItem)
        Next
        Return result
    End Function

End Class

0

AdamSpeight2008 Icon

15 October 2013 - 04:19 PM
The bias will still appear in yours cfoley as System.Random's distribution isn't perfect.
0

cfoley Icon

15 October 2013 - 11:09 PM
The pseudorandomness won't cause the same massive bias. The sortedset and comparer method won't shuffle fairly even with a fair random number generator.
0
Page 1 of 1