Challenge: Keep track of the median of a stream of integers

• (2 Pages)
• 1
• 2

28 Replies - 5648 Views - Last Post: 06 August 2015 - 12:37 PM

#1 ishkabible

• spelling expret

Reputation: 1747
• Posts: 5,898
• Joined: 03-August 09

Challenge: Keep track of the median of a stream of integers

Posted 26 July 2015 - 07:32 PM

Ok so I saw this today and it had me stumped for quite a while. The problem is to keep track of the median of the so far examined integers in a stream of integers. For simplicity's sake you can assume the stream of integers come in pairs and that you are told the first median (this avoids the even numbered case).

The *real* challenge is how to do this efficiently. So here are some benchmarks that you can try to hit (where 'n' is the number of integers queried from the stream so far). It is also interesting to just come up with all the algorithms that meet these various space and time requirements.

1. O(n lg n) time and constant space
2. O(n) time and space
3. O(lg n) time and O(n) space
4. O(1) amortized time and O(n) space
5. O(1) time and O(1) space

Remember to post solutions in spoiler tags. Numbers 1 and 2 should be easy to get. The solution I saw today will naively meet 3. It can be easily sped up to 4. I'm pretty sure there is an easy way to do this in constant space in exact time but I haven't worked though all the details yet.

Is This A Good Question/Topic? 3

Replies To: Challenge: Keep track of the median of a stream of integers

#2 macosxnerd101

• Games, Graphs, and Auctions

Reputation: 12691
• Posts: 45,880
• Joined: 27-December 08

Re: Challenge: Keep track of the median of a stream of integers

Posted 26 July 2015 - 07:41 PM

Could you provide some sample inputs for everyone? Just so we're clear on the format of the stream.

#3 ishkabible

• spelling expret

Reputation: 1747
• Posts: 5,898
• Joined: 03-August 09

Re: Challenge: Keep track of the median of a stream of integers

Posted 26 July 2015 - 08:43 PM

Ah right! Streams are sequences of input that just keep on going. For instance the digits of pi form a stream of integers. An open socket being used in the IRC protocol is an example of a stream of characters. Most file IO is thought about as a stream of characters. You can also view Events as being a stream of event objects! For instance say you are waiting on key presses from the user. Conceptually speaking this is no different than an arbitrary stream of characters.

If you are familair with Haskell then infinite lists are streams. If you are familiar with Python then generators fit the bill. If you are familiar with C# then IEnumerators constructed using the yield keyword form streams. in C++ and many other languages as mentioned the file IO classes (in C++ literally called stream) form streams of characters. In the case of C++ if you just keep reading integers from the stream then they are streams of integers (so C++ streams are sorta like heterogeneous streams really).

Additionally one can view functions from integers to objects as being streams. If you plug in 0 you get the first element, plug in 1 you get the second, etc... These sorts of streams can have some really strange run time properties.

Here is how I would set things up in the following languages. You should be able to find something that fits the bill for your language of choice here. All the following code is untested and incomplete but should give you the right idea.

Python:
```def pi_digits():
while True:
digit = #compute digit of pi or some such other thing
yield digit

#now build a function that efficiently yields the median digit at any given time
def medians(digits):
#store some needed state here
for digit in digits:
#update to find the new median thus far
yield median

for med in medians(pi_digits):
print(med)

```

Lua, Ruby, .NET (see below), and many other dynamic languages all have a feature like this. So this should generalize to many langauges.

C#:
```public class Median
{
public static IEnumerable PiDigits()
{
//setup some state here
while(true) {
//compute some new stuff and update the state
yield digit;
}
}

public static IEnumerable Medians(IEnumerable digits)
{
//setup some state here
foreach (int number in digits)
{
//compute the new median
yield median;
}
}

public static void Main()
{
foreach(int med in Medians(PiDigits())) {
Console.out.println(med);
}
}
}

```

This should work for any .NET language as stated above.

Haskell (the easiest language to do this in):
```digits :: [Int]
digits = --some infinite list

--here the "..." is for some accumulator elements need to maintain the state
--you could also use the state monad if you were feeling extra monadic today
--you could also get creative and do this with folds/scans/maps
medians :: ... -> [Int] -> [Int]
medians ... digits = --compute the stream of medians the given inputs

--here the "..." is the the initial accumulators
main = do
mapM_ print (medians ... digits)

```

This directly generalizes for any lazy language. If you are using a strict functional language and it doesn't have lazy lists ready to go you can make a type called 'Stream' which has a single element called 'head' and another called 'tail' which is a function from unit to the stream type.

C++ (this should work for most imperative languages like Java):
```int main() {
while(true) {
int digit;
cin >> digit;
//do some computation
cout << median;
}
}

```

So basically just reading from the console and outputting to the console for a possibly indefinite amount of time. You should be able to do this in any language worth it's salt.

If you don't know the best way to implement streams in your language of choice just ask and I'll see if I can give you a decent solution to work with. Streams are really neat things from a theoretical point of view and a god sent to real programmers. Learn to use them!!

This post has been edited by ishkabible: 26 July 2015 - 08:53 PM

#4 macosxnerd101

• Games, Graphs, and Auctions

Reputation: 12691
• Posts: 45,880
• Joined: 27-December 08

Re: Challenge: Keep track of the median of a stream of integers

Posted 26 July 2015 - 08:45 PM

I was actually seeking clarification on this:

Quote

For simplicity's sake you can assume the stream of integers come in pairs and that you are told the first median (this avoids the even numbered case).

And was hoping for some textual representation (ie., an example) of the input.

#5 ishkabible

• spelling expret

Reputation: 1747
• Posts: 5,898
• Joined: 03-August 09

Re: Challenge: Keep track of the median of a stream of integers

Posted 26 July 2015 - 09:02 PM

I just mean that you don't have to worry about the case where there are an even number of items in the stream (you are supposed to average the middle two in that case or something). To avoid this complication just read in two numbers at a time after having already read in the first number. That way you only ever look at the odd numbered cases.

so say my input is the digits of pi 31415926535897932384626433832795028841971693993751...

First I would read in 3 and the output would be 3
Next I would read in 1 and 4 and the output would be 3
Next I would read in 1 and 5 and the output would be 3 (again)
Next I would read in 2 and 9 and the output would still be 3
etc...

Does that clarify things? I can't actually write the whole infinite input out but the idea is clear, right?

This post has been edited by ishkabible: 26 July 2015 - 09:06 PM

#6 macosxnerd101

• Games, Graphs, and Auctions

Reputation: 12691
• Posts: 45,880
• Joined: 27-December 08

Re: Challenge: Keep track of the median of a stream of integers

Posted 26 July 2015 - 09:04 PM

Yes. That makes a lot more sense. Thank you for clarifying!

#7 macosxnerd101

• Games, Graphs, and Auctions

Reputation: 12691
• Posts: 45,880
• Joined: 27-December 08

Re: Challenge: Keep track of the median of a stream of integers

Posted 26 July 2015 - 11:36 PM

My solution:
Spoiler

#8 ishkabible

• spelling expret

Reputation: 1747
• Posts: 5,898
• Joined: 03-August 09

Re: Challenge: Keep track of the median of a stream of integers

Posted 27 July 2015 - 12:45 AM

That should be optimal. When I say O(1) time I mean per integer which is O(n) overall. You can't do better than O(1) per digit (at least in this case).

Spoiler

This post has been edited by ishkabible: 27 July 2015 - 01:13 AM

#9 cfoley

• Cabbage

Reputation: 2397
• Posts: 5,030
• Joined: 11-December 07

Re: Challenge: Keep track of the median of a stream of integers

Posted 27 July 2015 - 07:39 AM

Can the integers be greater than 9? Below zero?

#10 kingfeanor

Reputation: 47
• Posts: 62
• Joined: 18-April 09

Re: Challenge: Keep track of the median of a stream of integers

Posted 27 July 2015 - 09:04 AM

macosxnerd101, on 27 July 2015 - 12:36 AM, said:

My solution:
Spoiler

Unless I am missing something, this doesn't work.
Let us consider the sequence of 0, 1, 2, 3, 4, 5, 6, ...
Med [0, 1, 2]
Receive (3, 4) - > There is no element i (3 or 4) that satisfies Med[0] < i < Med[2]. (yet the Median is now 2).

#11 macosxnerd101

• Games, Graphs, and Auctions

Reputation: 12691
• Posts: 45,880
• Joined: 27-December 08

Re: Challenge: Keep track of the median of a stream of integers

Posted 27 July 2015 - 09:06 AM

You're absolutely correct. I shouldn't be doing these things at 2:30 AM.

#12 kingfeanor

Reputation: 47
• Posts: 62
• Joined: 18-April 09

Re: Challenge: Keep track of the median of a stream of integers

Posted 27 July 2015 - 11:21 AM

I tried to solve this problem in a previous job without storing the numbers. The only way that is possible is with statistical approximation: http://info.prelert....ion-of-integers

A good overview of the other ways to solve this can be found here: http://www.ardendert...integer-stream/

#13 mojo666

Reputation: 409
• Posts: 885
• Joined: 27-June 09

Re: Challenge: Keep track of the median of a stream of integers

Posted 27 July 2015 - 11:56 AM

Quote

O(1) time and O(1) space
.
.
.
I'm pretty sure there is an easy way to do this in constant space in exact time but I haven't worked though all the details yet.

This is only possible if by "Integer" you mean "An integer within a given range or other finite set of values". If we are reading integers of any possible value, then every integer we have already read can become the median in the future, so every integer needs to be stored.

For cases where we do have a range, we can implement a counting algorithm as outlined below.

Spoiler

This post has been edited by mojo666: 27 July 2015 - 12:29 PM

#14 ishkabible

• spelling expret

Reputation: 1747
• Posts: 5,898
• Joined: 03-August 09

Re: Challenge: Keep track of the median of a stream of integers

Posted 27 July 2015 - 12:23 PM

macosxnerd101, on 27 July 2015 - 05:06 PM, said:

You're absolutely correct. I shouldn't be doing these things at 2:30 AM. />/>/>/>

So apparently when implementing your idea I accidentally fixed (at least) this mistake because I misinterpreted you. The implementation I provided however still doesn't wind up working, on the same input not less, but it has to read in 8 digits before it fails rather than 3. It really seemed like there should be an O(1) space and time solution to this! I just can't seem to figure it out. The more and more I think about it however, the more and more I disagree that this should have an O(1) time and space solution. I can probably prove you can't do this in less than O(n) space by constructing tricky sequences (see above, already done). I'm still hopeful that for random inputs the expected amortized running time can be O(1) but that is it.

Quote

Can the integers be greater than 9? Below zero?

Yes this is for arbitrary streams; the digits of pi was just an example. However I encourage people to think about how it makes this problem very easily solvable if the digits are bounded by a small number!

solution:
Spoiler

edit: this is the same solution as above

Also one last thing: I said that you can push this all the way down to linear space an constant amortized time. I made a mistake when I asserted that. The implementation I was thinking of still requires an amortized O(lg n) operation every once and a while. In worst case this operation actually happens every time.

here is the solution:
Spoiler

This post has been edited by ishkabible: 27 July 2015 - 12:26 PM

#15 dday9

• D.I.C Regular

Reputation: 95
• Posts: 495
• Joined: 17-April 13

Re: Challenge: Keep track of the median of a stream of integers

Posted 30 July 2015 - 12:17 PM

This interested me and so I figured that I would give it a try:
Spoiler

This post has been edited by dday9: 30 July 2015 - 12:17 PM