Subscribe to Stuck in an Infiniteloop        RSS Feed
-----

Maximum Common Range of k Intervals Given N Intervals

Icon 1 Comments
This thread (help vampiracy aside) had a very interesting problem:

Given N intervals [a,b] and a number k, find the maximum common range of any k intervals.

This sounds like a computer science problem!

Some solutions to this problem space involve Interval Trees and I will probably go down that rabbit hole at some point in the future, but upon reading the problem description I had the immediate need to try a dynamic programming solution. How can I build a view of the data that allows me minimal passes through it to calculate the maximum overlap for k intervals?

I threw together a rough solution and I was quite pleased with myself. The code has been cleaned up and as Skydiver pointed out, I am now using a std::bitset as this will allow me more bits than a 4 byte unsigned integer and it's cleaner.

The data structure is as follows:

Each interval has a start and end integer and an associated length (both start and end are inclusive)
The table contains an array of Slot objects, where the size is max element +1 to allow for 1 based array indexing to make the code read better.
Each Slot object has a count of how many intervals are part of that number, it has markers to note if its a start or end number for any given interval and it contains a bitmask indicating what interval passes through it in any capacity.

The algorithm is as follows:

For each interval, add it to the table.
The table builds a Slot array to N+1 size, where N is max element in the vector of Intervals.
In each insertion, starting with the start element and continuing to the end element of the Interval, each Slot is incremented indicating how many intervals are in it. Start and end are marked and the bitmask is updated for that slot.

Using the built out Table we can now immediately know what part of the search space to ignore based on our k value. This is enabled because we marked each Slot with how many intervals passed through it. i.e. if k is 2, then we ignore any Slot with less than two intervals passing through it.

We iterate linearly through the Table, from 1 to the end of the range., ignoring any Slot.intervals < k.
If this index is a start node continue until we no longer have Slot.intervals >= k or we hit the end of the Table or we hit an end node. Once we hit an end node, we check to see if we had at least the same k intervals throughout the entire range (this is what the bitmask & check is for), we then check to see if we have a larger range than we last got and assign if so.

This continues throughout the data set and we return the largest range found for any given k intervals. Note that it must be the same k intervals throughout the range, otherwise it is not valid.

The dash lines in the diagrams represent the search space based on the value k. You can also see this in the intervals table below the number line.

Example data sets and diagrams:

[3,9]
[1,6]
[4,8]

Posted Image

[3,9]
[1,6]
[4,8]
[1,9]

Posted Image

[3,8]
[4,12]
[2,6]
[1,10]
[5,9]
[11,12]

Posted Image


Source Code:

structure:

#ifndef __TABLE_H
#define __TABLE_H
#include <iostream>
#include <vector>
#include <cmath>
#include <bitset>

using namespace std;

struct Interval {
  Interval():start(0),end(0),length(0){}
  Interval(int a, int b):start(a),end(b), length(b-a+2){}
  bool contains(int c) {return ((c >= start && c <= end));}
  int start;
  int end;
  int length;
  friend ostream& operator<<(ostream& os, Interval& rhs);
};

ostream& operator<<(ostream& os, Interval& rhs){
  os << "[" << rhs.start << "," << rhs.end << "]\n";
  return os;
}

struct Slot {
  Slot():intervals(0),start(false),end(false),part_of(0x0) {}
  unsigned int intervals;
  bool start, end;
  bitset<50> part_of; //bit mask what Intervals this slot is a part of
};

//table requires knowledge of the max elements it can have from any interval
struct Table {
  Table(int r):range(r){ intervals = new Slot[range];}
  ~Table() {delete[] intervals;}
  int range;
  Slot* intervals;
  friend ostream& operator<<(ostream& os, Table& rhs);
  void add_interval(const Interval& rhs){
    static unsigned int mask = 0x00;
    for(int i = rhs.start; i <= rhs.end; i++) {
      intervals[i].intervals++;
      if (i == rhs.start){
        intervals[i].start = true;
      }
      if (i == rhs.end) {
        intervals[i].end = true;
      }
      intervals[i].part_of |= (int)pow(2.0, (int)mask); 
    }
    ++mask;
  }
  void print_starts(){
    cout << "Starts:\n";
    for(int i = 1; i < range; i++) if (intervals[i].start == true) cout << i << endl;
  } 

  void print_ends(){
    cout << "Ends:\n";
    for(int i = 1; i < range; i++) if (intervals[i].end == true) cout << i << endl;
  } 

  Interval find_max_overlap(int k){
    int max = 0;
    int max_start = 0;
    int max_end = 0;
    Interval result;
    //throw out anything < k  
    //search from start to end
    for(int i = 1; i < range; i++){
      if(intervals[i].intervals < k) continue;
      if (intervals[i].start){
        int j = i+1;
        int tmp = 0;
        while(intervals[j].intervals >= k && j < range) {
          //cout << "checking " << i << " " << j << endl;
          if(intervals[j].end == true && ((intervals[i].part_of & intervals[j].part_of) == intervals[j].part_of)){
            //cout << " i: " << i << " j: " << j << endl;
            //cout << " i mask: " << intervals[i].part_of << " j mask: " << intervals[j].part_of << endl;
            //cout << " i & j: " << (intervals[i].part_of & intervals[j].part_of) << endl;
            //cout << "found overlap of size: " << (j-i+1) << endl;
            tmp = (j-i+1);
            if (tmp > max){
              //cout << "setting max to tmp\n";
              max = tmp;
              max_start = i;
              max_end = j;
            }
          }
          j++;
        }//while
      }
    }//linear pass
    result.start = max_start;
    result.end = max_end;
    result.length = result.end-result.start+2;
    return result;
  }
};

ostream& operator<<(ostream& os, Table& rhs){
  os << "\nnumber line | ";
  for(int i = 1; i < rhs.range; i++) os << i << " | "; 
  os << "\n---------------------------------------------------------------\n";
  os << "intervals   | ";
  for(int i = 1; i < rhs.range; i++) os << rhs.intervals[i].intervals << " | ";
  os << "\n";
  return os; 
}

int find_storage_size(const vector<Interval>& intervals){
  int max = 0;
  for(const Interval& interval : intervals){
    if (interval.end > max) max = interval.end;
  }
  return (max+1);//to 1 base the array
}

#endif



driver:

#include "Table.h"

int main(){
  vector<Interval> intervals;
  Interval one(3,9);
  Interval two(1,6);
  Interval three(4,8);
  intervals.push_back(one);
  intervals.push_back(two);
  intervals.push_back(three);

  Table table(find_storage_size(intervals));
  for(Interval& in : intervals) table.add_interval(in);
  cout << table;
  cout << "\n\n";
  cout << "Longest intersection of any 2 lines:\n";
  Interval result = table.find_max_overlap(2);
  cout << result;
  cout << "Longest intersection of any 3 lines:\n";
  result = table.find_max_overlap(3);
  cout << result;

  cout << "\n\n";
  intervals.push_back(Interval(1,9));
  Table table2(find_storage_size(intervals));
  for(Interval& in : intervals) table2.add_interval(in);
  cout << table2;
  cout << "\n\n";
  cout << "Longest intersection of any 2 lines:\n";
  result = table2.find_max_overlap(2);
  cout << result;
  cout << "Longest intersection of any 3 lines:\n";
  result = table2.find_max_overlap(3);
  cout << result;


  cout << "\n\n";
  //another example
  vector<Interval> intervals2;  
  Interval one2(3,8);
  Interval two2(4,12);
  Interval three2(2,6);
  Interval four(1,10);
  Interval five(5,9);
  Interval six(11,12);
  intervals2.push_back(one2);
  intervals2.push_back(two2);
  intervals2.push_back(three2);
  intervals2.push_back(four);
  intervals2.push_back(five);
  intervals2.push_back(six);

  Table table3(find_storage_size(intervals2));
  for(Interval& in : intervals2) table3.add_interval(in);
  cout << table3;
  cout << "\n\n";
  cout << "Longest intersection of any 2 lines:\n";
  result = table3.find_max_overlap(2);
  cout << result;
  cout << "Longest intersection of any 3 lines:\n";
  result = table3.find_max_overlap(3);
  cout << result;
  cout << "Longest intersection of any 4 lines:\n";
  result = table3.find_max_overlap(4);
  cout << result;
  return 0;
}



Output:

Quote

number line | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---------------------------------------------------------------
intervals | 1 | 1 | 2 | 3 | 3 | 3 | 2 | 2 | 1 |


Longest intersection of any 2 lines:
[4,8]
Longest intersection of any 3 lines:
[4,6]



number line | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---------------------------------------------------------------
intervals | 2 | 2 | 3 | 4 | 4 | 4 | 3 | 3 | 2 |


Longest intersection of any 2 lines:
[3,9]
Longest intersection of any 3 lines:
[4,8]



number line | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---------------------------------------------------------------
intervals | 1 | 2 | 3 | 4 | 5 | 5 | 4 | 4 | 3 | 2 | 2 | 2 |


Longest intersection of any 2 lines:
[4,10]
Longest intersection of any 3 lines:
[5,9]
Longest intersection of any 4 lines:
[5,8]


Source can also be found at my GitHub here.

Happy coding!

1 Comments On This Entry

Page 1 of 1

Skydiver Icon

15 November 2017 - 07:33 PM
Yes, an interesting problem. It gets even more interesting when applying the 128MB memory cap. The table of slots looks to start taking up too much memory unless it is implemented as a linked list.
0
Page 1 of 1

November 2017

S M T W T F S
   1234
567891011
12131415161718
192021 22 232425
2627282930  

Tags

    Recent Entries

    Search My Blog

    7 user(s) viewing

    7 Guests
    0 member(s)
    0 anonymous member(s)