QuickSort - Stack Overflow when keys are not randomized

  • (2 Pages)
  • +
  • 1
  • 2

25 Replies - 2527 Views - Last Post: 08 February 2013 - 03:06 PM Rate Topic: -----

#1 axnjxn  Icon User is offline

  • D.I.C Head

Reputation: 14
  • View blog
  • Posts: 144
  • Joined: 04-February 12

QuickSort - Stack Overflow when keys are not randomized

Posted 05 February 2013 - 07:39 PM

Hi all! I have an interesting problem. For my current algorithms class, we are to implement 4 different quicksort algorithms. The quicksorts differ in where the pivot is selected from. The 4 functions I am implementing are called qsf (quicksort first key pivot), qsl (quicksort last key pivot), qsm (median of medians pivot), qsr(random pivot). We are to do so with varying input sizes (predefined 64000, 256000, 1024000, and 8192000) as well as various input sequences (random, all equivalent, decreasing order, increasing order)

I have programmed the first 2 algorithms (qsf, qsl) which have run quite well for ALL input sizes when provided with a random input sequence. However, when they are fed all equivalent keys (all elements in array are 32.25), decreasing, and even increasing input sequences, the program seems to stall for a bit and then windows gives an error. I raw code in Notepad++, but use Visual Studio's debugger which is telling me I am getting a stack overflow in these cases.

Note:These 3 input types DO work up to an input size of about 43400 (I tested a bunch of times and they don't seem to work past 43400) and, strangely enough, the longest runner is the "all equivalent" input sequence.

Here's most of the code (I left out the utility functions to save space). My professor asked that we use simple command line arguments for the inputs (using 1-4 as options for each of the specifiable command line arguments. This is why I have all the code for converting argv to integers etc.) He also wants us to use the array range 1 to n instead of 0 to n-1.

int main(int argc, char** argv){

	//make sure all arguments are present
	if(argc != 4){
		errorMessage(argv[0]);
		return 0;
	}

	//cast the arguments to integers
	int algorithmPointer = *argv[1] - '0';
	int problemPointer = *argv[2] - '0';
	int sequencePointer = *argv[3] - '0';
	
	//make sure all arguments are within the acceptable range
	if(algorithmPointer < 1 || algorithmPointer > 4 || problemPointer < 1 || problemPointer > 4 || sequencePointer < 1 || sequencePointer > 4){
		errorMessage(argv[0]);
		return 0;
	}
		
	char algorithmChooser[][5] = {"null", "qsf", "qsl", "qsm", "qsr"};
	char* algorithmTitle = algorithmChooser[algorithmPointer];
	
	int nChooser[5] = {0, 64000, 256000, 1024000, 8192000};
	int n = nChooser[problemPointer];	
	
	char sequenceChooser[][20] = {"null", "random", "all keys equivalent", "decreasing sequence", "increasing sequence"};
	char* sequenceTitle = sequenceChooser[sequencePointer];
	
	cout << "\nChosen algorithm is: " << algorithmTitle << endl;
	cout << "Chosen problem size (n) is: " << n << endl;
	cout << "Chosen sequence is: " << sequenceTitle << endl;	
	
	//setup the key list array
	double *A = new double[n+1];
	setupKeys(A, sequencePointer, n);
	
	//declare variables to hold timer values
	double timeIn, timeOut;
	
	//call the algorithm according to the command line argument
	switch(algorithmPointer){
		
		case 1:	timeIn = getTime();
				qsf(A, 1, n);
				timeOut = getTime();
				break;
		
		case 2: timeIn = getTime();
				qsl(A, 1, n);
				timeOut = getTime();
				break;
				
		/*case 3: timeIn = getTime(); 
				qsm(A, 1, n);
				timeOut = getTime();
				break;
				
		case 4: timeIn = getTime();
				qsr(A, 1, n);
				timeOut = getTime();
				break;*/
	
	}
	
	cout << endl << "Total time: " << (timeOut - timeIn)/CLOCKS_PER_SEC << " seconds\n";

	delete[] A;
	
	return 0;
}

/************************************************************
***************************QSF*******************************
************************************************************/

void qsf(double *A, int left, int right){
	if (left < right){
		int pivotPoint = partitionQSF(A, left, right); //get pivotPoint from the partition function & partition
		qsf(A, left, pivotPoint - 1); //recursive call for values <= pivot
		qsf(A, pivotPoint + 1, right); //recursive call for values > pivot
	}
}

int partitionQSF(double *A, const int &p, const int &r){
	
	//store the pivot value (in this case it is the first key)
	double pivot = A[p]; 
	
	swap(A[p], A[r]); //swap the pivot value with the last term for safe keeping
	
	int i = p - 1; //initialize the counter which counts the # of terms <= pivot
	
	//scan left -> right and partition based on comparing to pivot
	for (int currentIndex = p; currentIndex <= r - 1; currentIndex++){	
		if(A[currentIndex] <= pivot){
			//increment the counter & perform swap
			i++;
			swap(A[i], A[currentIndex]);			
		}
	}
	swap(A[i + 1], A[r]); //place the pivot where it belongs in the array
	return (i + 1); //return the position of the pivot
}

/************************************************************
***************************QSL*******************************
************************************************************/

void qsl(double *A, int p, int r){
	if (p < r){
		int q = partitionQSL(A, p, r);
		qsl(A, p, q - 1);
		qsl(A, q + 1, r);
	}
}

int partitionQSL(double *A, const int &p, const int &r){
	
	double pivot = A[r];
	int i = p - 1;
	for (int j = p; j <= r - 1; j++){	
		if(A[j] <= pivot){
			i++;
			swap(A[i], A[j]);			
		}
	}
	swap(A[i + 1], A[r]);
	return (i + 1);
}



And here's the code I used to populate the array:
void setupKeys(double *keys, const int &choice, const int &range){

	switch(choice){
		case 1:	srand(time(NULL));
				for(int i = 1; i <= range; i++){
					double randomNum = (double)(rand() % range) + getRand();
					keys[i] = randomNum;
				}
				break;
		
		case 2: for(int i = 1; i <= range; i++){
					keys[i] = 32.25;
				}
				break;
		
		case 3: for(int i = 1; i <= range; i++){
					keys[i] = (double)(range - i);
				}
				break;
		
		case 4: for(int i = 1; i <= range; i++){
					keys[i] = (double)(i + 1);
				}
				break;
				
	}

}



Thanks ahead for any help solving this!

This post has been edited by axnjxn: 05 February 2013 - 07:42 PM


Is This A Good Question/Topic? 0
  • +

Replies To: QuickSort - Stack Overflow when keys are not randomized

#2 Adak  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 331
  • View blog
  • Posts: 1,168
  • Joined: 01-April 11

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 05 February 2013 - 08:04 PM

Sounds like a Quicksort Bashing to me. :(

The program is crashing because the depth of recursion is too great - of course. Rather naively, the program is taking any sub array to work with, instead of the smallest sub array, first.

Try this implementation of Quicksort, and see how it does. In my testing it SMOKES most others, and doesn't crash - but I have never tried it with 44,000 matching integers, either. ROFL!

Let me know how it works, and note how it selects the smallest sub array to process, first.

void insertionSort(int lo, int hi) {
   int i, j, val; 
   for(i=lo+1;i<hi;i++) {  
      val = A[i];
      j = i-1;
      while(A[j]>val) {
         A[j + 1] = A[j];
         --j;
         if(j<0) 
            break;
      }   
      A[j+1] = val;
   }
}

//Optimized Quicksort Note: "hi" hi should be 
//the highest index, not the size of the array
void quicksortOpt(int lo, int hi) { 
   int i, j, pivot,temp;
   if(lo == hi) return; 

  if(hi - lo < 80) {  //The cut-off. Varies with system 
    insertionSort(lo, hi+1);            
    return;
  }

   i=lo; 
   j=hi;
   pivot= A[(lo+hi)/2]; 

  /* Split the array into two parts */
   do {    
      while (A[i] < pivot) i++; 
      while (A[j] > pivot) j--;
      if (i<=j) {  
         temp=A[i];
         A[i]=A[j];
         A[j]=temp;   
         i++;
         j--;
      }
   } while(i<=j);
  
   if (lo < j) quicksortOpt(lo, j); //take the smaller sub-array first
   if (i < hi) quicksortOpt(i, hi);
}



Was This Post Helpful? 0
  • +
  • -

#3 axnjxn  Icon User is offline

  • D.I.C Head

Reputation: 14
  • View blog
  • Posts: 144
  • Joined: 04-February 12

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 06:53 AM

Hi, Adak, I appreciate your replay but note that I'm not bashing quicksort! ;)/>/> There are several problems with your reply, though.

First, the pivots I'm to select are predefined for all four implementations. Use the first key in each iteration (qsf), use the last key in each iteration (qsl), use the median of medians (qsm) which must be gotten in an efficient manner, and random pivot (qsr). I haven't implemented the last 2 in my coding yet (it is commented out) b/c I have yet to get the first two to work properly with ALL the input sequences that need to be serviced (random works, but all equal/increasing/decreasing don't).

Second, I'm fairly certain he doesn't want us to use an iterative insertion sort for smaller inputs. We're to use a pure quick sort (but I will ask him next class as we have another week before it is due). As for selecting the smaller of the arrays to sort first, I can put an if statement in the function to test which is smaller and process that first but I'm not sure how that would help me here.

Although I do appreciate your timely reply, the purpose of my post was really to find why the program works fine with random inputs (every time) but crashes every time even when using an already sorted sequence as input with n=64000 (smallest input). I understand there are too many recursive calls happening, but then, why would this occur even with a sorted sequence?

Thanks, again, in advance.

This post has been edited by axnjxn: 06 February 2013 - 07:06 AM

Was This Post Helpful? 0
  • +
  • -

#4 Adak  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 331
  • View blog
  • Posts: 1,168
  • Joined: 01-April 11

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 07:17 AM

No, I meant that the Instructor looked like he set out homework to show how "inadequate" Quicksort is.

I thought you knew - when a list is already sorted, a VERY bad pivot will cause the split of the array, to be just ONE number - ALL THE REST of the numbers, are still on the other side of the partition! (This happens when the pivot is the first number, and the data is already sorted.) Now when the function recurses guess what?? AGAIN, the pivot is shit, and causes JUST ONE number to be removed from the sub array - and ALL THE REST OF THE ARRAY is still on the other side, clumped together. :( Eventually the recursive calls get too deep, and the program crashes.

A bad pivot is a DISASTER for quicksort, if it's repeated with every call of the function. Quicksort works BEST when the pivot is at the middle of the array, to divide it about equally. That's true with the first partition of the array, and that's true with the partition of ALL the sub arrays, as well.
Was This Post Helpful? 1
  • +
  • -

#5 axnjxn  Icon User is offline

  • D.I.C Head

Reputation: 14
  • View blog
  • Posts: 144
  • Joined: 04-February 12

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 11:02 AM

Bah! Of course. That was quite silly of me. I assume his purpose in choosing these particular input sequences were to make sure we built an algorithm that could stand those kinds of abuses. However, now I'm at a loss. How does one overcome the limitations of the stack? He says we must run our algorithm successfully given all possible combinations. Since choosing the first/last key position pivot in sorted algorithms is causing stack overflow on the smallest input... bleh.

I'm going to have to ask him tomorrow in class. Unless you more advanced programmers have any suggestions on how to avoid overflowing the stack while still using recursion and the given requirements...
Was This Post Helpful? 0
  • +
  • -

#6 Adak  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 331
  • View blog
  • Posts: 1,168
  • Joined: 01-April 11

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 01:17 PM

Already answered that problem.

From the code I posted above: 

 if (lo < j) quicksortOpt(lo, j); //take the smaller sub-array first
 if (i < hi) quicksortOpt(i, hi);



Taking the smaller sub-array first, knocks down the number of recursive calls the program makes.
Was This Post Helpful? 1
  • +
  • -

#7 axnjxn  Icon User is offline

  • D.I.C Head

Reputation: 14
  • View blog
  • Posts: 144
  • Joined: 04-February 12

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 02:08 PM

I'll do it now and post an update. Thank you for your help.
Was This Post Helpful? 0
  • +
  • -

#8 axnjxn  Icon User is offline

  • D.I.C Head

Reputation: 14
  • View blog
  • Posts: 144
  • Joined: 04-February 12

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 02:46 PM

Well, I implemented exactly as you have posted with the exception on using the first key as a pivot (as per my posts / professor's instructions) and I'm still getting stack overflows on input size 64000. In fact, I'm now getting stack overflow errors for input size as small as 150 when using randomly generated keys or all equivalent keys.

I'm truly stumped now. I don't really understand how I'm supposed to implement quick sort with the requirements the professor gave us without generating a stack overflow (especially when the input size is 8192000)!
Was This Post Helpful? 0
  • +
  • -

#9 Adak  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 331
  • View blog
  • Posts: 1,168
  • Joined: 01-April 11

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 03:35 PM

I set up my sorting test program to give me random, ascending, descending, and all equal integer values.

I'm up to 20 million numbers being sorted, for each of these. I've had no crashes. LOTS of swaps when the ints are all equal, but zero crashes - and the sort times are even good! ;)/>

I suspect your quicksort code is handling equal keys differently, since I see <= on moving the left and right "markers" (i and j).

So it appears you're not "doing the same thing".

This post has been edited by Adak: 06 February 2013 - 03:47 PM

Was This Post Helpful? 0
  • +
  • -

#10 axnjxn  Icon User is offline

  • D.I.C Head

Reputation: 14
  • View blog
  • Posts: 144
  • Joined: 04-February 12

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 05:36 PM

Adak, when I said that it is because I implemented your code exactly as you posted it but I used the FIRST value in the array as the pivot because it is a requirement of my professor. Please try to run your implementation but instead of using the middle value of the array as pivot, use the FIRST value of the array. Also, use doubles as the key values in the array as that is another requirement of the professor (as you can see in my original code).

I'm not making it up. I copy/pasted your code and only switched how the pivot was set and took out Insertion Sort so that it matched up with the requirements of my professor. The <= you are referring to is from my original code (not the code you posted which I tried out). Again, try your implementation using double values instead of int and set the pivot up to be A[lo] instead of A[(lo+hi)/2] and remove the Insertion Sort. If it still works perfectly for you, then I have to start asking about compiler/OS differences we may have.

Thanks again.


EDIT:
Okay, I just copy/pasted your code again only this time, I didn't change anything about it. The only difference is that you are apparently using a global array(?) because there I don't see any declarations and you aren't passing a pointer into either function. I can't use globals (another requirement) and I was sure you aren't supposed to in general. So, the only change I made this time was to add in a double *A argument to both functions so that they can be passed the dynamically declared array from my main function.

The results are really weird. With random input sequence, the program will complete successfully 3 out of 4 times on the smallest input but NEVER completes on the largest. With all equal input sequence, the program never completes successfully (64000 double values). With decreasing order, it always works(huh?). And with increasing order it always works(huh?). And both of those work fine on the largest input (and FAST like you said). So, I'm really confused now.

This post has been edited by axnjxn: 06 February 2013 - 05:53 PM

Was This Post Helpful? 0
  • +
  • -

#11 jimblumberg  Icon User is offline

  • member icon


Reputation: 3993
  • View blog
  • Posts: 12,321
  • Joined: 25-December 09

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 05:47 PM

I suggest you post your current code, then maybe can spot any differences between the code or maybe someone else might be able to spot a problem.

Jim
Was This Post Helpful? 0
  • +
  • -

#12 axnjxn  Icon User is offline

  • D.I.C Head

Reputation: 14
  • View blog
  • Posts: 144
  • Joined: 04-February 12

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 06:12 PM

Well, I know why the function is working for me only with increasing and decreasing input sequences. It is because the compiler is automatically casting these values to integers (because they have no decimals). I tried changing the way the array setup occurs:

for(int i = 1; i <= range; i++){
	keys[i] = (double)(i + 1.0);
}


but it hasn't done anything. I tried casting to double everywhere and the compiler continues to automatically cast (for example) double value 1200.0 to integer value 1200.

What is really confusing is that NONE of the input sequences are working on the smallest input! This is driving me absolutely batty. How is it that there is no stack overflow from input size 8192000, but it is getting one for input size 64000 with everything else remaining the same?!

As requested by jimblumberg here is the code as it currently stands:
void qsf(double *A, int lo, int hi) { 
   int i, j, pivot,temp;
   if(lo == hi) return; 

   if(hi - lo < 80) {  //The cut-off. Varies with system 
    insertionSort(A, lo, hi+1);            
    return;
  }
   
   i=lo; 
   j=hi;
   pivot= A[(lo+hi)/2]; 

  /* Split the array into two parts */
   do {    
      while (A[i] < pivot) i++; 
      while (A[j] > pivot) j--;
      if (i<=j) {  
         temp=A[i];
         A[i]=A[j];
         A[j]=temp;   
         i++;
         j--;
      }
   } while(i<=j);
  
   if (lo < j) qsf(A, lo, j); //take the smaller sub-array first
   if (i < hi) qsf(A, i, hi);
}

void insertionSort(double *A, int lo, int hi) {
   int i, j, val; 
   for(i=lo+1;i<hi;i++) {  
      val = A[i];
      j = i-1;
      while(A[j]>val) {
         A[j + 1] = A[j];
         --j;
         if(j<0) 
            break;
      }   
      A[j+1] = val;
   }
}



  • How do I make sure that the compiler doesn't cast the double values to integer values (how do I make sure 34.0 remains 34.0)?
  • How do I get quicksort implemented with double values so that it won't experience stack overflow?
  • Is there any way to do this using the implementation I originally posted? (as the professor doesn't want us to stray too far away from the form of the code given in our textbooks)
  • Why am I getting a stack overflow on inputs of 64000 when the same exact inputs are working on inputs of 8192000?!
  • Is all of this possible if using the FIRST or LAST keys in the array as pivots (instead of using the middle term). This is one of the requirements of the assignment. The implementation should be able to run using first value as pivot, last value, middle value (median of medians), and randomly selected value.


All your help is extremely appreciated!

EDIT
I noticed that I was casting the doubles to int because I didn't change the part of the code Adak posted. If you notice above:
int i, j, pivot, temp;



pivot and temp are both declared as integers. So when the ####.0 doubles were assigned to pivot and temp, they were being cast as integers. I corrected the code to be:
int i, j;
double pivot, temp;


and now I am getting the stack overflows for ALL input sequences. I think it is plain to see that Adak is using integer values in his array and not doubles. But the array is dynamically declared! Doesn't that mean the values are on the heap NOT the stack? Why would that make a difference? Grrr....

This post has been edited by axnjxn: 06 February 2013 - 07:30 PM

Was This Post Helpful? 0
  • +
  • -

#13 Adak  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 331
  • View blog
  • Posts: 1,168
  • Joined: 01-April 11

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 08:29 PM

The index variables, i and j in my code, should ALWAYS be int's - indices can't be doubles. There is no A[22.5], etc.

My array is global, and thus on the heap. Your array is dynamic, and thus on the heap, as well. You could never make a local array with millions of doubles, on a local stack segment of memory.

Globals are normally bad, but for such a small program as my sorting tester, it's fine.

Taking the middle pivot point always helps Quicksort. Now you know why both incrementing and decrementing data is well treated by my version. The middle pivot I choose, is very close to being the exact middle value.

I will make the change to a dynamic array, use doubles as the data type, and change the pivot selection, and see what can be done. After dinner.

This post has been edited by Adak: 06 February 2013 - 08:30 PM

Was This Post Helpful? 0
  • +
  • -

#14 Adak  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 331
  • View blog
  • Posts: 1,168
  • Joined: 01-April 11

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 06 February 2013 - 10:50 PM

OK, I changed my test program. Yes, it now crashes on ascending data even at 123456 numbers in the array.

It uses dynamic array, and doubles too.

I will try adding some smarter code to the code, but I'm not confident you can save the program from a continuously dumb pivot choice.

In the iterative version of Quicksort, you can easily set the stack size to whatever you want (basically). It is "brittle" though. You could make an iterative version and manage the stack yourself, but I think this is beyond what you instructor wants.

I'm sure there is a way to enlarge the local function stack size. Google that and see what can be done there, or start a thread with that topic.

P.S.

I just tried it with the same settings it crashed on before with pivot = A[lo], and after increasing the stack size in the linker options to 5,000,000 bytes, it ran the program successfully.

On Pelles it was:
Project>Options>Linker> check "Large Address aware" box, and fill in 5000000 in the stack size box.

That will increase the size of the arrays that Quicksort can handle, but it will still crash - just at a larger number of values.

This post has been edited by Adak: 06 February 2013 - 11:06 PM

Was This Post Helpful? 1
  • +
  • -

#15 axnjxn  Icon User is offline

  • D.I.C Head

Reputation: 14
  • View blog
  • Posts: 144
  • Joined: 04-February 12

Re: QuickSort - Stack Overflow when keys are not randomized

Posted 07 February 2013 - 06:37 AM

Thanks for all your help, Adak. It does seem rather impossible without changing the size of the stack. I'm going to have to ask my professor about this in tonight's class. He wants to be able to compile and run our programs on the school's AFS system. I'm fairly certain I won't be able to change the stack size for their compiler. This seems rather impossible as is. Thanks again for your help and I will post my professor's reply so I don't leave the issue hanging.
Was This Post Helpful? 0
  • +
  • -

  • (2 Pages)
  • +
  • 1
  • 2