
Microsoft Windows Thread Synchronization: Discusses the two approaches to thread synchronization

#1 Martyn.Rae
Posted 10 March 2010 - 12:43 PM

Microsoft Windows Thread Synchronization

This tutorial discusses the two options for thread synchronization, the various API functions that exist to support those options, their merits and their pitfalls. I am assuming that the reader already knows how to create threads using the CreateThread Windows API call.

Synchronization Options

Data that is shared between threads requires some degree of protection to ensure that two or more threads do not affect each other's behavior in an adverse manner. For example, let us take a message queue, where one thread adds messages to the queue and another thread takes the messages off the queue and processes them. During the process of adding and/or removing a message, the queue's pointers are going to be in an indeterminate state, because the operation of adding a message to the queue is not atomic. Similarly, the operation of removing a message from the queue is not atomic. To make such an operation atomic, we would appear to have just one possibility: a mutual-exclusion (mutex) lock, which on Windows takes the form of the critical section. Microsoft have been kind enough to provide us with a set of routines that make using it easier (a short sketch of a queue protected this way follows the list). These routines are:

  • InitializeCriticalSection
  • InitializeCriticalSectionAndSpinCount
  • InitializeCriticalSectionEx
  • EnterCriticalSection
  • TryEnterCriticalSection
  • LeaveCriticalSection
  • DeleteCriticalSection
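
As a rough sketch, a message queue protected this way might look something like the code below. This is only an illustration of the idea, not part of the timing tests later in the tutorial; the names queue_lock, queue_push and queue_pop are made up for the example.

#include <windows.h>
#include <queue>
#include <string>

// Shared state: the queue itself and the critical section that guards it.
CRITICAL_SECTION queue_lock;
std::queue<std::string> message_queue;

void queue_init() {
    InitializeCriticalSection(&queue_lock);
}

// Producer side: add a message while holding the lock, so the queue's
// internal pointers are never observed in a half-updated state.
void queue_push(const std::string& msg) {
    EnterCriticalSection(&queue_lock);
    message_queue.push(msg);
    LeaveCriticalSection(&queue_lock);
}

// Consumer side: remove a message (if there is one) while holding the lock.
bool queue_pop(std::string& msg) {
    bool have_message = false;
    EnterCriticalSection(&queue_lock);
    if ( !message_queue.empty() ) {
        msg = message_queue.front();
        message_queue.pop();
        have_message = true;
    }
    LeaveCriticalSection(&queue_lock);
    return have_message;
}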


The critical section is by far the most common mechanism for synchronizing access to data shared between threads, and it ensures that the integrity of the shared data is maintained at all times. The problem is that it is slow, very slow. I have written a small piece of code that provides timings for this mechanism, and the results I obtained are shown below.

#include <windows.h>
#include <iostream>
using namespace std;

CRITICAL_SECTION lock_section;

DWORD flag = 0;
DWORD counter1 = 0;
DWORD counter2 = 0;
LARGE_INTEGER frequency;
LARGE_INTEGER start_time;
LARGE_INTEGER end_time;

// Worker thread: sets the flag (and counts how often it managed to)
// while holding the critical section.
DWORD WINAPI threadProcedure(LPVOID) {
    int ix;

    for ( ix = 0; ix < 10000000; ix++ ) {
        EnterCriticalSection(&lock_section);
        if ( flag == 0 ) {
            flag = 1;
            counter2++;
        }
        LeaveCriticalSection(&lock_section);
    }
    return 0;
}

int main() {
    int ix;
    double time_taken;
    HANDLE thread;

    // Create the lock before either thread can touch the shared flag.
    InitializeCriticalSectionAndSpinCount(&lock_section, 1000);

    QueryPerformanceCounter(&start_time);
    thread = CreateThread(NULL, 0, threadProcedure, NULL, 0, NULL);

    // Main thread: clears the flag (and counts how often it managed to)
    // while holding the critical section.
    for ( ix = 0; ix < 10000000; ix++ ) {
        EnterCriticalSection(&lock_section);
        if ( flag == 1 ) {
            flag = 0;
            counter1++;
        }
        LeaveCriticalSection(&lock_section);
    }
    QueryPerformanceCounter(&end_time);
    QueryPerformanceFrequency(&frequency);
    time_taken = ((double)(end_time.QuadPart - start_time.QuadPart))/(double)frequency.QuadPart;
    cout << "Total of " << (__int64)((double)counter1/time_taken) << " switches a second" << endl;

    // Let the worker finish, then clean up.
    WaitForSingleObject(thread, INFINITE);
    CloseHandle(thread);
    DeleteCriticalSection(&lock_section);
    return 0;
}



This code produced the following result: Total of 60,000 switches a second. This was running on my 2.6 GHz quad-core machine.

Nothing wrong with that, I hear you say. But wait. I can increase the number of thread switches a second by a factor of approximately 150. How? By using a little-known feature of the operating system that has been around since Windows 2000. The routine is called _InterlockedCompareExchange. What it does is give us the ability to test and set the flag variable atomically. Furthermore, on the x86/x64 processor this call is actually a compiler intrinsic, which means it is resolved to a single machine-code instruction rather than a function call. So, here is the code.

#include <windows.h>
#include <intrin.h>
#include <iostream>
using namespace std;

volatile long flag = 0;
DWORD counter1 = 0;
DWORD counter2 = 0;
LARGE_INTEGER frequency;
LARGE_INTEGER start_time;
LARGE_INTEGER end_time;

// Worker thread: atomically changes the flag from 0 to 1 on each pass.
DWORD WINAPI threadProcedure(LPVOID) {
    int ix;

    for ( ix = 0; ix < 10000000; ix++ ) {
        _InterlockedCompareExchange(&flag, 1, 0);
        counter2++;
    }
    return 0;
}

int main() {
    int ix;
    double time_taken;
    HANDLE thread;

    QueryPerformanceCounter(&start_time);
    thread = CreateThread(NULL, 0, threadProcedure, NULL, 0, NULL);

    // Main thread: atomically changes the flag back from 1 to 0 on each pass.
    for ( ix = 0; ix < 10000000; ix++ ) {
        _InterlockedCompareExchange(&flag, 0, 1);
        counter1++;
    }
    QueryPerformanceCounter(&end_time);
    QueryPerformanceFrequency(&frequency);
    time_taken = ((double)(end_time.QuadPart - start_time.QuadPart))/(double)frequency.QuadPart;
    cout << counter1 << "   " << counter2 << endl;
    cout << "Total of " << (__int64)((double)counter1/time_taken) << " switches a second" << endl;

    // Let the worker finish before exiting.
    WaitForSingleObject(thread, INFINITE);
    CloseHandle(thread);
    return 0;
}



This produces a staggering result: Total of 10,000,000 switches a second. That is to say, using a critical section as opposed to InterlockedCompareExchange is about one hundred and fifty times slower. So which method are you going to use next time you need to synchronize threads?
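
If you want to turn the intrinsic into an actual lock rather than just toggling a flag, a minimal spin lock can be built on it along the lines sketched below. This is purely an illustration, not part of the timing test above; the names spin_flag, spin_acquire and spin_release are made up.

#include <windows.h>
#include <intrin.h>

// Lock word: 0 means free, 1 means held.
volatile long spin_flag = 0;

void spin_acquire(volatile long* lock) {
    // Loop until we atomically change the lock word from 0 (free) to 1 (held).
    while ( _InterlockedCompareExchange(lock, 1, 0) != 0 ) {
        YieldProcessor();   // hint to the processor that we are busy-waiting
    }
}

void spin_release(volatile long* lock) {
    // Atomically put the lock word back to 0 (free).
    _InterlockedExchange(lock, 0);
}

Wrap accesses to the shared data between spin_acquire(&spin_flag) and spin_release(&spin_flag); for very short protected regions the whole operation stays as a handful of machine instructions.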

Deadlocks

I cannot finish off this tutorial without addressing deadlocks. A deadlock will often occur when two threads need to get their hands on two resources at the same time. Thread 1 obtains resource 1 and thread 2 obtains resource 2. Thread 1 then waits for resource 2 to become free before it can continue, and thread 2 waits for resource 1 to become free before it can continue. Deadlock time!!!

This again shows poor programming skills, because it is so easy to ensure that it never happens. Both threads should call a common routine to acquire the two resources, so that both threads always request them in the same order.
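
A rough sketch of that idea is shown below; the two locks and the helper names are purely illustrative.

#include <windows.h>

CRITICAL_SECTION resource1_lock;
CRITICAL_SECTION resource2_lock;

// Every thread that needs both resources goes through this one routine,
// so the locks are always taken in the same order (resource 1, then
// resource 2) and the reverse-order deadlock described above cannot occur.
void acquire_both() {
    EnterCriticalSection(&resource1_lock);
    EnterCriticalSection(&resource2_lock);
}

void release_both() {
    // Release in the opposite order of acquisition.
    LeaveCriticalSection(&resource2_lock);
    LeaveCriticalSection(&resource1_lock);
}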

Less common deadlock situations can occur when one thread acquires a resource that another thread needs but forgets to release it. This is often known as hogging a resource, and again it shows a rudimentary flaw in the programming logic. Beware of the throw/exception mechanism, which can completely bypass the associated synchronization release call. That is a particularly nasty bug to locate, especially if the exception is handled and the program continues without any form of reporting.
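
One way to guard against that is a small helper object whose destructor always releases the lock, even when an exception unwinds the stack. A rough sketch follows; the class name CriticalSectionGuard is made up for the example.

#include <windows.h>

// The destructor runs however the scope is left, so a throw inside the
// protected region can no longer leave the critical section held forever.
class CriticalSectionGuard {
public:
    explicit CriticalSectionGuard(CRITICAL_SECTION& cs) : cs_(cs) {
        EnterCriticalSection(&cs_);
    }
    ~CriticalSectionGuard() {
        LeaveCriticalSection(&cs_);
    }
private:
    CriticalSectionGuard(const CriticalSectionGuard&);            // non-copyable
    CriticalSectionGuard& operator=(const CriticalSectionGuard&);
    CRITICAL_SECTION& cs_;
};

void process_message(CRITICAL_SECTION& queue_lock) {
    CriticalSectionGuard guard(queue_lock);
    // ... work that might throw ...
}   // the lock is released here no matter how we leave the scope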

Conclusion

Thread synchronization and resource integrity are not as easy as they may first appear. The developer has to be extremely careful to ensure that resources shared between threads maintain data integrity without deadlocks. The mechanisms are provided by Microsoft through the APIs; we just need to understand them and use them when necessary. Do not use critical sections, they are slow and archaic. Use InterlockedCompareExchange, or the compiler intrinsic version _InterlockedCompareExchange, for maximum performance.
