0 Replies - 571 Views - Last Post: 12 May 2019 - 07:35 PM

#1 Skydiver   User is online

  • Code herder
  • member icon

Reputation: 7095
  • View blog
  • Posts: 24,115
  • Joined: 05-May 12

When what looks thread safe is not thread safe...

Posted 12 May 2019 - 07:35 PM

I just wanted to share a bit of code that nominally should belong in Nightmare-In-Code. I wrote it in the midst of a code writing fugue about three months ago. When I first wrote it, I completely believed it to be thread safe until about a week ago when I would occasionally get unexpected behavior. The code looks like this:
HashSet<string> _checkedDirectories = new HashSet<string>();

void EnsureDirectoryExists(string directory)
{
    if (!_checkedDirectories.Contains(directory))
    {
        System.IO.Path.CreateDirectory(directory);
        _checkedDirectories.Add(directory);
    }
}



I'd originally convinced myself that the code above was thread safe with the following reasoning:
CreateDirectory() won't fail if the directory already exists. If two different threads try to create a directory, both of them will succeed.
HashSet<T>.Add() won't fail if an entry being added already exists. If two different threads try to add the same directory name, then both will succeed.
HashSet<T>.Contains() will either return true or false. If two threads both come in and calls Contains() and both discover that the directory should be created, no problem because of the prior two statements. Same thing if one thread comes in one step behind another thread. No harm, no foul due to the prior two statements.

Unfortunately, the thing that I didn't account for is that none of the three methods above are guaranteed to be atomic. In particular, the danger comes up when two threads are simultaneously running HashSet<T>.Add(). Examining the code in the reference sources, there are chances of the internal data structure getting messed up if two threads are interleaved.

So the obvious minimal fix is to simply wrap a lock around the whole body of the EnsureDirectoryExists() method. Alas, that had the unintended consequence of suddenly serializing my 32 different threads which are trying to download files from SharePoint.

For a little while, I toyed with the idea of just wrapping the Add() call with a lock since that is really the critical section that needs to be protected. The Contains() and CreateDirectory() could survive thread collisions. In the end, though, I opted to just switch over to using a ConcurrentDictionary<>. What really drove my decision was not having to write a comment explaining why the lock around Add() is good enough, and why a lock around the entire body is Bad IdeaTM. The weaker decision driver was that ConcurrentDictionary<> is written to be thread safe.

I'm curious if you would have opted for either the serializing inducing lock, or the "Lucy, you have some 'splaining to do" lock instead?

Is This A Good Question/Topic? 0
  • +

Page 1 of 1