Multithreading is hard. Implementation of synchronization primitives is even harder. And most advanced synchronization primitives which exploit relaxed memory models are brain damaging. So don't rely on memory barriers for synchronization. Now you will think "Ah, you are one of those folks, who constantly saying to us - don't do this, don't do that, it's too difficult, it's too dangerous, anyway you will fail". Nope, I am not. I am saying exactly the opposite - do rely on memory barriers for synchronization... sometimes; because smart fine-grained synchronization is a matter of orders of magnitude performance difference and principal possibly of scaling (again sometimes, of course). But it's true that low-level synchronization primitives and memory ordering issues are hard. So some time ago I've developed a tool called Relacy Race Detector in order to help developers (actually in order to help me, but it doesn't matter) with these issues. Relacy Race Detector (RRD) is a synchronization algorithm verifier for relaxed memory models (formally - stateless dynamic verifier, if you are interested). Physically it's a header-only C++ library, but it's able to test not only C++ algorithms, but also Java, .NET/CLI, x86, PowerPC, SPARC, etc - more on this later. Now let's consider how work with RRD looks for end user. First of all you have to implement synchronization algorithm (mutex, concurrent data structure, etc) which is subject to verification. Then you express one or several unit-tests of the algorithm. Then you write several lines of code to start execution of the unit-test. Ok, now you may compile and start the program, and RRD will take care of the efficient execution of the unit-test, i.e. it will execute zillions of different thread interleavings. During execution of each interleaving it will be constantly conducting a number of built-in checks - checks for data races, deadlocks, livelocks, accesses to freed memory, double memory free, memory leaks, accesses to uninitialized variables, incorrect API usage (recursive acquisition of non-recursive mutex), as well as verification of user specified asserts and invariants. When (if) RRD will detect some error it will output detailed execution history that leads to the error (history includes such things as instances of the ABA problem, reorderings of memory accesses, thread blocking/unblocking). With such history in hands localization of the error is less than a problem. Great number of various built-in checks allows to specify no user asserts/invariants in many cases and still get exhaustive verification; for example, most mutual exclusion algorithms may be verified w/o user asserts. However, if you want to verify FIFO order of messages provided by producer-consumer queue, you will have to code this check manually. Regarding verification of Java/.NET programs. Initially I was targeted only at C++0x memory model and atomics API. But it turns out that C++0x memory model is so relaxed and atomics API is so general that I was easily able to map other memory models to C++0x memory model. So now RRD includes a set of thin wrappers for Java atomics and volatiles, .NET interlocked and volatiles, as well as POSIX and Win32 synchronization primitives. Thus, yes, you still have to code in C++, but you can verify your algorithm implementation as is would Java or .NET algorithm implementation. RRD contains 3 different schedulers: random scheduler, full search scheduler and context bound scheduler. Each represents some compromise between verification speed and completeness of the verification. All schedulers are fair, i.e. support verification of formally non-terminating programs. Now let's look at the simple example which mimics basic spin-mutex: #include <relacy/relacy.hpp> If we will run it we will see successful completion of the test. Now let's introduce some simple bugs into the code and see how RRD will react on them. I've replaced rl::memory_order_release with rl::memory_order_relaxed in mutex::unloсk() and run the program, the output is (I've omitted per-thread execution history and some non relevant details for brevity; [first number] in history is global operation index, second number is thread index): struct mutex_test execution history: We see that RRD easily detected data race on variable
mutex_test::data (more precisely this is data race type 2 which means
that conflicting data accesses are not adjacent in the execution history
although still conflicting). struct mutex_test : rl::test_suite<mutex_test, 2> {
Now mutex_test::data is declared as 'rl::atomic<int>' (so no more data races are possible on it), and I've slightly changed the way I increment the variable. Here is the output (remember, unloсk operation is still done with memory_order_relaxed): struct mutex_test execution history: We see that RRD have detected on the fifth iteration that thread #0
loaded not current value from the 'data' variable, so that final value
of the variable is 1 instead of 2. You can download the library here: http://sites.google.com/site/1024cores/relacy_2_3.zip |