RRD: Introduction

Multithreading is hard. Implementing synchronization primitives is even harder. And advanced synchronization primitives that exploit relaxed memory models are brain-damaging. So don't rely on memory barriers for synchronization. Now you are probably thinking: "Ah, you are one of those folks who constantly tell us - don't do this, don't do that, it's too difficult, it's too dangerous, you will fail anyway". Nope, I am not. I am saying exactly the opposite - do rely on memory barriers for synchronization... sometimes; because smart fine-grained synchronization can make an orders-of-magnitude difference in performance and is sometimes the only road to scalability. But it's true that low-level synchronization primitives and memory ordering are hard to get right. So some time ago I developed a tool called Relacy Race Detector to help developers (actually to help myself, but that doesn't matter) with these issues.

Relacy Race Detector (RRD) is a synchronization algorithm verifier for relaxed memory models (formally, a stateless dynamic verifier, if you are interested). Physically it is a header-only C++ library, but it can verify not only C++ algorithms, but also Java, .NET/CLI, x86, PowerPC, SPARC, etc. - more on this later. Now let's consider what working with RRD looks like for the end user.

First of all, you implement the synchronization algorithm (mutex, concurrent data structure, etc.) that is the subject of verification. Then you express one or several unit tests for the algorithm. Then you write a few lines of code to start execution of the unit test. Now you can compile and run the program, and RRD takes care of efficient execution of the unit test, i.e. it executes zillions of different thread interleavings. During execution of each interleaving it constantly conducts a number of built-in checks: data races, deadlocks, livelocks, accesses to freed memory, double frees, memory leaks, accesses to uninitialized variables, incorrect API usage (e.g. recursive acquisition of a non-recursive mutex), as well as verification of user-specified asserts and invariants. When (if) RRD detects an error, it outputs a detailed execution history that leads to the error (the history includes such things as instances of the ABA problem, reorderings of memory accesses, and thread blocking/unblocking). With such a history in hand, localizing the error is trivial.

The large number of built-in checks means that in many cases you can get exhaustive verification without specifying any user asserts/invariants at all; for example, most mutual exclusion algorithms can be verified without user asserts. However, if you want to verify, say, the FIFO order of messages provided by a producer-consumer queue, you will have to code this check manually, as sketched below.
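For illustration, here is a hedged sketch of what such a manual check might look like. The queue type 'my_queue' and its push()/pop() methods are hypothetical placeholders for whatever implementation is under test; only rl::test_suite and RL_ASSERT come from RRD itself.

// Sketch only: 'my_queue<T>' with push()/pop() is a hypothetical blocking queue under test.
struct fifo_test : rl::test_suite<fifo_test, 2> {
    my_queue<int> q; // the hypothetical queue whose FIFO property we want to verify

    void thread(unsigned thread_index) {
        if (0 == thread_index) {
            // producer: push increasing sequence numbers
            for (int i = 0; i != 4; i += 1)
                q.push(i);
        } else {
            // consumer: the manual check - messages must come out in FIFO order
            for (int i = 0; i != 4; i += 1)
                RL_ASSERT(q.pop() == i);
        }
    }
};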

Regarding verification of Java/.NET programs: initially I targeted only the C++0x memory model and atomics API. But it turns out that the C++0x memory model is so relaxed, and the atomics API so general, that I was easily able to map other memory models onto it. So RRD now includes a set of thin wrappers for Java atomics and volatiles, .NET interlocked operations and volatiles, as well as POSIX and Win32 synchronization primitives. Thus, yes, you still have to code in C++, but you can verify your algorithm implementation as if it were a Java or .NET implementation.
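To give a flavor of such a mapping, here is a conceptual sketch (not the actual wrapper API shipped with RRD): Java volatile accesses are sequentially consistent, so a Java-style volatile can be expressed on top of rl::atomic with seq_cst operations; .NET volatile reads/writes would map to acquire/release, and .NET interlocked operations to seq_cst read-modify-writes.

// Conceptual sketch only - not the wrapper API shipped with RRD.
// Java volatile accesses are sequentially consistent, hence seq_cst atomics.
template<typename T>
struct java_volatile_sketch {
    rl::atomic<T> v_;
    T read()        { return v_($).load(rl::memory_order_seq_cst); }
    void write(T x) { v_($).store(x, rl::memory_order_seq_cst); }
};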

RRD contains 3 different schedulers: a random scheduler, a full search scheduler and a context-bound scheduler. Each represents a different trade-off between verification speed and verification completeness. All schedulers are fair, i.e. they support verification of formally non-terminating programs (a sketch of how to select a scheduler follows the example below).

Now let's look at a simple example that mimics a basic spin-mutex:

#include <relacy/relacy.hpp>

// main RRD namespace is 'rl', also note required instrumentation in the form of '($)'
struct mutex {
    rl::atomic<int> lock_;

    mutex() {
        lock_($) = 0;
    }

    void lock() {
        while (lock_($).exchange(1, rl::memory_order_acquire) == 1)
            rl::yield($, 1);
    }

    void unlock() {
        lock_($).store(0, rl::memory_order_release);
    }
};

// unit-test ('2' means number of threads)
struct mutex_test : rl::test_suite<mutex_test, 2> {
    mutex mtx;
    rl::var<int> data;

    // executed in single thread before main thread function
    void before() {
        data($) = 0;
    }

    // main thread function
    void thread(unsigned /*thread_index*/) {
        mtx.lock();
        data($) += 1;
        mtx.unlock();
    }

    // executed in single thread after main thread function
    void after() {
        RL_ASSERT(data($) == 2);
    }
};

int main() {
    rl::simulate<mutex_test>();
}
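For completeness, here is a sketch of how a particular scheduler and iteration count can be selected via test parameters. The rl::test_params field and enumerator names below are written from memory and should be verified against the headers in the RRD distribution.

int main() {
    // assumption: rl::test_params with 'iteration_count' and 'search_type' fields,
    // and enumerators such as rl::random_scheduler_type and
    // rl::fair_full_search_scheduler_type - verify against relacy/relacy.hpp
    rl::test_params params;
    params.iteration_count = 100000; // how many interleavings to explore
    params.search_type = rl::fair_full_search_scheduler_type;
    rl::simulate<mutex_test>(params);
}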

If we run the test, it completes successfully. Now let's introduce a simple bug into the code and see how RRD reacts. I've replaced rl::memory_order_release with rl::memory_order_relaxed in mutex::unlock() and run the program; the output is below (I've omitted the per-thread execution history and some irrelevant details for brevity; the first number in the history is the global operation index, the second number is the thread index):

struct mutex_test
DATA RACE (data race detected)
iteration: 1
execution history:
[0] 1: [CTOR BEGIN]
[1] 1: <0034BEE0> atomic store, value=0, (prev value=0), order=seq_cst, in mutex::mutex, mutex.cpp(7)
[2] 1: <0034BF04> store, value=0, in mutex_test::mutex_test, mutex.cpp(22)
[3] 1: [CTOR END]
[4] 1: <0034BEE0> exchange , prev=0, op=1, new=1, order=acquire, in mutex::lock, mutex.cpp(10)
[5] 1: <0034BF04> load, value=0, in mutex_test::thread, mutex.cpp(26)
[6] 1: <0034BF04> store, value=1, in mutex_test::thread, mutex.cpp(26)
[7] 1: <0034BEE0> atomic store, value=0, (prev value=1), order=relaxed, in mutex::unlock, mutex.cpp(14)
[8] 0: <0034BEE0> exchange , prev=0, op=1, new=1, order=acquire, in mutex::lock, mutex.cpp(10)
[9] 0: <0034BF04> load, value=0, in mutex_test::thread, mutex.cpp(26)
[10] 0: DATA RACE (data race detected), in mutex_test::thread, mutex.cpp(26)

We see that RRD easily detected the data race on the variable mutex_test::data (more precisely, this is a data race of type 2, which means that the conflicting accesses are not adjacent in the execution history, but they still conflict).
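For comparison, the simplest possible race report comes from a test where two threads access an ordinary rl::var with no synchronization at all; in such a history the conflicting accesses can end up adjacent, i.e. presumably the "type 1" flavor of the report. A minimal sketch:

// Minimal sketch: both threads write a plain rl::var without any synchronization,
// so RRD reports a data race on it.
struct race_test : rl::test_suite<race_test, 2> {
    rl::var<int> x;

    void thread(unsigned /*thread_index*/) {
        x($) = 1; // unsynchronized conflicting write -> data race
    }
};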

Now let's make the following change to the original mutex_test:

struct mutex_test : rl::test_suite<mutex_test, 2> {
    mutex mtx;
    rl::atomic<int> data;

    void before() {
        data($) = 0;
    }

    void thread(unsigned /*thread_index*/) {
        mtx.lock();
        int tmp = data($).load(rl::memory_order_relaxed);
        data($).store(tmp + 1, rl::memory_order_relaxed);
        mtx.unlock();
    }

    void after() {
        RL_ASSERT(data($) == 2);
    }
};

Now mutex_test::data is declared as 'rl::atomic<int>' (so data races on it are no longer possible), and I've slightly changed the way the variable is incremented. Here is the output (remember, the unlock operation is still performed with memory_order_relaxed):

struct mutex_test
USER ASSERT FAILED (assertion: data($) == 2)
iteration: 5
execution history:
[0] 1: [CTOR BEGIN]
[1] 1: <0034AD10> atomic store, value=0, (prev value=0), order=seq_cst, in mutex::mutex, mutex.cpp(7)
[2] 1: <0034AD34> atomic store, value=0, (prev value=0), order=seq_cst, in mutex_test::mutex_test, mutex.cpp(22)
[3] 1: [CTOR END]
[4] 1: <0034AD10> exchange , prev=0, op=1, new=1, order=acquire, in mutex::lock, mutex.cpp(10)
[5] 0: <0034AD10> exchange , prev=1, op=1, new=1, order=acquire, in mutex::lock, mutex.cpp(10)
[6] 0: yield(1), in mutex::lock, mutex.cpp(11)
[7] 1: <0034AD34> atomic load, value=0, order=relaxed, in mutex_test::thread, mutex.cpp(26)
[8] 1: <0034AD34> atomic store, value=1, (prev value=0), order=relaxed, in mutex_test::thread, mutex.cpp(27)
[9] 0: <0034AD10> exchange , prev=1, op=1, new=1, order=acquire, in mutex::lock, mutex.cpp(10)
[10] 0: yield(1), in mutex::lock, mutex.cpp(11)
[11] 1: <0034AD10> atomic store, value=0, (prev value=1), order=relaxed, in mutex::unlock, mutex.cpp(14)
[12] 0: <0034AD10> exchange , prev=0, op=1, new=1, order=acquire, in mutex::lock, mutex.cpp(10)
[13] 0: <0034AD34> atomic load, value=0 [NOT CURRENT], order=relaxed, in mutex_test::thread, mutex.cpp(26)
[14] 0: <0034AD34> atomic store, value=1, (prev value=1), order=relaxed, in mutex_test::thread, mutex.cpp(27)
[15] 0: <0034AD10> atomic store, value=0, (prev value=1), order=relaxed, in mutex::unlock, mutex.cpp(14)
[16] 0: [AFTER BEGIN]
[17] 0: <0034AD34> atomic load, value=1, order=seq_cst, in mutex_test::after, mutex.cpp(31)
[18] 0: USER ASSERT FAILED (assertion: data($) == 2), in mutex_test::after, mutex.cpp(31)

We see that on iteration 5 RRD detected that thread #0 loaded a stale (not current) value from the 'data' variable, so the final value of the variable is 1 instead of 2.
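To make the test pass again, we only need to undo the bug we introduced, i.e. restore release ordering on the unlocking store (this is simply the original code from the first listing):

void unlock() {
    // the release store pairs with the acquire exchange in lock(), so all writes made
    // inside the critical section become visible to the next thread that acquires the mutex
    lock_($).store(0, rl::memory_order_release);
}

With the release/acquire pairing restored, the relaxed accesses to 'data' inside the critical section are properly ordered across threads, and the assert holds on every interleaving.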

OK, now I hope you have a basic idea of what Relacy Race Detector does. In conclusion, I want to describe some of the features that I am considering for future releases of RRD: thread-local storage (with POSIX and Win32 wrappers), UNIX signals and hardware interrupts, partial-order reduction and parallelization of the runtime, persistent checkpointing of the simulation, performance simulation, and dead-code detection. Current development of RRD is largely demand-driven, so I will be happy to hear your comments, suggestions and feedback.

You can download the library here: http://sites.google.com/site/1024cores/relacy_2_3.zip

The old deprecated site for the tool: http://groups.google.com/group/relacy