Your Arsenal

So what primitives are in your arsenal for implementation of advanced synchronization algorithms?

Compare-And-Swap

Perhaps, it's the most famous primitive, it's other names are CAS, compare-and-exchange, compare-and-set, std::atomic_compare_exchange, InterlockedCompareExchange, __sync_val_compare_and_swap, LOСK CMPXCHG and other. It's an instance of so-called atomic RMW (read-modify-write) operation. It's pseudo-code is:

T compare-and-swap(T* location, T cmp, T xchg)

{

do atomically

{

T val = *location;

if (cmp == val)

*location = xchg;

return val;

}

}

That is, it stores a new value (xchg) into a memory location only if it contains an expected value (cmp), in either case it returns a value that was stored in the location when the operation begins. And all that is done atomically on hardware level.

Fetch-And-Add

Also atomic RMW operation, and also conducted atomically in hardware. Aka atomic_fetch_add, InterlockedExchangeAdd, LOСK XADD. Below is the pseudo-code:

T fetch-and-add(T* location, T x)

{

do atomically

{

T val = *location;

*location = val + x;

return val;

}

}

There are also variations like fetch-and-sub, fetch-and-and, fetch-and-or, fetch-and-xor.

Exchange

Atomic RMW. Aka atomic_exchange, XCHG. Dead simple, but not less useful:

T exchange(T* location, T x)

{

do atomically

{

T val = *location;

*location = x;

return val;

}

}

Atomic loads and stores

They are not RMW (read-modify-write) operations, they are just independent atomic loads and stores. They are frequently unfairly underestimated. However, they play fundamental role in synchronization algorithms, and they are what you should generally strive for - atomic loads and stores are better/cheaper/faster than atomic RMW operations.

Mutexes and the company

Why not? The most stupid thing one can do is try to implement everything in a non-blocking style (of course, if you are not writing infantile research paper, and not betting a money). Generally it's perfectly Ok to use mutexes/condition variables/semaphores/etc on cold-paths. For example, during process or thread startup/shutdown mutexes and condition variables is the way to go.

Now, let's move from the abstract world closer to the reality.

On Windows platform the primitives are available in the form of InterlockedXXX functions, and generally you should prefer Microsoft Visual C++ intrinsics starting from underscore _InterlockedXXX.

In GNU world they available in the form of gcc __sync intrinsics.

On Solaris they are available in the form of atomic_ops.

There is also quite portable libatomic_ops.

And hopefully all that zoo will be unified under umbrella of C1x/C++0x in the form of language-provided atomic primitives.

And in Java world there is java.util.concurrent.atomic.

And in .NET world there is System.Threading.Interlocked.

Move on to So what is a memory model? And how to cook it?