Spinning

There are 2 ways to implement waiting in synchronization algorithms: OS blocking and spinning.

OS blocking generally requires the following steps. A waiter thread must check the condition it wants to wait for, [register itself as a waiter], and then block itself by means of a semaphore, event, condition variable or whatever. A notifier must determine whether there are waiters, [unregister one or several waiters], and then notify them via a semaphore, event, etc. OS blocking has some pros and cons as compared to spinning: it's significantly harder to implement, it penalizes fast-path performance (because notifiers have to check for waiters), but it's more efficient for long-term waiting (because waiters do not burn CPU cycles).
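As a sketch (not from the original text), the blocking protocol above can be expressed with C++11 standard primitives; the names wait_blocking/notify_blocking, the bool condition and the waiter counter are assumptions for illustration:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

static std::mutex g_mtx;
static std::condition_variable g_cv;
static bool g_ready = false;   // the awaited condition
static int g_waiters = 0;      // [register itself as a waiter]

void wait_blocking()
{
    std::unique_lock<std::mutex> lock(g_mtx);
    ++g_waiters;                             // register as a waiter
    g_cv.wait(lock, [] { return g_ready; }); // re-checks the condition on wakeup
    --g_waiters;                             // unregister
}

void notify_blocking()
{
    std::lock_guard<std::mutex> lock(g_mtx);
    g_ready = true;
    if (g_waiters != 0)  // fast-path cost: the notifier must check for waiters
        g_cv.notify_all();
}

bool demo()
{
    std::thread waiter(wait_blocking);
    notify_blocking(); // may run before or after the waiter actually blocks
    waiter.join();
    return g_ready;
}
```

Note how the notifier pays the fast-path cost even when there are no waiters: it must take the lock and check the waiter count.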

Spinning generally requires the following steps. A waiter checks the condition, backs off, checks the condition, backs off, and so on. As you see, notifiers are not involved in the process. Spinning is easier to implement, and it does not penalize fast-path performance (the protocol is executed by an otherwise idle thread); however, it's not optimal for long-term waiting and/or impacts latency (because a waiter may be in a "deep" backoff when the condition is satisfied). If spinning is used, then nobody except the thread itself knows what it's waiting for; as opposed to blocking, where the thread needs to communicate what it's waiting for (otherwise, nobody will be able to notify it).

Let's concentrate on spinning: what types of spinning there are, how it may be implemented, and so on.

There are 3 types of spinning:

- Active or processor-level spinning.

A thread notifies the processor (but not the OS) that it's waiting for something. Notification of the processor is done by means of instructions like PAUSE (x86) or hint0 (IA-64) (other architectures may have similar instructions - check a manual for your processor). However, the OS thinks that it's just a normal thread doing useful work. As a result, CPU cycles are senselessly burned during waiting.

Notification of the processor is important; never implement waiting just as:

while (!something) {}

Instructions like PAUSE (1) improve performance (they help to fight memory ordering issues inside a processor), (2) decrease power consumption, and (3) further improve performance in the context of HyperThreading/Simultaneous Multithreading/Chip-level Multithreading (because the processor gives preference to siblings of the waiting thread). So the code should look like:

while (!something)
{
    _mm_pause();                    // for MSVC/IA-32
    __yield();                      // for MSVC/IA-64
    __asm__ __volatile__ ("pause"); // for gcc/IA-32
}

Note that active spinning is useless on single-core/single-processor systems, because the condition can't possibly change until the thread gives up control. Active spinning is useful on systems with hardware concurrency (multicore/multiprocessor), because it assumes that another thread will change the condition while the thread spins.
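For illustration, here is a self-contained sketch (names and structure are mine, not from the original) in which one thread actively spins on an atomic flag that another thread sets; cpu_pause() wraps the PAUSE instruction on x86 and falls back to a no-op elsewhere:

```cpp
#include <atomic>
#include <thread>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define cpu_pause() _mm_pause()  // PAUSE on x86
#else
#define cpu_pause() ((void)0)    // fallback: plain busy loop
#endif

static std::atomic<bool> g_flag(false);

// Active spinning: burn cycles until another core sets the flag.
// The processor is notified (PAUSE), the OS is not.
void spin_wait()
{
    while (!g_flag.load(std::memory_order_acquire))
        cpu_pause();
}

bool demo_spin()
{
    std::thread waiter(spin_wait);
    g_flag.store(true, std::memory_order_release); // satisfied by another thread
    waiter.join();
    return g_flag.load();
}
```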

- Passive or OS-level spinning.

A thread notifies the OS that it's waiting for something, so that the OS is able to switch control to another runnable thread (if any). Passive spinning is good because if there is some other useful work to do (a runnable thread), the OS will switch to it - that useful work has to be done sometime anyway, right? So why not switch to it now, when the waiting thread can't make useful forward progress anyway? However, in return passive spinning increases latency for the waiting thread. Which is more important depends on the application.

On Linux passive spinning is implemented with sched_yield() (pthread_yield() is a nonstandard alias), or nanosleep() for "deeper" spin.
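A minimal sketch of the two Linux variants (the helper names are mine, not from the original):

```cpp
#include <sched.h> // sched_yield()
#include <time.h>  // nanosleep()

// Passive spinning: ask the OS to run another runnable thread, if any.
int passive_spin_once()
{
    return sched_yield(); // returns 0 on success
}

// "Deeper" spin: actually sleep for a short while (1 ms here).
int deep_spin_once()
{
    struct timespec ts = { 0, 1000000L }; // 0 s, 1,000,000 ns = 1 ms
    return nanosleep(&ts, nullptr);       // returns 0 on success
}
```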

On Windows the situation is a bit more involved. There is SwitchToThread(), which is limited to the current processor. There is Sleep(0), which is limited to threads of no lesser priority (it's unclear whether it's intra-processor or inter-processor). Finally, there is Sleep(1), which should cover all cases (all priorities/all processors).

- Hybrid spinning.

Hybrid spinning tries to combine the advantages of both approaches. That is, do active spinning for some time (to account for the case when the condition is satisfied soon by a thread running on another processor), then switch to passive spinning so as not to burn too many CPU cycles aimlessly.

Below is an example of how hybrid spinning may be implemented on Windows:

void do_backoff(int& backoff) // backoff is initialized to 0
{
    if (backoff < 10)
        _mm_pause();
    else if (backoff < 20)
        for (int i = 0; i != 50; i += 1) _mm_pause();
    else if (backoff < 22)
        SwitchToThread();
    else if (backoff < 24)
        Sleep(0);
    else if (backoff < 26)
        Sleep(1);
    else
        Sleep(10);
    backoff += 1;
}

Note that we generally want to increase the delay on successive failures to find the condition satisfied. It helps to reduce contention and/or improve overall system efficiency.
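For comparison, a POSIX analogue of the Windows example might look as follows (a sketch: the same arbitrary thresholds, with sched_yield()/nanosleep() standing in for SwitchToThread()/Sleep(), and cpu_pause() falling back to a no-op on non-x86):

```cpp
#include <sched.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_pause() _mm_pause()
#else
#define cpu_pause() ((void)0)
#endif

void do_backoff(int& backoff) // backoff is initialized to 0
{
    if (backoff < 10)
        cpu_pause();                         // active spinning
    else if (backoff < 20)
        for (int i = 0; i != 50; i += 1) cpu_pause();
    else if (backoff < 22)
        sched_yield();                       // passive: yield the processor
    else
    {
        // "deep" backoff: sleep 1 ms, then 10 ms per iteration
        struct timespec ts = { 0, backoff < 26 ? 1000000L : 10000000L };
        nanosleep(&ts, nullptr);
    }
    backoff += 1;
}
```

A caller resets the counter to 0 each time it finds the condition satisfied, so a fresh wait starts with active spinning again.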

Hybrid spinning should be considered the default option, because it's all things to all men.

However there are some exceptions:

- On single-core/single-processor systems passive spinning should be employed.

- If a system is completely dedicated to the application (that is, there is no other noticeable work to do), and processors are not oversubscribed with threads, then pure active spinning may be used - there is no useful work to do anyway, so we can burn CPU cycles right and left. Such a situation is usually the case in the context of HPC (high-performance computing). However, there is still little sense in active spinning for more than, let's say, several milliseconds.

- In ultra low-latency systems active spinning may help to reduce latency of a critical path.

- If really long waiting is expected, then one may use pure passive spinning. No need to bother yourself with the implementation of hybrid spinning.