In this section I discuss trends in modern shared-memory hardware and what they mean for a software developer.

What I am not going to discuss is why lunch is no longer free, power consumption and all that bla-bla-bla. If you are not yet aware of the problem and don't know why modern processors are multicore and why we will not see significant improvements in single-threaded performance, please refer to [free lunches are over] and [view from Berkeley].


SYSTEM → NUMA Node (Memory Controller) → [L4 Cache] → Processor Package → L3 Cache (shared) → Execution Core → L2 Cache (exclusive) → L1 Cache (exclusive) → Hardware Thread
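On Linux, the cache part of this hierarchy can be inspected from user space. The sketch below assumes a Linux system exposing `/sys/devices/system/cpu/cpuN/cache/indexM/` with the `level`, `type`, `size` and `shared_cpu_list` attributes; the helper names `read_attr` and `cache_hierarchy` are hypothetical. On a typical machine one would expect the L1 and L2 entries to list only the hardware threads of a single core (the "exclusive" levels in the diagram), while the L3 entry lists every core of the processor package. An L4 cache, where one exists, may or may not be reported here.

```python
import glob
import os


def read_attr(path):
    """Return the stripped contents of a sysfs attribute file, or None if it is absent."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def cache_hierarchy(cpu=0, root="/sys/devices/system/cpu"):
    """List the caches seen by one logical CPU, innermost level first.

    Each entry records the level, the type (Data, Instruction or Unified),
    the size string as reported by the kernel (e.g. "32K") and the list of
    logical CPUs sharing that cache. Returns [] where sysfs is unavailable.
    """
    caches = []
    pattern = os.path.join(root, f"cpu{cpu}", "cache", "index*")
    for index_dir in sorted(glob.glob(pattern)):
        level = read_attr(os.path.join(index_dir, "level"))
        if level is None:
            continue  # not a cache descriptor we understand
        caches.append({
            "level": int(level),
            "type": read_attr(os.path.join(index_dir, "type")) or "?",
            "size": read_attr(os.path.join(index_dir, "size")) or "?",
            "shared_cpu_list": read_attr(os.path.join(index_dir, "shared_cpu_list")) or "?",
        })
    # Stable sort: entries of the same level keep their index order.
    caches.sort(key=lambda c: c["level"])
    return caches


if __name__ == "__main__":
    for c in cache_hierarchy():
        print(f"L{c['level']} {c['type']:<12} {c['size']:>6}  shared by CPUs {c['shared_cpu_list']}")
```

Comparing the `shared_cpu_list` of each level shows directly where the private part of the hierarchy ends and the shared part begins.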


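NUMA Node

For a software developer, a NUMA node IS-A memory: memory that belongs to the local node is cheaper to access than memory that belongs to a remote node.

Relevant APIs: libnuma on Linux, and the Win32 NUMA API on Windows.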
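The first practical NUMA question is which logical CPUs are local to which node. In C, libnuma answers it with `numa_max_node()` and `numa_node_of_cpu()`, and `numa_alloc_onnode()` places an allocation on a chosen node; on Windows, `GetNumaHighestNodeNumber()` and `VirtualAllocExNuma()` play similar roles. The sketch below gets the same CPU-to-node mapping without any library. It assumes Linux with `/sys/devices/system/node/nodeN/cpulist`; the helper names `parse_cpulist` and `numa_nodes` are hypothetical.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a kernel CPU list such as "0-3,8-11" into a sorted list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes report an empty list
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def numa_nodes(root="/sys/devices/system/node"):
    """Map each NUMA node id to the logical CPUs attached to it.

    Returns {} when the node directory is not available (non-Linux, or a
    kernel built without NUMA support).
    """
    nodes = {}
    for path in glob.glob(os.path.join(root, "node*")):
        suffix = os.path.basename(path)[len("node"):]
        if not suffix.isdigit():
            continue  # skip entries such as "possible" or "online" lists
        try:
            with open(os.path.join(path, "cpulist")) as f:
                nodes[int(suffix)] = parse_cpulist(f.read())
        except OSError:
            nodes[int(suffix)] = []
    return dict(sorted(nodes.items()))


if __name__ == "__main__":
    for node, cpus in numa_nodes().items():
        print(f"node {node}: CPUs {cpus}")
```

A single-socket machine will usually report just node 0 with every CPU on it; the mapping only becomes interesting on multi-node systems.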

L4 Cache

An L4 cache is absent in many modern systems. Where present, it usually sits on the board that represents a NUMA node and carries several processor packages, between the memory controller and those packages.

For a software developer, the considerations are the same as for a shared L3 cache.

Hardware Thread

Vendors call it by different names: Hyper-Threading (HT), Chip Multi-threading (CMT), Simultaneous Multi-threading (SMT). In every case it refers to an execution core that contains more than one hardware thread.
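On Linux, hardware threads of the same execution core are reported as "thread siblings". The sketch below groups logical CPUs into cores by reading `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` (an assumption: Linux sysfs; newer kernels expose the same set under the name `core_cpus_list`). The function name `hardware_threads_per_core` is hypothetical. Logical CPUs that report an identical sibling list belong to one core.

```python
import glob
import os
from collections import defaultdict


def hardware_threads_per_core(root="/sys/devices/system/cpu"):
    """Group logical CPUs into execution cores.

    Returns a sorted list of groups; each group holds the logical CPU ids
    (hardware threads) that share one core. Returns [] if sysfs is missing.
    """
    cores = defaultdict(list)
    for path in glob.glob(os.path.join(root, "cpu*")):
        suffix = os.path.basename(path)[len("cpu"):]
        if not suffix.isdigit():
            continue  # skip entries such as "cpufreq" or "cpuidle"
        try:
            with open(os.path.join(path, "topology", "thread_siblings_list")) as f:
                siblings = f.read().strip()
        except OSError:
            continue  # offline CPU, or topology not exposed
        # The kernel prints the same list for every sibling of a core,
        # so the raw string works as a grouping key.
        cores[siblings].append(int(suffix))
    return sorted(sorted(cpus) for cpus in cores.values())


if __name__ == "__main__":
    groups = hardware_threads_per_core()
    smt = any(len(g) > 1 for g in groups)
    print(f"{len(groups)} cores, SMT {'enabled' if smt else 'not detected'}")
```

Note that `os.cpu_count()` counts logical CPUs, i.e. hardware threads, so on a machine with SMT enabled it exceeds the number of cores found here.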