Hardware
What I am going to discuss in this section is the trends in modern shared-memory hardware and what they mean for a software developer.
What I am not going to discuss is whether lunches are still free, power consumption and all that blah-blah-blah. If you are not yet aware of the problem and don't know why modern processors are multicore and why we will not see significant improvements in single-threaded performance, please refer to [free lunches are over] and [view from Berkeley].
[GPU]
SYSTEM → NUMA Node (Memory Controller) → [L4 Cache] → Processor Package → L3 Cache (shared) → Execution Core → L2 Cache (exclusive) → L1 Cache (exclusive) → Hardware Thread
NUMA Node
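For a software developer, a NUMA node IS-A memory: a region of physical memory that is cheaper to access from the processors attached to that node than from processors on other nodes.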
The relevant APIs are libnuma on Linux and the Win32 NUMA API on Windows.
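To make the node-to-processor relationship concrete, here is a minimal sketch that lists which hardware threads belong to which NUMA node. It does not use libnuma or the Win32 NUMA API; it assumes a Linux system that exposes the topology under /sys/devices/system/node, and the helper names parse_cpu_list and numa_nodes are made up for illustration.

```python
import glob
import os
import re


def parse_cpu_list(text):
    """Expand a Linux cpulist string such as "0-3,8-11" into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty string means the node has no CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node id: [cpu ids]}.

    Assumption: Linux sysfs layout. The result is an empty dict when the
    directory does not exist (non-Linux system, restricted container).
    """
    nodes = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        match = re.search(r"node(\d+)$", path)
        if not match:
            continue
        try:
            with open(os.path.join(path, "cpulist")) as f:
                nodes[int(match.group(1))] = parse_cpu_list(f.read())
        except OSError:
            continue  # node directory without a readable cpulist
    return nodes


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: cpus {cpus}")
```

On a machine with a single memory controller this typically reports one node that owns every CPU. Memory placed on a node is local to the CPUs listed for it and remote to all the others.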
L4 Cache
L4 cache is absent in many modern systems. If it is present, it is usually situated on the board that represents a NUMA node and contains several processor packages, and it sits between the memory controller and the processor packages.
From a software developer's point of view it is the same as a shared L3 cache.
Hardware Thread
Vendors use different names for it: Hyper-Threading (HT), Chip Multi-threading (CMT), Simultaneous Multi-threading (SMT). In every case the term refers to an execution core that contains more than one hardware thread.
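What the operating system reports as a "CPU" is a hardware thread, not an execution core. A minimal sketch of querying that number follows; the function names are made up for illustration, and os.sched_getaffinity is only available on some platforms (Linux among them), so the code falls back to the total count elsewhere.

```python
import os


def hardware_threads():
    """Number of hardware threads (logical CPUs) the OS reports; at least 1."""
    count = os.cpu_count()  # may be None if the count cannot be determined
    return count if count else 1


def usable_hardware_threads():
    """Number of hardware threads this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # The affinity mask can be narrower than the machine,
        # for example under taskset or inside a container.
        return len(os.sched_getaffinity(0))
    return hardware_threads()


if __name__ == "__main__":
    print("hardware threads:", hardware_threads())
    print("usable by this process:", usable_hardware_threads())
```

On a machine with two hardware threads per core, the first number is twice the number of execution cores. Keep that in mind when sizing a thread pool for compute-bound work.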