An important point regarding memory fences is that they are always a game of two. They are about mutual ordering of memory accesses as perceived by other threads, and thus there is always a "this" thread and a "that" thread.
Below is a short crash course in memory ordering, built around the most fundamental synchronization pattern, data transfer between 2 threads:
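The pattern can be sketched as follows (a C++-like sketch with plain variables, deliberately without any synchronization yet; the names 'data' and 'flag' and the value 37 are taken from the discussion below):

```
// thread 1
data = 37;       // store the associated data
flag = 1;        // signal that the data is ready

// thread 2
if (flag)        // check whether the data is ready
    print(data);
```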
So, we want to ensure that thread 2 either prints nothing or prints "37". Thus we need to prevent the following reordering in thread 1:
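The harmful reordering in thread 1 looks like this (the flag store overtakes the data store):

```
// thread 1, reordered
flag = 1;        // the signal is raised before the data is actually written
data = 37;
```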
and the following reordering in thread 2:
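The harmful reordering in thread 2 looks like this (the data load is hoisted above the flag check):

```
// thread 2, reordered
tmp = data;      // the data is read before the flag is checked
if (flag)
    print(tmp);
```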
It's easy to conclude that either of the above reorderings can lead to printing of a garbage value: thread 2 ends up reading 'data' before thread 1's store to it is visible.
First, we need to answer 2 questions:
1. What are the synchronization actions (whose visibility is ensured by cache coherency)?
In our example the synchronization actions are the store and the load of 'flag'. In other words, 'flag' is a synchronization variable.
2. What is the associated data (whose visibility is ensured by correct ordering of operations)?
In our example the associated data is the variable 'data'.
Generally we want to ensure correct ordering of accesses to the associated data with respect to the synchronization actions. We can do it with either type of memory fence; let's start with bi-directional fences:
And the same with access-tied fences:
To sum up the scheme: visibility of the "primary" synchronization actions is ensured by cache coherency in a best-effort manner, that is, they just become visible at some point in the future; while visibility of the "secondary" associated data is ensured by correct ordering of accesses to the data with respect to the "primary" synchronization actions.
Another good introduction to memory ordering is Memory Ordering in Modern Microprocessors, Part 1 and Part 2 (by Paul E. McKenney).
Move on to Compiler vs. Hardware