sync.SeqCount relies on the following memory orderings:
- All stores following BeginWrite() in program order happen after the atomic
read-modify-write (RMW) of SeqCount.epoch. In the Go 1.19 memory model, this
is implied by atomic loads being acquire-seqcst.
- All stores preceding EndWrite() in program order happen before the RMW of
SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic stores
being release-seqcst.
- All loads following BeginRead() in program order happen after the load of
SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic loads
being acquire-seqcst.
- All loads preceding ReadOk() in program order happen before the load of
SeqCount.epoch. The Go 1.19 memory model does not imply this property.
The x86 memory model *does* imply this final property, and in practice the
current Go compiler does not reorder memory accesses around the load of
SeqCount.epoch, so sync.SeqCount behaves correctly on x86.
However, on ARM64, the instruction that is actually emitted for the atomic load
from SeqCount.epoch is LDAR:
```
gvisor/pkg/sentry/kernel/kernel.SeqAtomicTryLoadTaskGoroutineSchedInfo():
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:34
56371c: f9400025 ldr x5, [x1]
563720: f9400426 ldr x6, [x1, #8]
563724: f9400822 ldr x2, [x1, #16]
563728: f9400c23 ldr x3, [x1, #24]
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:36
56372c: d503201f nop
gvisor/pkg/sync/sync.(*SeqCount).ReadOk():
gvisor/pkg/sync/seqcount.go:107
563730: 88dffc07 ldar w7, [x0]
563734: 6b0400ff cmp w7, w4
```
LDAR is explicitly documented as not implying the required memory ordering:
https://developer.arm.com/documentation/den0024/latest/Memory-Ordering/Barriers/One-way-barriers
Consequently, SeqCount.ReadOk() is incorrectly memory-ordered on weakly-ordered
architectures. To fix this, we need to introduce an explicit memory fence.
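As a rough illustration of where the fence belongs, the following is a minimal sketch of the seqcount protocol in plain Go. The names seqCount, pair, readPair and memoryFenceReads are hypothetical stand-ins, not the actual gVisor code. In particular, memoryFenceReads is a no-op placeholder, because pure Go cannot express a load-load fence; the sketch only marks the point in ReadOk() where one is required. The demo runs in a single goroutine, since an unsynchronized read of the protected data concurrent with a writer would be a data race under the Go memory model.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// seqCount is a simplified, hypothetical stand-in for sync.SeqCount.
// An odd epoch means a writer critical section is in progress.
type seqCount struct {
	epoch uint32
}

// memoryFenceReads is a placeholder for sync.MemoryFenceReads(). It does
// nothing here; a real implementation needs assembly, as described above.
func memoryFenceReads() {}

// BeginWrite enters a writer critical section (epoch becomes odd).
func (s *seqCount) BeginWrite() {
	if atomic.AddUint32(&s.epoch, 1)&1 == 0 {
		panic("BeginWrite during writer critical section")
	}
}

// EndWrite leaves a writer critical section (epoch becomes even).
func (s *seqCount) EndWrite() {
	if atomic.AddUint32(&s.epoch, 1)&1 != 0 {
		panic("EndWrite outside writer critical section")
	}
}

// BeginRead waits until no writer is active and returns the epoch seen.
func (s *seqCount) BeginRead() uint32 {
	for {
		if e := atomic.LoadUint32(&s.epoch); e&1 == 0 {
			return e
		}
		runtime.Gosched()
	}
}

// ReadOk reports whether the reads made since BeginRead returned epoch
// were free of concurrent writes. The fence must come before the epoch
// load so that the protected loads cannot be reordered after it.
func (s *seqCount) ReadOk(epoch uint32) bool {
	memoryFenceReads()
	return atomic.LoadUint32(&s.epoch) == epoch
}

// pair is example state protected by a seqCount.
type pair struct{ a, b uint64 }

// readPair retries the copy until it observes a consistent snapshot.
func readPair(s *seqCount, p *pair) pair {
	for {
		e := s.BeginRead()
		snap := *p
		if s.ReadOk(e) {
			return snap
		}
	}
}

func main() {
	var seq seqCount
	var data pair

	seq.BeginWrite()
	data = pair{a: 1, b: 2}
	seq.EndWrite()

	snap := readPair(&seq, &data)
	fmt.Println(snap.a, snap.b)
}
```

Without the fence, the copy in readPair could be satisfied after the epoch load in ReadOk(), so a torn snapshot could still pass validation.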
On ARM64, there is no way to implement the memory fence in question without
resorting to assembly, so the implementation is straightforward. On x86, we
introduce a compiler fence, since future compilers might otherwise reorder
memory accesses to after atomic loads; the only apparent way to introduce such
a fence is also to use assembly, which unfortunately introduces overhead:
- After the call to sync.MemoryFenceReads(), callers zero XMM15 and reload the
runtime.g pointer from %fs:-8, reflecting the switch from ABI0 to
ABIInternal. This is a relatively small cost.
- Before the call to sync.MemoryFenceReads(), callers spill all registers to
the stack, since ABI0 function calls clobber all registers. The cost of this
depends on the state of the caller before the call, and is not reflected in
BenchmarkSeqCountReadUncontended (which does not read any protected state
between the calls to BeginRead() and ReadOk()).
Both of these problems are caused by Go assembly functions being restricted to
ABI0. Go provides a way to mark assembly functions as using ABIInternal
instead, but restricts its use to functions in package runtime
(https://github.com/golang/go/issues/44065). runtime.publicationBarrier(),
which is semantically "sync.MemoryFenceWrites()", is implemented as a compiler
fence on x86; defining sync.MemoryFenceReads() as an alias for that function
(using go:linkname) would mitigate the former problem, but not the latter.
Thus, for simplicity, we define sync.MemoryFenceReads() in (ABI0) assembly, and
have no choice but to eat the overhead.
("Fence" and "barrier" are often used interchangeably in this context; Linux
uses "barrier" (e.g. `smp_rmb()`), while C++ uses "fence" (e.g.
`std::atomic_thread_fence()`). We choose "fence" to reduce ambiguity with
"write barriers", since Go is a GC'd language.)
PiperOrigin-RevId: 573861378
Per runtime.memmove, pointers are always copied atomically, as this is required
by the GC. (Also, the init() safety check doesn't work because it gets renamed
to <prefix>init() by template instantiation.)
PiperOrigin-RevId: 345800302
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.
This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.
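A minimal sketch of the aliasing approach, assuming a wrapper package that re-exports standard library types. The alias names mirror the standard ones, but counter and countTo are hypothetical and exist only for the demo; this is not the actual contents of the renamed package.

```go
package main

import (
	"fmt"
	stdsync "sync"
)

// Mutex and WaitGroup are type aliases, so they are identical to the
// standard library types. Pointing them at another implementation later
// would require no changes in code that uses these names.
type (
	Mutex     = stdsync.Mutex
	WaitGroup = stdsync.WaitGroup
)

// Compile-time check that the alias and the original are the same type.
var _ *stdsync.Mutex = new(Mutex)

// counter is a hypothetical user of the aliased types.
type counter struct {
	mu Mutex
	n  int
}

// inc increments the counter under the lock.
func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// countTo starts the given number of goroutines, each incrementing a
// shared counter once, and returns the final count.
func countTo(workers int) int {
	var c counter
	var wg WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait()
	return c.n
}

func main() {
	fmt.Println(countTo(8))
}
```

Because these are aliases rather than new defined types, values can be passed to code that expects the standard library types without conversion.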
Updates #1472
PiperOrigin-RevId: 289033387