Files
Jamie Liu 4f3f5d8412 Fix sync.SeqCount.ReadOk() on non-x86 architectures.
sync.SeqCount relies on the following memory orderings:

- All stores following BeginWrite() in program order happen after the atomic
  read-modify-write (RMW) of SeqCount.epoch. In the Go 1.19 memory model, this
  is implied by atomic loads being acquire-seqcst.

- All stores preceding EndWrite() in program order happen before the RMW of
  SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic stores
  being release-seqcst.

- All loads following BeginRead() in program order happen after the load of
  SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic loads
  being acquire-seqcst.

- All loads preceding ReadOk() in program order happen before the load of
  SeqCount.epoch. The Go 1.19 memory model does not imply this property.

The x86 memory model *does* imply this final property, and in practice the
current Go compiler does not reorder memory accesses around the load of
SeqCount.epoch, so sync.SeqCount behaves correctly on x86.
However, on ARM64, the instruction that is actually emitted for the atomic load
from SeqCount.epoch is LDAR:

```
gvisor/pkg/sentry/kernel/kernel.SeqAtomicTryLoadTaskGoroutineSchedInfo():
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:34
  56371c:       f9400025        ldr     x5, [x1]
  563720:       f9400426        ldr     x6, [x1, #8]
  563724:       f9400822        ldr     x2, [x1, #16]
  563728:       f9400c23        ldr     x3, [x1, #24]
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:36
  56372c:       d503201f        nop
gvisor/pkg/sync/sync.(*SeqCount).ReadOk():
gvisor/pkg/sync/seqcount.go:107
  563730:       88dffc07        ldar    w7, [x0]
  563734:       6b0400ff        cmp     w7, w4
```

LDAR is explicitly documented as not implying the required memory ordering:
https://developer.arm.com/documentation/den0024/latest/Memory-Ordering/Barriers/One-way-barriers
Consequently, SeqCount.ReadOk() is incorrectly memory-ordered on weakly-ordered
architectures. To fix this, we need to introduce an explicit memory fence.

On ARM64, there is no way to implement the memory fence in question without
resorting to assembly, so the implementation is straightforward. On x86, we
introduce a compiler fence, since future compilers might otherwise reorder
memory accesses to after atomic loads; the only apparent way to do so is also
by using assembly, which unfortunately introduces overhead:

- After the call to sync.MemoryFenceReads(), callers zero XMM15 and reload the
  runtime.g pointer from %fs:-8, reflecting the switch from ABI0 to
  ABIInternal. This is a relatively small cost.

- Before the call to sync.MemoryFenceReads(), callers spill all registers to
  the stack, since ABI0 function calls clobber all registers. The cost of this
  depends on the state of the caller before the call, and is not reflected in
  BenchmarkSeqCountReadUncontended (which does not read any protected state
  between the calls to BeginRead() and ReadOk()).

Both of these problems are caused by Go assembly functions being restricted to
ABI0. Go provides a way to mark assembly functions as using ABIInternal
instead, but restricts its use to functions in package runtime
(https://github.com/golang/go/issues/44065). runtime.publicationBarrier(),
which is semantically "sync.MemoryFenceWrites()", is implemented as a compiler
fence on x86; defining sync.MemoryFenceReads() as an alias for that function
(using go:linkname) would mitigate the former problem, but not the latter.
Thus, for simplicity, we define sync.MemoryFenceReads() in (ABI0) assembly, and
have no choice but to eat the overhead.

("Fence" and "barrier" are often used interchangeably in this context; Linux
uses "barrier" (e.g. `smp_rmb()`), while C++ uses "fence" (e.g.
`std::atomic_memory_fence()`). We choose "fence" to reduce ambiguity with
"write barriers", since Go is a GC'd language.)

PiperOrigin-RevId: 573861378
2023-10-16 10:50:58 -07:00

120 lines
4.1 KiB
Go

// Copyright 2019 The gVisor Authors.
//
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package sync
import (
"sync/atomic"
)
// SeqCount is a synchronization primitive for optimistic reader/writer
// synchronization in cases where readers can work with stale data and
// therefore do not need to block writers.
//
// Compared to sync/atomic.Value:
//
// - Mutation of SeqCount-protected data does not require memory allocation,
// whereas atomic.Value generally does. This is a significant advantage when
// writes are common.
//
// - Atomic reads of SeqCount-protected data require copying. This is a
// disadvantage when atomic reads are common.
//
// - SeqCount may be more flexible: correct use of SeqCount.ReadOk allows other
// operations to be made atomic with reads of SeqCount-protected data.
//
// - SeqCount is more cumbersome to use; atomic reads of SeqCount-protected
// data require instantiating function templates using go_generics (see
// seqatomic.go).
type SeqCount struct {
// epoch is incremented by BeginWrite and EndWrite, such that epoch is odd
// if a writer critical section is active, and a read from data protected
// by this SeqCount is atomic iff epoch is the same even value before and
// after the read.
epoch uint32
}
// SeqCountEpoch tracks writer critical sections in a SeqCount.
type SeqCountEpoch uint32
// BeginRead indicates the beginning of a reader critical section. Reader
// critical sections DO NOT BLOCK writer critical sections, so operations in a
// reader critical section MAY RACE with writer critical sections. Races are
// detected by ReadOk at the end of the reader critical section. Thus, the
// low-level structure of readers is generally:
//
// for {
// epoch := seq.BeginRead()
// // do something idempotent with seq-protected data
// if seq.ReadOk(epoch) {
// break
// }
// }
//
// However, since reader critical sections may race with writer critical
// sections, the Go race detector will (accurately) flag data races in readers
// using this pattern. Most users of SeqCount will need to use the
// SeqAtomicLoad function template in seqatomic.go.
func (s *SeqCount) BeginRead() SeqCountEpoch {
if epoch := atomic.LoadUint32(&s.epoch); epoch&1 == 0 {
return SeqCountEpoch(epoch)
}
return s.beginReadSlow()
}
func (s *SeqCount) beginReadSlow() SeqCountEpoch {
i := 0
for {
if canSpin(i) {
i++
doSpin()
} else {
goyield()
}
if epoch := atomic.LoadUint32(&s.epoch); epoch&1 == 0 {
return SeqCountEpoch(epoch)
}
}
}
// ReadOk returns true if the reader critical section initiated by a previous
// call to BeginRead() that returned epoch did not race with any writer critical
// sections.
//
// ReadOk may be called any number of times during a reader critical section.
// Reader critical sections do not need to be explicitly terminated; the last
// call to ReadOk is implicitly the end of the reader critical section.
func (s *SeqCount) ReadOk(epoch SeqCountEpoch) bool {
MemoryFenceReads()
return atomic.LoadUint32(&s.epoch) == uint32(epoch)
}
// BeginWrite indicates the beginning of a writer critical section.
//
// SeqCount does not support concurrent writer critical sections; clients with
// concurrent writers must synchronize them using e.g. sync.Mutex.
func (s *SeqCount) BeginWrite() {
if epoch := atomic.AddUint32(&s.epoch, 1); epoch&1 == 0 {
panic("SeqCount.BeginWrite during writer critical section")
}
}
// BeginWriteOk combines the semantics of ReadOk and BeginWrite. If the reader
// critical section initiated by a previous call to BeginRead() that returned
// epoch did not race with any writer critical sections, it begins a writer
// critical section and returns true. Otherwise it does nothing and returns
// false.
func (s *SeqCount) BeginWriteOk(epoch SeqCountEpoch) bool {
return atomic.CompareAndSwapUint32(&s.epoch, uint32(epoch), uint32(epoch)+1)
}
// EndWrite ends the effect of a preceding BeginWrite or successful
// BeginWriteOk.
func (s *SeqCount) EndWrite() {
if epoch := atomic.AddUint32(&s.epoch, 1); epoch&1 != 0 {
panic("SeqCount.EndWrite outside writer critical section")
}
}