9 Commits

Author SHA1 Message Date
Jamie Liu 4f3f5d8412 Fix sync.SeqCount.ReadOk() on non-x86 architectures.
sync.SeqCount relies on the following memory orderings:

- All stores following BeginWrite() in program order happen after the atomic
  read-modify-write (RMW) of SeqCount.epoch. In the Go 1.19 memory model, this
  is implied by atomic loads being acquire-seqcst.

- All stores preceding EndWrite() in program order happen before the RMW of
  SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic stores
  being release-seqcst.

- All loads following BeginRead() in program order happen after the load of
  SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic loads
  being acquire-seqcst.

- All loads preceding ReadOk() in program order happen before the load of
  SeqCount.epoch. The Go 1.19 memory model does not imply this property.

The x86 memory model *does* imply this final property, and in practice the
current Go compiler does not reorder memory accesses around the load of
SeqCount.epoch, so sync.SeqCount behaves correctly on x86.
However, on ARM64, the instruction that is actually emitted for the atomic load
from SeqCount.epoch is LDAR:

```
gvisor/pkg/sentry/kernel/kernel.SeqAtomicTryLoadTaskGoroutineSchedInfo():
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:34
  56371c:       f9400025        ldr     x5, [x1]
  563720:       f9400426        ldr     x6, [x1, #8]
  563724:       f9400822        ldr     x2, [x1, #16]
  563728:       f9400c23        ldr     x3, [x1, #24]
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:36
  56372c:       d503201f        nop
gvisor/pkg/sync/sync.(*SeqCount).ReadOk():
gvisor/pkg/sync/seqcount.go:107
  563730:       88dffc07        ldar    w7, [x0]
  563734:       6b0400ff        cmp     w7, w4
```

LDAR is explicitly documented as not implying the required memory ordering:
https://developer.arm.com/documentation/den0024/latest/Memory-Ordering/Barriers/One-way-barriers
Consequently, SeqCount.ReadOk() is incorrectly memory-ordered on weakly-ordered
architectures. To fix this, we need to introduce an explicit memory fence.

On ARM64, there is no way to implement the memory fence in question without
resorting to assembly, so the implementation is straightforward. On x86, we
introduce a compiler fence, since future compilers might otherwise reorder
memory accesses to after atomic loads; the only apparent way to do so is also
by using assembly, which unfortunately introduces overhead:

- After the call to sync.MemoryFenceReads(), callers zero XMM15 and reload the
  runtime.g pointer from %fs:-8, reflecting the switch from ABI0 to
  ABIInternal. This is a relatively small cost.

- Before the call to sync.MemoryFenceReads(), callers spill all registers to
  the stack, since ABI0 function calls clobber all registers. The cost of this
  depends on the state of the caller before the call, and is not reflected in
  BenchmarkSeqCountReadUncontended (which does not read any protected state
  between the calls to BeginRead() and ReadOk()).

Both of these problems are caused by Go assembly functions being restricted to
ABI0. Go provides a way to mark assembly functions as using ABIInternal
instead, but restricts its use to functions in package runtime
(https://github.com/golang/go/issues/44065). runtime.publicationBarrier(),
which is semantically "sync.MemoryFenceWrites()", is implemented as a compiler
fence on x86; defining sync.MemoryFenceReads() as an alias for that function
(using go:linkname) would mitigate the former problem, but not the latter.
Thus, for simplicity, we define sync.MemoryFenceReads() in (ABI0) assembly, and
have no choice but to eat the overhead.

("Fence" and "barrier" are often used interchangeably in this context; Linux
uses "barrier" (e.g. `smp_rmb()`), while C++ uses "fence" (e.g.
`std::atomic_memory_fence()`). We choose "fence" to reduce ambiguity with
"write barriers", since Go is a GC'd language.)

PiperOrigin-RevId: 573861378
2023-10-16 10:50:58 -07:00
Ayush Ranjan f6ed4523dc Reformat codebase.
PiperOrigin-RevId: 449358041
2022-05-17 17:48:35 -07:00
Jamie Liu 377cfd813d Add sync.SeqCount.BeginWriteOk.
PiperOrigin-RevId: 430268914
2022-02-22 12:25:20 -08:00
Jamie Liu 8a45c81616 Allow use of SeqAtomic with pointer-containing types.
Per runtime.memmove, pointers are always copied atomically, as this is required
by the GC. (Also, the init() safety check doesn't work because it gets renamed
to <prefix>init() by template instantiation.)

PiperOrigin-RevId: 345800302
2020-12-04 19:07:17 -08:00
Jamie Liu 9bd0164237 Improve sync.SeqCount performance.
- Make sync.SeqCountEpoch not a struct. This allows sync.SeqCount.BeginRead()
  to be inlined.

- Mark sync.SeqAtomicLoad<T> nosplit to mitigate the Go compiler's refusal to
  inline it. (Best I could get was "cost 92 exceeds budget 80".)

- Use runtime-guided spinning in SeqCount.BeginRead().

Benchmarks:
name                               old time/op  new time/op   delta
pkg:pkg/sync/sync goos:linux goarch:amd64
SeqCountWriteUncontended-12        8.24ns ± 0%  11.40ns ± 0%  +38.35%  (p=0.000 n=10+10)
SeqCountReadUncontended-12         0.33ns ± 0%   0.14ns ± 3%  -57.77%  (p=0.000 n=7+8)
pkg:pkg/sync/seqatomictest/seqatomic goos:linux goarch:amd64
SeqAtomicLoadIntUncontended-12     0.64ns ± 1%   0.41ns ± 1%  -36.40%  (p=0.000 n=10+8)
SeqAtomicTryLoadIntUncontended-12  0.18ns ± 4%   0.18ns ± 1%     ~     (p=0.206 n=10+8)
AtomicValueLoadIntUncontended-12   0.27ns ± 3%   0.27ns ± 0%   -1.77%  (p=0.000 n=10+8)

(atomic.Value.Load is, of course, inlined. We would expect an uncontended
inline SeqAtomicLoad<int> to perform identically to SeqAtomicTryLoad<int>.) The
"regression" in BenchmarkSeqCountWriteUncontended, despite this CL changing
nothing in that path, is attributed to microarchitectural subtlety; the
benchmark loop is unchanged except for its address:

Before this CL:
  :0                    0x4e62d1                48ffc2                  INCQ DX
  :0                    0x4e62d4                48399110010000          CMPQ DX, 0x110(CX)
  :0                    0x4e62db                7e26                    JLE 0x4e6303
  :0                    0x4e62dd                90                      NOPL
  :0                    0x4e62de                bb01000000              MOVL $0x1, BX
  :0                    0x4e62e3                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e62e7                ffc3                    INCL BX
  :0                    0x4e62e9                0fbae300                BTL $0x0, BX
  :0                    0x4e62ed                733a                    JAE 0x4e6329
  :0                    0x4e62ef                90                      NOPL
  :0                    0x4e62f0                bb01000000              MOVL $0x1, BX
  :0                    0x4e62f5                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e62f9                ffc3                    INCL BX
  :0                    0x4e62fb                0fbae300                BTL $0x0, BX
  :0                    0x4e62ff                73d0                    JAE 0x4e62d1

After this CL:
  :0                    0x4e6361                48ffc2                  INCQ DX
  :0                    0x4e6364                48399110010000          CMPQ DX, 0x110(CX)
  :0                    0x4e636b                7e26                    JLE 0x4e6393
  :0                    0x4e636d                90                      NOPL
  :0                    0x4e636e                bb01000000              MOVL $0x1, BX
  :0                    0x4e6373                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e6377                ffc3                    INCL BX
  :0                    0x4e6379                0fbae300                BTL $0x0, BX
  :0                    0x4e637d                733a                    JAE 0x4e63b9
  :0                    0x4e637f                90                      NOPL
  :0                    0x4e6380                bb01000000              MOVL $0x1, BX
  :0                    0x4e6385                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e6389                ffc3                    INCL BX
  :0                    0x4e638b                0fbae300                BTL $0x0, BX
  :0                    0x4e638f                73d0                    JAE 0x4e6361

PiperOrigin-RevId: 329754148
2020-09-02 11:37:31 -07:00
Ian Gudger 27500d529f New sync package.
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.

This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.

Updates #1472

PiperOrigin-RevId: 289033387
2020-01-09 22:02:24 -08:00
Michael Pratt fe1369ac98 Move package sync to third_party
PiperOrigin-RevId: 231889261
Change-Id: I482f1df055bcedf4edb9fe3fe9b8e9c80085f1a0
2019-01-31 17:49:14 -08:00
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Googler d02b74a5dc Check in gVisor.
PiperOrigin-RevId: 194583126
Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463
2018-04-28 01:44:26 -04:00