468663ddbb
Former-commit-id: 1d6753294b2993e1fbf92de9366bb9544db4189b
247 lines
12 KiB
Plaintext
247 lines
12 KiB
Plaintext
Usage:
|
|
|
|
0) If possible, do this on a multiprocessor, especially if you are planning
|
|
on modifying or enhancing the package. It will work on a uniprocessor,
|
|
but the tests are much more likely to pass in the presence of serious problems.
|
|
|
|
1) Type ./configure --prefix=<install dir>; make; make check
|
|
in the directory containing unpacked source. The usual GNU build machinery
|
|
is used, except that only static, but position-independent, libraries
|
|
are normally built. On Windows, read README_win32.txt instead.
|
|
|
|
2) Applications should include atomic_ops.h. Nearly all operations
|
|
are implemented by header files included from it. It is sometimes
|
|
necessary, and always recommended to also link against libatomic_ops.a.
|
|
To use the almost non-blocking stack or malloc implementations,
|
|
see the corresponding README files, and also link against libatomic_gpl.a
|
|
before linking against libatomic_ops.a.
|
|
|
|
OVERVIEW:
|
|
Atomic_ops.h defines a large collection of operations, each one of which is
|
|
a combination of an (optional) atomic memory operation, and a memory barrier.
|
|
Also defines associated feature-test macros to determine whether a particular
|
|
operation is available on the current target hardware (either directly or
|
|
by synthesis). This is an attempt to replace various existing files with
|
|
similar goals, since they usually do not handle differences in memory
|
|
barrier styles with sufficient generality.
|
|
|
|
If this is included after defining AO_REQUIRE_CAS, then the package
|
|
will make an attempt to emulate compare-and-swap in a way that (at least
|
|
on Linux) should still be async-signal-safe. As a result, most other
|
|
atomic operations will then be defined using the compare-and-swap
|
|
emulation. This emulation is slow, since it needs to disable signals.
|
|
And it needs to block in case of contention. If you care about performance
|
|
on a platform that can't directly provide compare-and-swap, there are
|
|
probably better alternatives. But this allows easy ports to some such
|
|
platforms (e.g. PA_RISC). The option is ignored if compare-and-swap
|
|
can be implemented directly.
|
|
|
|
If atomic_ops.h is included after defining AO_USE_PTHREAD_DEFS, then all
|
|
atomic operations will be emulated with pthread locking. This is NOT
|
|
async-signal-safe. And it is slow. It is intended primarily for debugging
|
|
of the atomic_ops package itself.
|
|
|
|
Note that the implementation reflects our understanding of real processor
|
|
behavior. This occasionally diverges from the documented behavior. (E.g.
|
|
the documented X86 behavior seems to be weak enough that it is impractical
|
|
to use. Current real implementations appear to be much better behaved.)
|
|
We of course are in no position to guarantee that future processors
|
|
(even HPs) will continue to behave this way, though we hope they will.
|
|
|
|
This is a work in progress. Corrections/additions for other platforms are
|
|
greatly appreciated. It passes rudimentary tests on X86, Itanium, and
|
|
Alpha.
|
|
|
|
OPERATIONS:
|
|
|
|
Most operations operate on values of type AO_t, which are unsigned integers
|
|
whose size matches that of pointers on the given architecture. Exceptions
|
|
are:
|
|
|
|
- AO_test_and_set operates on AO_TS_t, which is whatever size the hardware
|
|
supports with good performance. In some cases this is the length of a cache
|
|
line. In some cases it is a byte. In many cases it is equivalent to AO_t.
|
|
|
|
- A few operations are implemented on smaller or larger size integers.
|
|
Such operations are indicated by the appropriate prefix:
|
|
|
|
AO_char_... Operates on unsigned char values.
|
|
AO_short_... Operates on unsigned short values.
|
|
AO_int_... Operates on unsigned int values.
|
|
|
|
(Currently a very limited selection of these is implemented. We're
|
|
working on it.)
|
|
|
|
The defined operations are all of the form AO_[<size>_]<op><barrier>(<args>).
|
|
|
|
The <op> component specifies an atomic memory operation. It may be
|
|
one of the following, where the corresponding argument and result types
|
|
are also specified:
|
|
|
|
void nop()
|
|
No atomic operation. The barrier may still be useful.
|
|
AO_t load(const volatile AO_t * addr)
|
|
Atomic load of *addr.
|
|
void store(volatile AO_t * addr, AO_t new_val)
|
|
Atomically store new_val to *addr.
|
|
AO_t fetch_and_add(volatile AO_t *addr, AO_t incr)
|
|
Atomically add incr to *addr, and return the original value of *addr.
|
|
AO_t fetch_and_add1(volatile AO_t *addr)
|
|
Equivalent to AO_fetch_and_add(addr, 1).
|
|
AO_t fetch_and_sub1(volatile AO_t *addr)
|
|
Equivalent to AO_fetch_and_add(addr, (AO_t)(-1)).
|
|
void and(volatile AO_t *addr, AO_t value)
|
|
Atomically 'and' value into *addr.
|
|
void or(volatile AO_t *addr, AO_t value)
|
|
Atomically 'or' value into *addr.
|
|
void xor(volatile AO_t *addr, AO_t value)
|
|
Atomically 'xor' value into *addr.
|
|
int compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val)
|
|
Atomically compare *addr to old_val, and replace *addr by new_val
|
|
if the first comparison succeeds. Returns nonzero if the comparison
|
|
succeeded and *addr was updated.
|
|
AO_t fetch_compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val)
|
|
Atomically compare *addr to old_val, and replace *addr by new_val
|
|
if the first comparison succeeds; returns the original value of *addr.
|
|
AO_TS_VAL_t test_and_set(volatile AO_TS_t * addr)
|
|
Atomically read the binary value at *addr, and set it. AO_TS_VAL_t
|
|
is an enumeration type which includes two values AO_TS_SET and
|
|
AO_TS_CLEAR. An AO_TS_t location is capable of holding an
|
|
AO_TS_VAL_t, but may be much larger, as dictated by hardware
|
|
constraints. Test_and_set logically sets the value to AO_TS_SET.
|
|
It may be reset to AO_TS_CLEAR with the AO_CLEAR(AO_TS_t *) macro.
|
|
AO_TS_t locations should be initialized to AO_TS_INITIALIZER.
|
|
The values of AO_TS_SET and AO_TS_CLEAR are hardware dependent.
|
|
(On PA-RISC, AO_TS_SET is zero!)
|
|
|
|
Test_and_set is a more limited version of compare_and_swap. Its only
|
|
advantage is that it is more easily implementable on some hardware. It
|
|
should thus be used if only binary test-and-set functionality is needed.
|
|
|
|
If available, we also provide compare_and_swap operations that operate
|
|
on wider values. Since standard data types for double width values
|
|
may not be available, these explicitly take pairs of arguments for the
|
|
new and/or old value. Unfortunately, there are two common variants,
|
|
neither of which can easily and efficiently emulate the other.
|
|
The first performs a comparison against the entire value being replaced,
|
|
where the second replaces a double-width replacement, but performs
|
|
a single-width comparison:
|
|
|
|
int compare_double_and_swap_double(volatile AO_double_t * addr,
|
|
AO_t old_val1, AO_t old_val2,
|
|
AO_t new_val1, AO_t new_val2);
|
|
|
|
int compare_and_swap_double(volatile AO_double_t * addr,
|
|
AO_t old_val1,
|
|
AO_t new_val1, AO_t new_val2);
|
|
|
|
where AO_double_t is a structure containing AO_val1 and AO_val2 fields,
|
|
both of type AO_t. For compare_and_swap_double, we compare against
|
|
the val1 field. AO_double_t exists only if AO_HAVE_double_t
|
|
is defined.
|
|
|
|
ORDERING CONSTRAINTS:
|
|
|
|
Each operation name also includes a suffix that specifies the associated
|
|
ordering semantics. The ordering constraint limits reordering of this
|
|
operation with respect to other atomic operations and ordinary memory
|
|
references. The current implementation assumes that all memory references
|
|
are to ordinary cacheable memory; the ordering guarantee is with respect
|
|
to other threads or processes, not I/O devices. (Whether or not this
|
|
distinction is important is platform-dependent.)
|
|
|
|
Ordering suffixes are one of the following:
|
|
|
|
<none>: No memory barrier. A plain AO_nop() really does nothing.
|
|
_release: Earlier operations must become visible to other threads
|
|
before the atomic operation.
|
|
_acquire: Later operations must become visible after this operation.
|
|
_read: Subsequent reads must become visible after reads included in
|
|
the atomic operation or preceding it. Rarely useful for clients?
|
|
_write: Earlier writes become visible before writes during or after
|
|
the atomic operation. Rarely useful for clients?
|
|
_full: Ordered with respect to both earlier and later memory ops.
|
|
AO_store_full or AO_nop_full are the normal ways to force a store
|
|
to be ordered with respect to a later load.
|
|
_release_write: Ordered with respect to earlier writes. This is
|
|
normally implemented as either a _write or _release
|
|
barrier.
|
|
_acquire_read: Ordered with respect to later reads. This is
|
|
normally implemented as either a _read or _acquire barrier.
|
|
_dd_acquire_read: Ordered with respect to later reads that are data
|
|
dependent on this one. This is needed on
|
|
a pointer read, which is later dereferenced to read a
|
|
second value, with the expectation that the second
|
|
read is ordered after the first one. On most architectures,
|
|
this is equivalent to no barrier. (This is very
|
|
hard to define precisely. It should probably be avoided.
|
|
A major problem is that optimizers tend to try to
|
|
eliminate dependencies from the generated code, since
|
|
dependencies force the hardware to execute the code
|
|
serially.)
|
|
|
|
We assume that if a store is data-dependent on a previous load, then
|
|
the two are always implicitly ordered.
|
|
|
|
It is possible to test whether AO_<op><barrier> is available on the
|
|
current platform by checking whether AO_HAVE_<op>_<barrier> is defined
|
|
as a macro.
|
|
|
|
Note that we generally don't implement operations that are either
|
|
meaningless (e.g. AO_nop_acquire, AO_nop_release) or which appear to
|
|
have no clear use (e.g. AO_load_release, AO_store_acquire, AO_load_write,
|
|
AO_store_read). On some platforms (e.g. PA-RISC) many operations
|
|
will remain undefined unless AO_REQUIRE_CAS is defined before including
|
|
the package.
|
|
|
|
When typed in the package build directory, the following command
|
|
will print operations that are unimplemented on the platform:
|
|
|
|
make test_atomic; ./test_atomic
|
|
|
|
The following command generates a file "list_atomic.i" containing the
|
|
macro expansions of all implemented operations on the platform:
|
|
|
|
make list_atomic.i
|
|
|
|
Future directions:
|
|
|
|
It currently appears that something roughly analogous to this is very likely
|
|
to become part of the C++0x standard. That effort has pointed out a number
|
|
of issues that we expect to address there. Since some of the solutions
|
|
really require compiler support, they may not be completely addressed here.
|
|
|
|
Known issues include:
|
|
|
|
We should be more precise in defining the semantics of the ordering
|
|
constraints, and if and how we can guarantee sequential consistency.
|
|
|
|
Dd_acquire_read is very hard or impossible to define in a way that cannot
|
|
be invalidated by reasonably standard compiler transformations.
|
|
|
|
There is probably no good reason to provide operations on standard
|
|
integer types, since those may have the wrong alignment constraints.
|
|
|
|
|
|
Example:
|
|
|
|
If you want to initialize an object, and then "publish" a pointer to it
|
|
in a global location p, such that other threads reading the new value of
|
|
p are guaranteed to see an initialized object, it suffices to use
|
|
AO_release_write(p, ...) to write the pointer to the object, and to
|
|
retrieve it in other threads with AO_acquire_read(p).
|
|
|
|
Platform notes:
|
|
|
|
All X86: We quietly assume 486 or better.
|
|
|
|
Microsoft compilers:
|
|
Define AO_ASSUME_WINDOWS98 to get access to hardware compare-and-swap
|
|
functionality. This relies on the InterlockedCompareExchange() function
|
|
which was apparently not supported in Windows95. (There may be a better
|
|
way to get access to this.)
|
|
|
|
Gcc on x86:
|
|
Define AO_USE_PENTIUM4_INSTRS to use the Pentium 4 mfence instruction.
|
|
Currently this is appears to be of marginal benefit.
|