Imported Upstream version 6.10.0.49

Former-commit-id: 1d6753294b2993e1fbf92de9366bb9544db4189b
This commit is contained in:
Xamarin Public Jenkins (auto-signing)
2020-01-16 16:38:04 +00:00
parent d94e79959b
commit 468663ddbb
48518 changed files with 2789335 additions and 61176 deletions


@@ -0,0 +1,63 @@
Our intent is to make it easy to use libatomic_ops, in
both free and proprietary software. Hence most code that we expect to be
linked into a client application is covered by an MIT-style license.
A few library routines are covered by the GNU General Public License.
These are put into a separate library, libatomic_ops_gpl.a.
The low-level part of the library is mostly covered by the following
license:
----------------------------------------
Copyright (c) ...
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------
A few files in the sysdeps directory were inherited in part from the
Boehm-Demers-Weiser conservative garbage collector, and are covered by
its license, which is similar in spirit:
--------------------------------
Copyright (c) ...
THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
Permission is hereby granted to use or copy this program
for any purpose, provided the above notices are retained on all copies.
Permission to modify the code and to distribute modified code is granted,
provided the above notices are retained, and a notice that the code was
modified is included with the above copyright notice.
----------------------------------
A few files are covered by the GNU General Public License. (See file
"COPYING".) This applies only to test code, sample applications,
and the libatomic_ops_gpl portion of the library.
Thus libatomic_ops_gpl should generally not be linked into proprietary code.
(This distinction was motivated by patent considerations.)
It is possible that the license of the GPL pieces may be changed for
future versions to make them more consistent with the rest of the package.
If you submit patches, and have strong preferences about licensing, please
express them.


@@ -0,0 +1,4 @@
# installed documentation
#
dist_pkgdata_DATA=LICENSING.txt README.txt README_stack.txt \
README_malloc.txt README_win32.txt


@@ -0,0 +1,246 @@
Usage:
0) If possible, do this on a multiprocessor, especially if you are planning
on modifying or enhancing the package. It will work on a uniprocessor,
but the tests are much more likely to pass even in the presence of serious problems.
1) Type ./configure --prefix=<install dir>; make; make check
in the directory containing unpacked source. The usual GNU build machinery
is used, except that only static, but position-independent, libraries
are normally built. On Windows, read README_win32.txt instead.
2) Applications should include atomic_ops.h. Nearly all operations
are implemented by header files included from it. It is sometimes
necessary, and always recommended, to also link against libatomic_ops.a.
To use the almost non-blocking stack or malloc implementations,
see the corresponding README files, and also link against libatomic_ops_gpl.a
before linking against libatomic_ops.a.

OVERVIEW:
Atomic_ops.h defines a large collection of operations, each one of which is
a combination of an (optional) atomic memory operation, and a memory barrier.
It also defines associated feature-test macros to determine whether a particular
operation is available on the current target hardware (either directly or
by synthesis). This is an attempt to replace various existing files with
similar goals, since they usually do not handle differences in memory
barrier styles with sufficient generality.

If this is included after defining AO_REQUIRE_CAS, then the package
will make an attempt to emulate compare-and-swap in a way that (at least
on Linux) should still be async-signal-safe. As a result, most other
atomic operations will then be defined using the compare-and-swap
emulation. This emulation is slow, since it needs to disable signals.
And it needs to block in case of contention. If you care about performance
on a platform that can't directly provide compare-and-swap, there are
probably better alternatives. But this allows easy ports to some such
platforms (e.g. PA-RISC). The option is ignored if compare-and-swap
can be implemented directly.
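As a sketch (assuming the usual header name), requesting the emulation is just a matter of defining the macro before the header is first included:

```c
/* Request compare-and-swap emulation on targets without a native CAS.
   This is ignored where compare-and-swap can be implemented directly. */
#define AO_REQUIRE_CAS
#include "atomic_ops.h"
```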

If atomic_ops.h is included after defining AO_USE_PTHREAD_DEFS, then all
atomic operations will be emulated with pthread locking. This is NOT
async-signal-safe. And it is slow. It is intended primarily for debugging
of the atomic_ops package itself.

Note that the implementation reflects our understanding of real processor
behavior. This occasionally diverges from the documented behavior. (E.g.
the documented X86 behavior seems to be weak enough that it is impractical
to use. Current real implementations appear to be much better behaved.)
We of course are in no position to guarantee that future processors
(even HPs) will continue to behave this way, though we hope they will.
This is a work in progress. Corrections/additions for other platforms are
greatly appreciated. It passes rudimentary tests on X86, Itanium, and
Alpha.

OPERATIONS:
Most operations operate on values of type AO_t, which are unsigned integers
whose size matches that of pointers on the given architecture. Exceptions
are:
- AO_test_and_set operates on AO_TS_t, which is whatever size the hardware
supports with good performance. In some cases this is the length of a cache
line. In some cases it is a byte. In many cases it is equivalent to AO_t.
- A few operations are implemented on smaller or larger size integers.
Such operations are indicated by the appropriate prefix:
AO_char_... Operates on unsigned char values.
AO_short_... Operates on unsigned short values.
AO_int_... Operates on unsigned int values.
(Currently a very limited selection of these is implemented. We're
working on it.)

The defined operations are all of the form AO_[<size>_]<op><barrier>(<args>).
The <op> component specifies an atomic memory operation. It may be
one of the following, where the corresponding argument and result types
are also specified:
void nop()
No atomic operation. The barrier may still be useful.
AO_t load(const volatile AO_t * addr)
Atomic load of *addr.
void store(volatile AO_t * addr, AO_t new_val)
Atomically store new_val to *addr.
AO_t fetch_and_add(volatile AO_t *addr, AO_t incr)
Atomically add incr to *addr, and return the original value of *addr.
AO_t fetch_and_add1(volatile AO_t *addr)
Equivalent to AO_fetch_and_add(addr, 1).
AO_t fetch_and_sub1(volatile AO_t *addr)
Equivalent to AO_fetch_and_add(addr, (AO_t)(-1)).
void and(volatile AO_t *addr, AO_t value)
Atomically 'and' value into *addr.
void or(volatile AO_t *addr, AO_t value)
Atomically 'or' value into *addr.
void xor(volatile AO_t *addr, AO_t value)
Atomically 'xor' value into *addr.
int compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val)
Atomically compare *addr to old_val, and replace *addr by new_val
if the first comparison succeeds. Returns nonzero if the comparison
succeeded and *addr was updated.
AO_t fetch_compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val)
Atomically compare *addr to old_val, and replace *addr by new_val
if the first comparison succeeds; returns the original value of *addr.
AO_TS_VAL_t test_and_set(volatile AO_TS_t * addr)
Atomically read the binary value at *addr, and set it. AO_TS_VAL_t
is an enumeration type which includes two values AO_TS_SET and
AO_TS_CLEAR. An AO_TS_t location is capable of holding an
AO_TS_VAL_t, but may be much larger, as dictated by hardware
constraints. Test_and_set logically sets the value to AO_TS_SET.
It may be reset to AO_TS_CLEAR with the AO_CLEAR(AO_TS_t *) macro.
AO_TS_t locations should be initialized to AO_TS_INITIALIZER.
The values of AO_TS_SET and AO_TS_CLEAR are hardware dependent.
(On PA-RISC, AO_TS_SET is zero!)
Test_and_set is a more limited version of compare_and_swap. Its only
advantage is that it is more easily implementable on some hardware. It
should thus be used if only binary test-and-set functionality is needed.

If available, we also provide compare_and_swap operations that operate
on wider values. Since standard data types for double width values
may not be available, these explicitly take pairs of arguments for the
new and/or old value. Unfortunately, there are two common variants,
neither of which can easily and efficiently emulate the other.
The first performs a comparison against the entire double-width value
being replaced, while the second also performs a double-width replacement,
but only a single-width comparison:
int compare_double_and_swap_double(volatile AO_double_t * addr,
AO_t old_val1, AO_t old_val2,
AO_t new_val1, AO_t new_val2);
int compare_and_swap_double(volatile AO_double_t * addr,
AO_t old_val1,
AO_t new_val1, AO_t new_val2);
where AO_double_t is a structure containing AO_val1 and AO_val2 fields,
both of type AO_t. For compare_and_swap_double, we compare against
the val1 field. AO_double_t exists only if AO_HAVE_double_t
is defined.

ORDERING CONSTRAINTS:
Each operation name also includes a suffix that specifies the associated
ordering semantics. The ordering constraint limits reordering of this
operation with respect to other atomic operations and ordinary memory
references. The current implementation assumes that all memory references
are to ordinary cacheable memory; the ordering guarantee is with respect
to other threads or processes, not I/O devices. (Whether or not this
distinction is important is platform-dependent.)

Ordering suffixes are one of the following:
<none>: No memory barrier. A plain AO_nop() really does nothing.
_release: Earlier operations must become visible to other threads
before the atomic operation.
_acquire: Later operations must become visible after this operation.
_read: Subsequent reads must become visible after reads included in
the atomic operation or preceding it. Rarely useful for clients?
_write: Earlier writes become visible before writes during or after
the atomic operation. Rarely useful for clients?
_full: Ordered with respect to both earlier and later memory ops.
AO_store_full or AO_nop_full are the normal ways to force a store
to be ordered with respect to a later load.
_release_write: Ordered with respect to earlier writes. This is
normally implemented as either a _write or _release
barrier.
_acquire_read: Ordered with respect to later reads. This is
normally implemented as either a _read or _acquire barrier.
_dd_acquire_read: Ordered with respect to later reads that are data
dependent on this one. This is needed on
a pointer read, which is later dereferenced to read a
second value, with the expectation that the second
read is ordered after the first one. On most architectures,
this is equivalent to no barrier. (This is very
hard to define precisely. It should probably be avoided.
A major problem is that optimizers tend to try to
eliminate dependencies from the generated code, since
dependencies force the hardware to execute the code
serially.)
We assume that if a store is data-dependent on a previous load, then
the two are always implicitly ordered.

It is possible to test whether AO_<op><barrier> is available on the
current platform by checking whether AO_HAVE_<op><barrier> is defined
as a macro.
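For instance (a sketch; the exact set of macros defined depends on the target), a client can fall back to a compare-and-swap retry loop when a direct fetch-and-add is not implemented:

```c
#include "atomic_ops.h"

AO_t increment(volatile AO_t *p) {
#if defined(AO_HAVE_fetch_and_add1_full)
    return AO_fetch_and_add1_full(p);
#elif defined(AO_HAVE_fetch_compare_and_swap_full)
    AO_t old;
    do {
        old = *p;  /* plain re-read is adequate for a retry loop sketch */
    } while (AO_fetch_compare_and_swap_full(p, old, old + 1) != old);
    return old;
#else
#   error "No suitable atomic increment on this target"
#endif
}
```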

Note that we generally don't implement operations that are either
meaningless (e.g. AO_nop_acquire, AO_nop_release) or which appear to
have no clear use (e.g. AO_load_release, AO_store_acquire, AO_load_write,
AO_store_read). On some platforms (e.g. PA-RISC) many operations
will remain undefined unless AO_REQUIRE_CAS is defined before including
the package.

When typed in the package build directory, the following command
will print operations that are unimplemented on the platform:
make test_atomic; ./test_atomic
The following command generates a file "list_atomic.i" containing the
macro expansions of all implemented operations on the platform:
make list_atomic.i

Future directions:
It currently appears that something roughly analogous to this is very likely
to become part of the C++0x standard. That effort has pointed out a number
of issues that we expect to address there. Since some of the solutions
really require compiler support, they may not be completely addressed here.
Known issues include:
We should be more precise in defining the semantics of the ordering
constraints, and if and how we can guarantee sequential consistency.
Dd_acquire_read is very hard or impossible to define in a way that cannot
be invalidated by reasonably standard compiler transformations.
There is probably no good reason to provide operations on standard
integer types, since those may have the wrong alignment constraints.

Example:
If you want to initialize an object, and then "publish" a pointer to it
in a global location p, such that other threads reading the new value of
p are guaranteed to see an initialized object, it suffices to use
AO_store_release_write(&p, ...) to write the pointer to the object, and to
retrieve it in other threads with AO_load_acquire_read(&p).
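A minimal sketch of this publication idiom, using the generalized AO_store_release_write/AO_load_acquire_read forms (the struct and function names here are hypothetical):

```c
#include "atomic_ops.h"

struct config { int limit; /* ... */ };
static struct config the_config;
static volatile AO_t p;  /* holds a struct config *, initially 0 */

void publisher(void) {
    the_config.limit = 42;                          /* initialize the object */
    AO_store_release_write(&p, (AO_t)&the_config);  /* then publish the pointer */
}

struct config *reader(void) {
    /* A non-null result is guaranteed to point to an initialized object. */
    return (struct config *)AO_load_acquire_read(&p);
}
```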

Platform notes:
All X86: We quietly assume 486 or better.
Microsoft compilers:
Define AO_ASSUME_WINDOWS98 to get access to hardware compare-and-swap
functionality. This relies on the InterlockedCompareExchange() function
which was apparently not supported in Windows 95. (There may be a better
way to get access to this.)
Gcc on x86:
Define AO_USE_PENTIUM4_INSTRS to use the Pentium 4 mfence instruction.
Currently this appears to be of marginal benefit.


@@ -0,0 +1,57 @@
The libatomic_ops_gpl includes a simple almost-lock-free malloc implementation.
This is intended as a safe way to allocate memory from a signal handler,
or to allocate memory in the context of a library that does not know what
thread library it will be used with. In either case locking is impossible.
Note that the operations are only guaranteed to be 1-lock-free, i.e. a
single blocked thread will not prevent progress, but multiple blocked
threads may. To safely use these operations in a signal handler,
the handler should be non-reentrant, i.e. it should not be interruptible
by another handler using these operations. Furthermore, use outside
of signal handlers in a multithreaded application should be protected
by a lock, so that at most one invocation may be interrupted by a signal.
The header will define the macro "AO_MALLOC_IS_LOCK_FREE" on platforms
on which malloc is completely lock-free, and hence these restrictions
do not apply.

In the presence of threads, but absence of contention, the time performance
of this package should be as good as, or slightly better than, that of most
system malloc implementations. Its space performance
is theoretically optimal (to within a constant factor), but probably
quite poor in practice. In particular, no attempt is made to
coalesce free small memory blocks. Something like Doug Lea's malloc is
likely to use significantly less memory for complex applications.
Performance on platforms without an efficient compare-and-swap implementation
will be poor.

This package was not designed for processor-scalability in the face of
high allocation rates. If all threads happen to allocate different-sized
objects, you might get lucky. Otherwise expect contention and false-sharing
problems. If this is an issue, something like Maged Michael's algorithm
(PLDI 2004) would be technically a far better choice. If you are concerned
only with scalability, and not signal-safety, you might also consider
using Hoard instead. We have seen a factor of 3 to 4 slowdown from the
standard glibc malloc implementation with contention, even when the
performance without contention was faster. (To make the implementation
more scalable, one would need to replicate at least the free list headers,
so that concurrent access is possible without cache conflicts.)

Unfortunately there is no portable async-signal-safe way to obtain large
chunks of memory from the OS. Based on a reading of the source code,
mmap-based allocation appears safe under Linux, and probably BSD variants.
It is probably unsafe for operating systems built on Mach, such as
Apple's Darwin. Without use of mmap, the allocator is
limited to a fixed size, statically preallocated heap (2MB by default),
and will fail to allocate objects above a certain size (just under 64K
by default). Use of mmap to circumvent these limitations requires an
explicit call.

The entire interface to the AO_malloc package currently consists of:
#include <atomic_ops_malloc.h> /* includes atomic_ops.h */
void *AO_malloc(size_t sz);
void AO_free(void *p);
void AO_malloc_enable_mmap(void);
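A minimal usage sketch (the handler and signal choice are hypothetical; per the restrictions above, the handler must not be interrupted by another handler using these operations):

```c
#include <signal.h>
#include <atomic_ops_malloc.h> /* includes atomic_ops.h */

/* Non-reentrant handler that allocates without taking any lock. */
static void handler(int sig) {
    void *buf = AO_malloc(128);  /* safe here: no lock is acquired */
    if (buf != 0) {
        /* ... record something ... */
        AO_free(buf);
    }
}

int main(void) {
    /* Explicitly allow mmap, lifting the static-heap and object-size
       limits, on platforms where mmap is async-signal-safe. */
    AO_malloc_enable_mmap();
    signal(SIGUSR1, handler);
    /* ... */
    return 0;
}
```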


@@ -0,0 +1,77 @@
Note that the AO_stack implementation is licensed under the GPL,
unlike the lower level routines.

The header file atomic_ops_stack.h defines a linked stack abstraction.
Stacks may be accessed by multiple concurrent threads. The implementation
is 1-lock-free, i.e. it will continue to make progress if at most one
thread becomes inactive while operating on the data structure.
(The implementation can be built to be N-lock-free for any given N. But that
seems to rarely be useful, especially since larger N involve some slowdown.)
This makes it safe to access these data structures from non-reentrant
signal handlers, provided at most one non-signal-handler thread is
accessing the data structure at once. This latter condition can be
ensured by acquiring an ordinary lock around the non-handler accesses
to the data structure.

For details see:
Hans-J. Boehm, "An Almost Non-Blocking Stack", PODC 2004,
http://portal.acm.org/citation.cfm?doid=1011767.1011774
(This is not exactly the implementation described there, since the
interface was cleaned up in the interim. But it should perform
very similarly.)

We use a fully lock-free implementation when the underlying hardware
makes that less expensive, i.e. when we have a double-wide compare-and-swap
operation available. (The fully lock-free implementation uses an AO_t-
sized version count, and assumes it does not wrap during the time any
given operation is active. This seems reasonably safe on 32-bit hardware,
and very safe on 64-bit hardware.) If a fully lock-free implementation
is used, the macro AO_STACK_IS_LOCK_FREE will be defined.

The implementation is interesting only because it allows reuse of
existing nodes. This is necessary, for example, to implement a memory
allocator.

Since we want to leave the precise stack node type up to the client,
we insist only that each stack node contains a link field of type AO_t.
When a new node is pushed on the stack, the push operation expects to be
passed the pointer to this link field, which will then be overwritten by
the stack implementation. Similarly, the pop operation returns a pointer to the
link field of the object that previously was on the top of the stack.
The cleanest way to use these routines is probably to define the stack node
type with an initial AO_t link field, so that the conversion between the
link-field pointer and the stack element pointer is just a compile-time
cast. But other possibilities exist. (This would be cleaner in C++ with
templates.)

A stack is represented by an AO_stack_t structure. (This is normally
2 or 3 times the size of a pointer.) It may be statically initialized
by setting it to AO_STACK_INITIALIZER, or dynamically initialized to
an empty stack with AO_stack_init. There are only three operations for
accessing stacks:
void AO_stack_init(AO_stack_t *list);
void AO_stack_push_release(AO_stack_t *list, AO_t *new_element);
AO_t * AO_stack_pop_acquire(volatile AO_stack_t *list);
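A minimal sketch of the recommended node layout and the cast between link field and element (the node type and wrapper names are hypothetical; link against libatomic_ops_gpl.a):

```c
#include "atomic_ops_stack.h"

/* Node type with the AO_t link field first, so the link-field pointer
   and the node pointer convert with a simple cast. */
struct node {
    AO_t link;   /* must come first for the cast below to be valid */
    int value;
};

static AO_stack_t the_stack = AO_STACK_INITIALIZER;

void push_node(struct node *n) {
    /* The push operation takes a pointer to the link field. */
    AO_stack_push_release(&the_stack, &n->link);
}

struct node *pop_node(void) {
    /* Pop returns a pointer to the link field of the former top element,
       or 0 if the stack was empty. */
    AO_t *link = AO_stack_pop_acquire(&the_stack);
    return link ? (struct node *)link : 0;
}
```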

We require that the objects pushed as list elements remain addressable
as long as any push or pop operation is in progress. (It is OK for an object
to be "pop"ped off a stack and "deallocated" with a concurrent "pop" on
the same stack still in progress, but only if "deallocation" leaves the
object addressable. The second "pop" may still read the object, but
the value it reads will not matter.)
We require that the headers (AO_stack objects) remain allocated and
valid as long as any operations on them are still in-flight.

We also provide the macros AO_REAL_HEAD_PTR, which converts an AO_stack_t
to a pointer to the link field in the next element, and AO_REAL_NEXT_PTR,
which converts a link field to a real, dereferenceable pointer to the link field
in the next element. These are intended only for debugging, or to traverse
the list after modification has ceased. There is otherwise no guarantee that
walking a stack using this macro will produce any kind of consistent
picture of the data structure.


@@ -0,0 +1,31 @@
Most of the atomic_ops functionality is available under Win32 with
the Microsoft tools, but the build process currently is considerably more
primitive than on Linux/Unix platforms.

To build:
1) Go to the src directory in the distribution.
2) Make sure the Microsoft command-line tools (e.g. nmake) are available.
3) Run "nmake -f Makefile.msft". This should run some tests, which
may print warnings about the types of the "Interlocked" functions.
I haven't been able to make all versions of VC++ happy. If you know
how to, please send a patch.
4) To compile applications, you will need to retain or copy the following
pieces from the resulting src directory contents:
"atomic_ops.h" - Header file defining low-level primitives. This
includes files from:
"atomic_ops"- Subdirectory containing implementation header files.
"atomic_ops_stack.h" - Header file describing almost lock-free stack.
"atomic_ops_malloc.h" - Header file describing almost lock-free malloc.
"libatomic_ops_gpl.lib" - Library containing implementation of the
above two (plus AO_pause() defined in atomic_ops.c).
On Win32, the atomic_ops.h implementation resides entirely in the
header files.

Most clients of atomic_ops.h will need to define AO_ASSUME_WINDOWS98 before
including it. Compare_and_swap is otherwise not available.
Defining AO_ASSUME_VISTA will make compare_double_and_swap_double available
as well.
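A typical Win32 client setup is therefore just (a sketch; both defines must precede the include):

```c
#define AO_ASSUME_WINDOWS98  /* enables compare_and_swap */
#define AO_ASSUME_VISTA      /* also enables compare_double_and_swap_double */
#include "atomic_ops.h"
```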
Note that the library is covered by the GNU General Public License, while
the first two of these pieces allow use in proprietary code.