Jo Shields a575963da9 Imported Upstream version 3.6.0
Former-commit-id: da6be194a6b1221998fc28233f2503bd61dd9d14
2014-08-13 10:39:27 +01:00

While the Boehm GC is quite good, we need to move to a
precise, generational GC for better performance and smaller
memory usage (no false-positive memory retention with big
allocations).
The first working implementation is committed in metadata/sgen-gc.c
as of May 2006. This is a two-generation moving collector and it is
currently used to shake out all the issues in the runtime that need to
be fixed in order to support precise generational and moving collectors.
The two main issues are:
1) identify as precisely as possible all the pointers to managed objects
2) insert write barriers to be able to account for pointers in the old
generation pointing to objects in the newer generations
Point 1 is mostly complete. The runtime can register additional roots
with the GC as needed and it provides the GC with precise info on
object layout. In particular, with the new precise GC it is not possible to
store GC-allocated memory in IntPtr or UIntPtr fields (in fact, the new GC
can allocate only objects and not GC-tracked untyped blobs of memory
as the Boehm GC can). Precise info is also tracked for static fields.
What is currently missing here is:
*) precise info for ThreadStatic and ContextStatic storage (this also requires
better memory management for these sub-heaps)
*) precise info for HANDLE_NORMAL gc handles
*) precise info for thread stacks (this requires storing the info about
managed stack frames along with the jitted code for a method and doing the
stack walk for the active threads, considering conservatively the unmanaged
stack frames and precisely the managed ones. mono_jit_info_table_find () must
be made lock-free for this to work). Precise type info must be maintained
for all the local variables. Precise type info should be maintained also
for registers.
Note that this is not a correctness issue, but a performance one. The more
pointers to objects we can deal with precisely, the more effective the GC
will be, since it will be able to move the objects. The first two todo items
are mostly trivial, while handling the thread stacks precisely is complex to
implement and to test, and it carries a CPU and memory cost at runtime.
In practice we need to be able to describe to the GC _all_ the memory
locations that can hold a pointer to a managed object, and we must also tell it
whether that location can contain:
*) a pointer to the start of an object or NULL (typically a field of an object)
*) a pinning pointer to an object (typically the result of the fixed statement in C#)
*) a pointer to the managed heap or to other locations (a typical stack location)
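The three kinds of location listed above can be sketched as a small
classification routine. This is only an illustration: every name in it
(SlotKind, ScanAction, HeapRange, scan_action) is hypothetical and none is
taken from the Mono sources. It also assumes that an ambiguous location
that happens to point into the heap is treated like a pinning pointer,
which is the usual approach in mostly-precise collectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; none are taken from the Mono sources. */

typedef enum {
    SLOT_EXACT,      /* start of an object or NULL, e.g. an object field */
    SLOT_PINNING,    /* pinning pointer, e.g. from a C# fixed statement */
    SLOT_AMBIGUOUS   /* may point into the heap or elsewhere, e.g. a stack word */
} SlotKind;

typedef enum {
    ACTION_IGNORE,         /* nothing to trace */
    ACTION_TRACE_AND_FIX,  /* object may move; the slot is updated afterwards */
    ACTION_PIN             /* object is kept alive but must stay in place */
} ScanAction;

typedef struct {
    uintptr_t start;  /* first byte of the toy managed heap */
    uintptr_t end;    /* one past the last byte */
} HeapRange;

/* Returns nonzero when p lies inside the toy heap range. */
static int in_heap(const HeapRange *heap, const void *p)
{
    uintptr_t a = (uintptr_t)p;
    assert(heap != NULL);
    return a >= heap->start && a < heap->end;
}

/* Decide what a collector could do with the value found in a location. */
static ScanAction scan_action(SlotKind kind, const void *value,
                              const HeapRange *heap)
{
    if (value == NULL)
        return ACTION_IGNORE;
    switch (kind) {
    case SLOT_EXACT:
        return ACTION_TRACE_AND_FIX;
    case SLOT_PINNING:
        return ACTION_PIN;
    case SLOT_AMBIGUOUS:
        /* Assumption: an ambiguous hit pins the object, since the
           location cannot safely be rewritten. */
        return in_heap(heap, value) ? ACTION_PIN : ACTION_IGNORE;
    }
    return ACTION_IGNORE;
}
```

Only locations of the first kind let the collector both move the object
and rewrite the location, which is why precise information pays off.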
Since we need to describe all these locations to the GC, it is no longer possible to
store a reference to an object in unmanaged memory unless the object is explicitly
pinned for the entire time the reference is stored there. With the Boehm GC this was
possible if the object was kept alive in some other way, but with the new GC it is no
longer valid, because objects can move: the object will be kept alive because of the
other reference, but the pointer in unmanaged memory won't be updated to the new
location where the object has been moved.
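A toy model may make the failure mode concrete. It is not Mono code:
ToyObject, the two one-element "spaces" and toy_move are all hypothetical,
and the move is a plain copy rather than a real collection. It only shows
that a location the collector knows about is updated, while a copy of the
pointer kept elsewhere keeps referring to the old, now invalid, address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; nothing here is taken from the Mono sources. */
typedef struct { int payload; } ToyObject;

static ToyObject from_space[1];  /* where the object lives before the move */
static ToyObject to_space[1];    /* where it lives afterwards */

/* Move the single toy object and update only the slots that were
   described to the "collector". */
static void toy_move(ToyObject **known_slots[], size_t n)
{
    ToyObject *old_addr = &from_space[0];
    ToyObject *new_addr = &to_space[0];
    size_t i;

    *new_addr = *old_addr;
    old_addr->payload = -1;  /* poison the old copy to make stale reads visible */
    for (i = 0; i < n; i++)
        if (*known_slots[i] == old_addr)
            *known_slots[i] = new_addr;
}

/* Returns 1 when the known slot follows the object and the unknown copy
   is left pointing at the poisoned old location. */
static int toy_demo(void)
{
    ToyObject *managed_ref;     /* location described to the collector */
    ToyObject *unmanaged_copy;  /* copy the collector knows nothing about */
    ToyObject **known[1];

    from_space[0].payload = 42;
    managed_ref = &from_space[0];
    unmanaged_copy = &from_space[0];
    known[0] = &managed_ref;

    toy_move(known, 1);
    return managed_ref->payload == 42 && unmanaged_copy->payload == -1;
}
```

Pinning the object for as long as the unmanaged copy exists avoids the
problem, because a pinned object is never relocated.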
Most of the work for inserting write barrier calls is already done as well,
but there may still be bugs in this area. In particular, for it to work,
the correct IL opcodes must be used when storing an object in a field or
array element (most of the marshal.c code needs to be reviewed to use
stind.ref instead of stind.i/stind.u when needed). When this is done, the
JIT will take care of automatically inserting the write barriers.
What the JIT does automatically for managed code must be done manually
in the runtime C code that stores fields in objects and arrays, or that
otherwise performs any operation that could change a pointer in the old
generation to point to an object in the new generation. Sample cases are as follows:
*) when using C structs that map to managed objects the following macro
must be used to store an object in a field (the macro must not be used
when storing non-objects and it should not be used when storing NULL values):
MONO_OBJECT_SETREF(obj,fieldname,value)
where obj is the pointer to the object, fieldname is the name of the field in
the C struct and value is a MonoObject*. Note that obj must be a correctly
typed pointer to a struct that embeds MonoObject as the first field and
has fieldname as a field.
*) when setting the element of an array of references to an object, use the
following macro:
mono_array_setref (array,index,value)
*) when copying a number of references from an array to another:
mono_array_memcpy_refs (dest,destidx,src,srcidx,count)
*) when copying a struct that may contain reference fields, use:
void mono_value_copy (gpointer dest, gpointer src, MonoClass *klass)
*) when it is unknown if a pointer points to the stack or to the heap and an
object needs to be stored through it, use:
void mono_gc_wbarrier_generic_store (gpointer ptr, MonoObject* value)
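All of the cases above route a reference store through a write barrier.
As a rough illustration of what such a barrier has to record, here is a
minimal sketch that uses a remembered set. It is not the sgen-gc.c
implementation: ToyObj, remset and toy_store_ref are hypothetical names,
and a real collector may use a different mechanism to track old-to-new
pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object: a generation flag and one reference field. */
typedef struct ToyObj {
    int in_old_gen;        /* nonzero once the object has been promoted */
    struct ToyObj *field;  /* a single reference field */
} ToyObj;

#define REMSET_MAX 16

/* Remembered set: addresses of old-generation slots that point to
   young objects. A minor collection would treat these as extra roots. */
static ToyObj **remset[REMSET_MAX];
static size_t remset_count;

/* Store value into *slot, which belongs to owner, and record the slot
   if the store creates an old-to-young pointer. */
static void toy_store_ref(ToyObj *owner, ToyObj **slot, ToyObj *value)
{
    *slot = value;
    /* NULL stores and stores into young objects need no bookkeeping. */
    if (owner->in_old_gen && value != NULL && !value->in_old_gen) {
        assert(remset_count < REMSET_MAX);
        remset[remset_count++] = slot;
    }
}
```

A plain assignment in place of toy_store_ref would leave the young object
reachable only from the old generation, so a collection of the young
generation alone could miss it.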
Note that the support for write barriers in the runtime could also be
used to enable the generational features of the Boehm GC.
Some more documentation on the new GC is available at:
http://www.mono-project.com/Compacting_GC