mirror of
https://github.com/encounter/bdwgc.git
synced 2026-03-30 10:57:55 -07:00
d6982ea99a
* doc/finalization.html: New file (copy file from BDWGC site). * Makefile.direct (DOC_FILES): Add "doc/finalization.html" entry. * doc.am (dist_pkgdata_DATA): Likewise. * doc/gcdescr.html: Convert absolute references to finalization.html file to local ones. * doc/gcinterface.html: Likewise. * doc/overview.html: Likewise.
191 lines
9.3 KiB
HTML
191 lines
9.3 KiB
HTML
<!DOCTYPE HTML>
|
|
<HEAD>
|
|
<TITLE>Finalization in the Boehm-Demers-Weiser collector </TITLE>
|
|
</HEAD>
|
|
<BODY>
|
|
<H1>Finalization</h1>
|
|
Many garbage collectors provide a facility for executing user code
|
|
just before an object is collected. This can be used to reclaim any
|
|
system resources or non-garbage-collected memory associated with the
|
|
object.
|
|
Experience has shown that this can be a useful facility.
|
|
It is indispensable in cases in which system resources are embedded
|
|
in complex data structures (<I>e.g.</i> file descriptors
|
|
in the <A type="text/plain" HREF="../include/cord.h">cord package</a>).
|
|
<P>
|
|
Our collector provides the necessary functionality through
|
|
<TT>GC_register_finalizer</tt> in
|
|
<A type="text/plain" HREF="../include/gc.h">gc.h</a>, or by
|
|
inheriting from <TT>gc_cleanup</tt>
|
|
in <A type="text/plain" HREF="../include/gc_cpp.h">gc_cpp.h</a>.
|
|
<P>
|
|
However, finalization should not be used in the same way as C++
|
|
destructors. In well-written programs there will typically be
|
|
very few uses of finalization. (Garbage collected programs that
|
|
interact with explicitly memory-managed libraries may be an exception.)
|
|
<P>
|
|
In general the following guidelines should be followed:
|
|
<UL>
|
|
<LI>
|
|
Actions that must be executed promptly do not belong in finalizers.
|
|
They should be handled by explicit calls in the code (or C++
|
|
destructors if you prefer). If you expect the action to occur at
|
|
a specific point, this is probably not hard.
|
|
<LI>
|
|
Finalizers are intended for resource reclamation.
|
|
<LI>
|
|
Scarce system resources should be managed explicitly whenever
|
|
convenient. Use finalizers only as a backup mechanism for the
|
|
cases that would be hard to handle explicitly.
|
|
<LI>
|
|
If scarce resources are managed with finalization, the allocation
|
|
routine for that resource (<I>e.g.</i> open for file handles) should force
|
|
a garbage collection (two if that doesn't suffice) if it finds itself
|
|
short of the resource.
|
|
<LI>
|
|
If extremely scarce resources are managed by finalization (<I>e.g.</i>
|
|
file descriptors on systems which have a limit of 20 open files),
|
|
it may be necessary to introduce a descriptor caching scheme to
|
|
hide the resource limit.
|
|
(<I>E.g.</i>, the program would keep real file descriptors
|
|
for the 20 most recently used logically open files.
|
|
Any other needed files would be closed after saving their state.
|
|
They would then be reopened on demand.
|
|
Finalization would logically close the file, closing the
|
|
real descriptor only if it happened to be cached.)
|
|
Note that most modern systems (<I>e.g.</i> Irix®) allow hundreds or
|
|
thousands of open files, and this is typically not an issue.
|
|
<LI>
|
|
Finalization code may
|
|
be run anyplace an allocation or other call to the collector
|
|
takes place.
|
|
In multi-threaded programs, finalizers have to obey the normal
|
|
locking conventions to ensure safety.
|
|
Code run directly from finalizers should not acquire locks that may
|
|
be held during allocation. This restriction can be easily circumvented
|
|
by registering a finalizer which enqueues the real action for execution
|
|
in a separate thread.
|
|
<P>
|
|
In single-threaded code, it is also often easiest to have finalizers
|
|
queue actions, which are then explicitly run during an
|
|
explicit call by the user's program.
|
|
</ul>
|
|
<H1>Topologically Ordered Finalization</h1>
|
|
Our <A HREF="overview.html">conservative garbage collector</a> supports
|
|
a form of finalization
|
|
(with <TT>GC_register_finalizer</tt>)
|
|
in which objects are finalized in topological
|
|
order. If <I>A</i> points to <I>B</i>, and both are registered for
|
|
finalization, it is guaranteed the <I>A</i> will be finalized first.
|
|
This usually guarantees that finalization procedures see only
|
|
unfinalized objects.
|
|
<P>
|
|
This decision is often questioned, particularly since it has an obvious
|
|
disadvantage. The current implementation finalizes long chains of
|
|
finalizable objects one per collection. This is hard to avoid, since
|
|
the first finalizer invoked may store a pointer to the rest of the chain
|
|
in a global variable, making it accessible again. Or it may mutate the
|
|
rest of the chain.
|
|
<P>
|
|
Cycles involving one or more finalizable objects are never finalized.
|
|
<H1>
|
|
Why topological ordering?
|
|
</h1>
|
|
It is important to keep in mind that the choice of finalization ordering
|
|
matters only in relatively rare cases. In spite of the fact that it has
|
|
received a lot of discussion, it is not one of the more important
|
|
decisions in designing a system. Many, especially smaller, applications
|
|
will never notice the difference. Nonetheless, we believe that topologically
|
|
ordered finalization is the right choice.
|
|
<P>
|
|
To understand the justification, observe that if <I>A</i>s
|
|
finalization procedure does not refer to <I>B</i>, we could fairly easily have
|
|
avoided the dependency. We could have split <I>A</i> into <I>A'</i>
|
|
and <I>A''</i> such that any references to <I>A</i> become references to
|
|
<I>A'</i>, <I>A'</i> points to <I>A''</i> but not vice-versa, only fields
|
|
needed for finalization are stored in <I>A''</i>, and <I>A''</i> is enabled
|
|
for finalization. (<TT>GC_register_disappearing_link</tt> provides an
|
|
alternative mechanism that does not require breaking up objects.)
|
|
<P>
|
|
Thus assume that <I>A</i> actually does need access to <I>B</i> during
|
|
finalization. To make things concrete, assume that <I>B</i> is
|
|
finalizable because it holds a pointer to a C object, which must be
|
|
explicitly deallocated. (This is likely to be one of the most common
|
|
uses of finalization.) If <I>B</i> happens to be finalized first,
|
|
<I>A</i> will see a dangling pointer during its finalization. But a
|
|
principal goal of garbage collection was to avoid dangling pointers.
|
|
<P>
|
|
Note that the client program could enforce topological ordering
|
|
even if the system didn't. A pointer to <I>B</i> could be stored in
|
|
some globally visible place, where it is cleared only by <I>A</i>s
|
|
finalizer. But this puts the burden to ensure safety back on the
|
|
programmer.
|
|
<P>
|
|
With topologically ordered finalization, the programmer
|
|
can fail to split an object, thus leaving an accidental cycle. This
|
|
results in a leak, which is arguably less dangerous than a dangling
|
|
pointer. More importantly, it is <I>much</i> easier to diagnose,
|
|
since the garbage collector would have to go out of its way not to
|
|
notice finalization cycles. It can trivially report them.
|
|
<P>
|
|
Furthermore unordered finalization does not really solve the problem
|
|
of cycles. Consider the above case in which <I>A</i>s
|
|
finalization procedure depends on <I>B</i>, and thus a pointer to <I>B</i>
|
|
is stored in a global data structure, to be cleared by <I>A</i>s finalizer.
|
|
If there is an accidental pointer from <I>B</i> back to <I>A</i>, and
|
|
thus a cycle, neither <I>B</i> nor <I>A</i> will become unreachable.
|
|
The leak is there, just as in the topologically ordered case, but it is
|
|
hidden from easy diagnosis.
|
|
<P>
|
|
A number of alternative finalization orderings have been proposed, e.g.
|
|
based on statically assigned priorities. In our opinion, these are much
|
|
more likely to require complex programming discipline to use in a large
|
|
modular system. (Some of them, e.g. Guardians proposed by Dybvig,
|
|
Bruggeman, and Eby, do avoid some problems which arise in combination
|
|
with certain other collection algorithms.)
|
|
<P>
|
|
Fundamentally, a garbage collector assumes that objects reachable
|
|
via pointer chains may be accessed, and thus should be preserved.
|
|
Topologically ordered finalization simply extends this to object finalization;
|
|
an finalizable object reachable from another finalizer via a pointer chain
|
|
is presumed to be accessible by the finalizer, and thus should not be
|
|
finalized.
|
|
|
|
<H1>Programming with topological finalization</h1>
|
|
Experience with Cedar has shown that cycles or long chains of finalizable
|
|
objects are typically not a problem.
|
|
Finalizable objects are typically rare.
|
|
There are several ways to reduce spurious dependencies between finalizable
|
|
objects. Splitting objects as discussed above is one technique.
|
|
The collector also provides <TT>GC_register_disappearing_link</tt>, which
|
|
explicitly nils a pointer before determining finalization ordering.
|
|
<P>
|
|
Some so-called "operating systems" fail to clean up some resources associated
|
|
with a process. These resources must be deallocated at all cost before
|
|
process exit whether or not they are still referenced. Probably the best
|
|
way to deal with those is by not relying exclusively on finalization.
|
|
They should be registered in a table of weak pointers (implemented as
|
|
disguised pointers cleared by the finalization procedure that deallocates
|
|
the resource). If any references are still left at process exit, they
|
|
can be explicitly deallocated then.
|
|
|
|
<H1>Getting around topological finalization ordering</h1>
|
|
There are certain situations in which cycles between finalizable objects are
|
|
genuinely unavoidable. Most notably, C++ compilers introduce self-cycles
|
|
to represent inheritance. <TT>GC_register_finalizer_ignore_self</tt> tells the
|
|
finalization part of the collector to ignore self cycles.
|
|
This is used by the C++ interface.
|
|
<P>
|
|
Finalize.c actually contains an intentionally undocumented mechanism
|
|
for registering a finalizable object with user-defined dependencies.
|
|
The problem is that this dependency information is also used for memory
|
|
reclamation, not just finalization ordering. Thus misuse can result in
|
|
dangling pointers even if finalization doesn't create any.
|
|
The risk of dangling pointers can be eliminated by building the collector
|
|
with -DJAVA_FINALIZATION. This forces objects reachable from finalizers
|
|
to be marked, even though this dependency is not considered for finalization
|
|
ordering.
|
|
|
|
</body>
|
|
</html>
|