commit 6992685b86 (parent 183bba2c9a)
Committed by: Jo Shields

Imported Upstream version 4.2.0.179

Former-commit-id: 0a113cb3a6feb7873f632839b1307cc6033cd595
@@ -14,7 +14,6 @@ ASSEMBLED_DOCS = \
EXTRA_DIST = \
abc-removal.txt \
api-style.css \
assembly-bundle \
check-exports \
check-coverage \
convert.cs \
@@ -23,7 +22,6 @@ EXTRA_DIST = \
docs.make \
documented \
embedded-api \
exceptions \
exdoc \
file-share-modes \
gc-issues \
@@ -35,33 +33,25 @@ EXTRA_DIST = \
jit-imt \
jit-thoughts \
jit-trampolines \
local-regalloc.txt \
magic.diff \
mini-doc.txt \
mono-api-metadata.html \
mono-file-formats.config \
mono-file-formats.source \
mono_handle_d \
mono-tools.config \
mono-tools.source \
monoapi.source \
new-regalloc \
object-layout \
opcode-decomp.txt \
precise-gc \
produce-lists \
public \
public-api \
README \
release-notes-1.0.html \
remoting \
ssapre.txt \
stack-alignment \
stack-overflow.txt \
threading \
toc.xml \
TODO \
tree-mover.txt \
unmanaged-calls

dist-hook:

@@ -370,7 +370,6 @@ ASSEMBLED_DOCS = \
EXTRA_DIST = \
abc-removal.txt \
api-style.css \
assembly-bundle \
check-exports \
check-coverage \
convert.cs \
@@ -379,7 +378,6 @@ EXTRA_DIST = \
docs.make \
documented \
embedded-api \
exceptions \
exdoc \
file-share-modes \
gc-issues \
@@ -391,33 +389,25 @@ EXTRA_DIST = \
jit-imt \
jit-thoughts \
jit-trampolines \
local-regalloc.txt \
magic.diff \
mini-doc.txt \
mono-api-metadata.html \
mono-file-formats.config \
mono-file-formats.source \
mono_handle_d \
mono-tools.config \
mono-tools.source \
monoapi.source \
new-regalloc \
object-layout \
opcode-decomp.txt \
precise-gc \
produce-lists \
public \
public-api \
README \
release-notes-1.0.html \
remoting \
ssapre.txt \
stack-alignment \
stack-overflow.txt \
threading \
toc.xml \
TODO \
tree-mover.txt \
unmanaged-calls

TOOL_MAKE = $(MAKE) -f $(srcdir)/docs.make topdir=$(srcdir)/../mcs srcdir=$(srcdir)

@@ -1,57 +0,0 @@

HOWTO bundle assemblies inside the mono runtime.
Paolo Molaro (lupus@ximian.com)

* Intent

Bundling assemblies inside the mono runtime may be useful for a number
of reasons:

	* creating a standalone, complete runtime that can be more easily
	  distributed

	* having an application run against a known set of assemblies
	  that has been tested

Of course, there are drawbacks, too: if there have been fixes
to the assemblies, replacing them means recompiling the
runtime as well; and if there are other mono apps that don't
use the same mono binary, the operating system will have fewer
opportunities to optimize memory usage. So use this
feature only when really needed.

* Creating the Bundle

To bundle a set of assemblies, you need to create a file that
lists the assembly names and the corresponding files. Empty lines
and lines starting with # are ignored:

== cut cut ==
# Sample bundle template
mscorlib: /path/to/mscorlib/assembly.dll
myapp: /path/to/myapp.exe
== cut cut ==

Next you need to build the mono runtime using a special configure option:

	./configure --with-bundle=/path/to/bundle/template

The path to the template should be an absolute path.

The script metadata/make-bundle.pl will take the specified
assemblies and embed them inside the runtime, where the loading
routines can find them before searching for them on disk.
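
For illustration, this is roughly the shape of the data the generated
code has to expose; the type and variable names below are hypothetical,
not the actual output of make-bundle.pl:

	/* one record per bundled assembly (hypothetical names) */
	typedef struct {
		const char *name;           /* assembly name from the template */
		const unsigned char *data;  /* raw assembly image linked into the binary */
		unsigned int size;          /* image size in bytes */
	} BundledAssembly;

	/* the loader walks this table before probing the disk */
	extern const BundledAssembly bundled_assemblies [];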

* Open Issues

There are still two issues to solve:

	* config files: sometimes they are needed but they are
	  not yet bundled inside the library ()

	* building with the included libgc makes it impossible
	  to build a mono binary statically linked to libmono:
	  this needs to be fixed to make bundles really useful.

@@ -292,7 +292,6 @@ mono_free_method
mono_free_verify_list
mono_gc_collect
mono_gc_collection_count
mono_gc_enable_events
mono_gc_get_generation
mono_gc_get_heap_size
mono_gc_get_used_size
@@ -301,9 +300,7 @@ mono_gchandle_get_target
mono_gchandle_new
mono_gchandle_new_weakref
mono_gc_invoke_finalizers
mono_gc_is_finalizer_thread
mono_gc_max_generation
mono_gc_out_of_memory
mono_gc_wbarrier_arrayref_copy
mono_gc_wbarrier_generic_nostore
mono_gc_wbarrier_generic_store
@@ -604,7 +601,6 @@ mono_object_get_class
mono_object_get_domain
mono_object_get_size
mono_object_get_virtual_method
mono_object_is_alive
mono_object_isinst
mono_object_isinst_mbyref
mono_object_new

@@ -476,33 +476,6 @@ mono_loader_lock (void)
<div class="prototype">Prototype: mono_gc_enable</div>
<p />

</div> <a name="api:mono_gc_is_finalizer_thread"></a>
<div class="api">
<div class="api-entry">mono_gc_is_finalizer_thread</div>

<div class="prototype">gboolean
mono_gc_is_finalizer_thread (MonoThread *thread)
</div>
<p />
<b>Parameters</b>
<blockquote><dt><i>thread:</i></dt><dd> the thread to test.</dd></blockquote>
<b>Remarks</b>
<p />

In Mono objects are finalized asynchronously on a separate thread.
This routine tests whether the <i>thread</i> argument represents the
finalization thread.

<p />
Returns true if <i>thread</i> is the finalization thread.

</div> <a name="api:mono_gc_out_of_memory"></a>
<div class="api">
<div class="api-entry">mono_gc_out_of_memory</div>

<div class="prototype">Prototype: mono_gc_out_of_memory</div>
<p />

</div> <a name="api:mono_gc_start_world"></a>
<div class="api">
<div class="api-entry">mono_gc_start_world</div>
@@ -524,13 +497,6 @@ mono_gc_is_finalizer_thread (MonoThread *thread)
<div class="prototype">Prototype: mono_gc_alloc_fixed</div>
<p />

</div> <a name="api:mono_gc_enable_events"></a>
<div class="api">
<div class="api-entry">mono_gc_enable_events</div>

<div class="prototype">Prototype: mono_gc_enable_events</div>
<p />

</div> <a name="api:mono_gc_free_fixed"></a>
<div class="api">
<div class="api-entry">mono_gc_free_fixed</div>

@@ -111,7 +111,7 @@ mono_print_method_from_ip (void *ip)

This prints the name of the method at address <i>ip</i> in the standard
output. Unlike mono_pmip which returns a string, this routine
prints the value on the standard output.

</div> <a name="api:mono_print_thread_dump"></a>
<div class="api">

@@ -108,7 +108,6 @@ MonoObject* <a href="#api:mono_object_isinst">mono_object_isinst</a>
gpointer <a href="#api:mono_object_unbox">mono_object_unbox</a> (MonoObject *obj);
MonoObject* <a href="#api:mono_object_castclass_mbyref">mono_object_castclass_mbyref</a> (MonoObject *obj,
MonoClass *klass);
<a href="#api:mono_object_is_alive"></a>
guint <a href="#api:mono_object_get_size">mono_object_get_size</a> (MonoObject* o);
MonoObject* <a href="#api:mono_value_box">mono_value_box</a> (MonoDomain *domain,
MonoClass *class,
@@ -423,13 +422,6 @@ mono_object_castclass_mbyref (MonoObject *obj, MonoClass *klass)
<blockquote> <i>obj</i> if <i>obj</i> is derived from <i>klass</i>, throws an exception otherwise
</blockquote>

</div> <a name="api:mono_object_is_alive"></a>
<div class="api">
<div class="api-entry">mono_object_is_alive</div>

<div class="prototype">Prototype: mono_object_is_alive</div>
<p />

</div> <a name="api:mono_object_get_size"></a>
<div class="api">
<div class="api-entry">mono_object_get_size</div>

@@ -269,12 +269,9 @@ mono_gc_weak_link_get
mono_gc_weak_link_remove
mono_gc_disable
mono_gc_enable
mono_gc_is_finalizer_thread
mono_gc_out_of_memory
mono_gc_start_world
mono_gc_stop_world
mono_gc_alloc_fixed
mono_gc_enable_events
mono_gc_free_fixed
mono_gc_make_descr_from_bitmap
mono_gc_base_init
@@ -526,7 +523,6 @@ mono_object_isinst
mono_object_register_finalizer
mono_object_unbox
mono_object_castclass_mbyref
mono_object_is_alive
mono_object_get_size
mono_value_box
mono_value_copy

docs/exceptions
@@ -1,110 +0,0 @@
Exception Implementation in the Mono Runtime
Dietmar Maurer (dietmar@ximian.com)
(C) 2001 Ximian, Inc.

Exception implementation (jit):
===============================

Stack unwinding:
================

We record the code address (start_address, size) of all methods. That way it is
possible to map an instruction pointer (IP) to the method information needed
for unwinding the stack.

We also save a Last Managed Frame (LMF) structure at each call from managed to
unmanaged code. That way we can recover from exceptions inside unmanaged code.
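
As a rough sketch, an LMF record has to capture at least the following
state (the field set here is illustrative; the real layout is
per-architecture):

	struct LMF {
		struct LMF *previous;  /* chain of managed-to-unmanaged transitions */
		void *ip;              /* return address back into managed code */
		void *ebp;             /* frame pointer of the last managed frame */
		/* ... callee-saved registers ... */
	};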

void handle_exception (struct sigcontext *ctx, gpointer obj)
{
	if (ctx->bp < mono_end_of_stack) {
		/* unhandled exception */
		abort ();
	}

	info = mono_jit_info_table_find (mono_jit_info_table, ctx->ip);

	if (info) { // we are inside managed code

		if (ch = find_catch_handler ())
			execute_catch_handler (ch, ctx, obj);

		execute_all_finally_handler ();

		// restore registers, including IP and frame pointer
		ctx = restore_caller_saved_registers_from_ctx (info, ctx);

		// continue unwinding
		handle_exception (ctx, obj);

	} else {

		lmf = get_last_managed_frame ();

		// restore registers, including IP and frame pointer
		ctx = restore_caller_saved_registers_from_lmf (info, lmf);

		// continue unwinding
		handle_exception (ctx, obj);
	}
}

Code generation:
================

leave: is simply translated into a branch to the target. If the leave
instruction is inside a finally block (but not inside another handler)
we call the finally handler before we branch to the target.

finally/endfinally, filter/endfilter: these are translated into a subroutine
ending with a "return" statement. The subroutine does not save EBP, because we
need access to the local variables of the enclosing method. It is possible that
instructions inside those handlers modify the stack pointer, thus we save the
stack pointer at the start of the handler, and restore it at the end. We have
to use a "call" instruction to execute such finally handlers. This also makes
it possible to execute them inside the stack unwinding code. The exception
object for filters is passed in a local variable (cfg->exvar).

throw: we first save all regs into a sigcontext struct and then call the stack
unwinding code.

catch handler: catch handlers are always called from the stack unwinding
code. The exception object is passed in a local variable (cfg->exvar).

gcc support for Exceptions
==========================

gcc supports exceptions in files compiled with the -fexceptions option. gcc
generates DWARF exception tables in that case, so it is possible to unwind the
stack. The method to read those exception tables is contained in libgcc.a, and
in newer versions of glibc (glibc 2.2.5 for example), and it is called
__frame_state_for(). Another usable glibc function is backtrace_symbols(), which
returns the function name corresponding to a code address.

We dynamically check if those features are available using g_module_symbol(),
and we use them only when available. If they are not available, we use the LMF
as a fallback.

Using gcc exception information means we do not have to save the LMF at each
native call, so this is a way to speed up native calls. This is especially
valuable for internal calls, because we can make sure that all internal calls
are compiled with -fexceptions (we compile the whole mono runtime with that
option).

Any native function can call functions without exception tables, and so
we are unable to restore all caller-saved registers if an exception is raised
in such a function. Well, it is possible if the previous function already saves
all registers. So we only omit the LMF if a function has an exception table
able to restore all caller-saved registers.

One problem is that gcc almost never saves all caller-saved registers, because
it is just unnecessary in normal situations. But there is a trick to force gcc
to save all registers: we just need to call __builtin_unwind_init() at the
beginning of a function. That way gcc generates code to save all caller-saved
registers on the stack.
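
A minimal sketch of that trick (__builtin_unwind_init() is a real GCC
builtin; the surrounding function is hypothetical):

	void
	some_internal_call (void)
	{
		/* force gcc to spill all caller-saved registers to the
		   stack, so the stack unwinder can restore them later */
		__builtin_unwind_init ();

		/* ... code that may raise a managed exception ... */
	}
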
@@ -1,208 +0,0 @@

* Proposal for the local register allocator

The local register allocator deals with allocating registers
for temporaries inside a single basic block, while the global
register allocator is concerned with method-wide allocation of
variables.
The global register allocator uses callee-saved registers for its
purpose so that there is no need to save and restore these registers
at call sites.

There are a number of issues the local allocator needs to deal with:
*) some instructions expect operands in specific registers (for example
   the shl instruction on x86, or the call instruction with thiscall
   convention, or the equivalent call instructions on other architectures,
   such as the need to put output registers in %oX on sparc)
*) some instructions deliver results only in specific registers (for example
   the div instruction on x86, or the call instruction on almost all
   the architectures)
*) it needs to know what registers may be clobbered by an instruction
   (such as in a method call)
*) it should avoid excessive reloads or stores to improve performance

While which specific instructions have limitations is architecture-dependent,
the problem should be solved in an arch-independent way to reduce code
duplication. The register allocator will be 'driven' by the arch-dependent
code, but its implementation should be arch-independent.

To improve the current local register allocator, we need to
keep more state in it than the current setup that only keeps busy/free info.

Possible state information is:

free: the register is free to use and it doesn't contain useful info
freeable: the register contains data loaded from a local (there is
   also info about _which_ local it contains) as a result of previous
   instructions (like, there was a store from the register to the local)
moveable: it contains live data that is needed in a following instruction, but
   the contents may be moved to a different register
busy: the register contains live data and it is placed there because
   the following instructions need it exactly in that register
allocated: the register is used by the global allocator

The local register allocator will have the following interfaces:

int get_register ();
	Searches for a register in the free state. If it doesn't find it,
	searches for a freeable register. Sets the status to moveable.
	Looking for a 'free' register before a freeable one should allow for
	removing a few redundant loads (though I'm still unsure if such
	things should be delegated entirely to the peephole pass).

int get_register_force (int reg);
	Returns 'reg' if it is free or freeable. If it is moveable, it moves it
	to another free or freeable register.
	Sets the status of 'reg' to busy.

void set_register_freeable (int reg);
	Sets the status of 'reg' to freeable.

void set_register_free (int reg);
	Sets the status of 'reg' to free.

void will_clobber (int reg);
	Spills the register to the stack. Sets the status to freeable.
	After the clobbering has occurred, set the status to free.

void register_unspill (int reg);
	Un-spills register reg and sets the status to moveable.

FIXME: how is the 'local' information represented? Maybe a MonoInst* pointer.

Note: the register allocator will insert instructions in the basic block
during its operation.
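
A minimal C sketch of the states and interfaces listed above (the enum
names are illustrative; the function names are the ones proposed):

	/* per-register state tracked by the local allocator */
	enum {
		REG_FREE,      /* no useful contents */
		REG_FREEABLE,  /* mirrors a local, reusable without a spill */
		REG_MOVEABLE,  /* live value, may be moved to another register */
		REG_BUSY,      /* live value needed exactly in this register */
		REG_ALLOCATED  /* owned by the global allocator */
	};

	int  get_register (void);
	int  get_register_force (int reg);
	void set_register_freeable (int reg);
	void set_register_free (int reg);
	void will_clobber (int reg);
	void register_unspill (int reg);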

* Examples

Given the tree (on x86 the right argument to shl needs to be in ecx):

	store (local1, shl (local1, call (some_arg)))

At the start of the basic block, the registers are set to the free state.
The sequence of instructions may be:

instruction                      register status -> [%eax   %ecx   %edx]
start                                                free   free   free
eax = load local1                                    mov    free   free
/* call clobbers eax, ecx, edx */
spill eax                                            free   free   free
call                                                 mov    free   free
/* now eax contains the right operand of the shl */
mov %eax -> %ecx                                     free   busy   free
un-spill                                             mov    busy   free
shl %cl, %eax                                        mov    free   free

The resulting x86 code is:

	mov $fffc(%ebp), %eax
	mov %eax, $fff0(%ebp)
	push some_arg
	call func
	mov %eax, %ecx
	mov $fff0(%ebp), %eax
	shl %cl, %eax

Note that since shl could operate directly on memory, we could have:

	push some_arg
	call func
	mov %eax, %ecx
	shl %cl, $fffc(%ebp)

The above example with loading the operand in a register is just to complicate
the example and show that the algorithm should be able to handle it.

Let's take another example with the this-call call convention (the first
argument is passed in %ecx).
In this case, will_clobber() will be called only on %eax and %edx, while %ecx
will be allocated with get_register_force ().
Note: when a register is allocated with get_register_force(), it should be set
to a different state as soon as possible.

	store (local1, shl (local1, this-call (local1)))

instruction                      register status -> [%eax   %ecx   %edx]
start                                                free   free   free
eax = load local1                                    mov    free   free
/* force load in %ecx */
ecx = load local1                                    mov    busy   free
spill eax                                            free   busy   free
call                                                 mov    free   free
/* now eax contains the right operand of the shl */
mov %eax -> %ecx                                     free   busy   free
un-spill                                             mov    busy   free
shl %cl, %eax                                        mov    free   free

What happens when a register that we need to allocate with get_register_force ()
contains an operand for the next instruction?

instruction                      register status -> [%eax   %ecx   %edx]
eax = load local0                                    mov    free   free
ecx = load local1                                    mov    mov    free
get_register_force (ecx) here.
We have two options:
	mov %ecx, %edx
or:
	spill %ecx
The first option is way better (and allows the peephole pass to
just load the value in %edx directly, instead of loading first to %ecx).
This doesn't work, though, if the instruction clobbers the %edx register
(like in a this-call). So, we first need to clobber the registers
(so the state of %ecx changes to freeable and there is no issue
with get_register_force ()).
What if an instruction both clobbers a register and requires it as
an operand? Let's take the x86 idiv instruction as an example: it
requires the dividend in edx:eax and returns the result in eax,
with the modulus in edx.

	store (local1, div (local1, local2))

instruction                      register status -> [%eax   %ecx   %edx]
eax = load local0                                    mov    free   free
will_clobber eax, edx                                free   mov    free
force mov %ecx, %eax                                 busy   free   free
set %edx                                             busy   free   busy
idiv                                                 mov    free   free

Note: edx is set to free after idiv, because the modulus is not needed
(if it was a rem, eax would have been freed).
If we load the divisor before will_clobber(), we'll have to spill
eax and reload it later. If we load it just after the idiv, there is no issue.
In any case, the algorithm should give the correct results and allow the
operation.

Working recursively on the instructions there shouldn't be huge issues
with this algorithm (though, of course, it's not optimal and it may
introduce excessive spills or register moves). The advantages over the current
local reg allocator are that:
1) the number of spills/moves would be smaller anyway
2) a separate peephole pass could be able to eliminate reg moves
3) we'll be able to remove the 'forced' spills we currently do with
   the return value of method calls

* Issues

How to best integrate such a reg allocator with the burg stuff.

Think about a call on sparc with two arguments: they go into %o0 and %o1,
and each of them sets the register as busy. But what if the values to put there
are themselves the result of a call? %o0 is no problem, but for any subsequent
argument n the above algorithm would spill all the 0...n-1 registers...

* Papers

More complex solutions to the local register allocator problem:
http://dimacs.rutgers.edu/TechnicalReports/abstracts/1997/97-33.html

Combining register allocation and instruction scheduling:
http://citeseer.nj.nec.com/motwani95combining.html

More on LRA heuristics:
http://citeseer.nj.nec.com/liberatore97hardness.html

Linear-time optimal code scheduling for delayed-load architectures:
http://www.cs.wisc.edu/~fischer/cs701.f01/inst.sched.ps.gz

Precise Register Allocation for Irregular Architectures:
http://citeseer.nj.nec.com/kong98precise.html

Allocate registers first to subtrees that need more of them:
http://www.upb.de/cs/ag-kastens/compii/folien/comment401-409.2.pdf
@@ -1,15 +0,0 @@
This is a patch that can be applied to the magic file used by file(1) to
recognize mono assemblies.
Apply it to the magic file (usually in /usr/share/file/magic or
/usr/share/misc/magic) and recompile it with file -C.

--- magic.old	2006-03-24 21:12:25.000000000 +0100
+++ magic	2006-03-24 21:12:17.000000000 +0100
@@ -7205,6 +7205,7 @@
>>>>(0x3c.l+4) leshort 0x290 PA-RISC
>>>>(0x3c.l+22) leshort&0x0100 >0 32-bit
>>>>(0x3c.l+22) leshort&0x1000 >0 system file
+>>>>(0x3c.l+232) lelong >0 Mono/.Net assembly

>>>>(0x3c.l+0xf8) string UPX0 \b, UPX compressed
>>>>(0x3c.l+0xf8) search/0x140 PEC2 \b, PECompact2 compressed
@@ -1,98 +0,0 @@
=pod

=head1 Internal design document for the mono_handle_d

This document is designed to hold the design of the mono_handle_d and
not as an api reference.

=head2 Primary goal and purpose

The mono_handle_d is a process which takes care of the (de)allocation
of scratch shared memory and handles (of files, threads, mutexes,
sockets etc., see L<WapiHandleType>) and refcounts of the
filehandles. It is designed to be run by a user and to be fast, thus
minimal error checking on input is done and it will most likely crash if
given a faulty package. No effort has been, or should be, made to have
the daemon talking to machines of different endianness/size of int.

=head2 How to start the daemon

To start the daemon you either run the mono_handle_d executable or try
to attach to the shared memory segment via L<_wapi_shm_attach>, which
will start a daemon if one does not exist.

=head1 Internal details

The daemon works by opening a socket and listening to clients. These
clients send packages over the socket complying to L<struct
WapiHandleRequest>.

=head2 Possible requests

=over

=item WapiHandleRequestType_New

Find a handle in the shared memory segment that is free and allocate
it to the specified type. To destroy it, use
L</WapiHandleRequestType_Close>. A L<WapiHandleResponse> with
.type=WapiHandleResponseType_New will be sent back with .u.new.handle
set to the handle that was allocated. .u.new.type is the type that was
requested.

=item WapiHandleRequestType_Open

Increase the ref count of an already created handle. A
L<WapiHandleResponse> with .type=WapiHandleResponseType_Open will be sent
back with .u.new.handle set to the handle; .u.new.type is set to the
type of handle this is.

=item WapiHandleRequestType_Close

Decrease the ref count of an already created handle. A
L<WapiHandleResponse> with .type=WapiHandleResponseType_Close will be
sent back with .u.close.destroy set to TRUE if the ref count for this
client reached 0.

=item WapiHandleRequestType_Scratch

Allocate a shared memory area of size .u.scratch.length in bytes. A
L<WapiHandleResponse> with .type=WapiHandleResponseType_Scratch will be
sent back with .u.scratch.idx set to the index into the shared
memory's scratch area where the memory begins. (works just like
malloc(3))

=item WapiHandleRequestType_ScratchFree

Deallocate a shared memory area; it must have been allocated before
being deallocated. A L<WapiHandleResponse> with
.type=WapiHandleResponseType_ScratchFree will be sent back. (works just
like free(3))

=back
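
For orientation, here is a minimal sketch of the wire structures implied
by the field names used above; the types and layout are illustrative
only, not the real headers:

  struct WapiHandleRequest {
          int type;                 /* which request this is */
          union {
                  struct { int type; } new;               /* handle type wanted */
                  struct { unsigned int handle; } open;
                  struct { unsigned int handle; } close;
                  struct { unsigned int length; } scratch; /* bytes wanted */
                  struct { unsigned int idx; } scratch_free;
          } u;
  };

  struct WapiHandleResponse {
          int type;                 /* WapiHandleResponseType_* */
          union {
                  struct { unsigned int handle; int type; } new;
                  struct { int destroy; } close;  /* TRUE if refcount hit 0 */
                  struct { unsigned int idx; } scratch;
          } u;
  };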

=head1 Why a daemon

From an email:

Dennis: I just have one question about the daemon... Why does it
exist? Isn't it better performance-wise to just protect the shared area
with a mutex when allocating a new handle/shared mem segment or
changing the refcnt? It will however be less resilient to clients that
crash (the daemon cleans up ref'd handles if the socket closes)

Dick: It's precisely because with a mutex the shared memory segment
can be left in a locked state. Also, it's not so easy to clean up
shared memory without it (you can't just mark it deleted when creating
it, because you can't attach any more readers to the same segment
after that). I did some minimal performance testing, and I don't
think the daemon is particularly slow.

=head1 Authors

Documentation: Dennis Haney

Implementation: Dick Porter

=cut
@@ -1,68 +0,0 @@
We need to switch to a new register allocator.
The current one is split into a global and a local register allocator.
The global one can assign only callee-saved registers and operates
on the tree-based internal representation: it assigns local variables
to hardware registers.
The local one operates on the linear representation on a per-basic-block
basis and assigns hard registers to virtual registers (which
hold temporary values during expression execution), and it also deals
with the platform-specific issues (fixed registers, calling conventions).

Moving to a different register allocator will help solve some of the
performance issues introduced by the above split, make the allocator more
easily portable and solve some of the issues generated by dealing with trees.

The general design ideas are below.

The new allocator should have a global view of the whole method, so it can
assign variables also to some of the volatile registers if possible,
even across basic blocks (this would improve performance).

The allocator would be driven by per-arch declarative data, so porting
should be easier: an architecture needs to specify register classes,
calling conventions and instruction requirements (similar to the gcc code).

The allocator should operate on the linear representation; this way it's
easier and faster to track usages more correctly. We need to assign virtual
registers on a per-method basis instead of per basic block. We can assign
virtual registers to variables, too. Note that since we fix the stack offset
of local vars only after this step (which happens after the burg rules are run),
some of the burg rules that try to optimize the code won't apply anymore:
the peephole code may need to be enhanced to do the optimizations instead.

We need to handle floating point registers in the global allocator, too.

The new allocator also needs to keep track precisely of which registers
contain references or managed pointers, to allow us to move to a precise GC.

It may be worth using a single increasing set of integers for the virtual
registers, with the class of the register stored separately (unlike the
current local allocator, which keeps integer and fp registers separate).

Since this is a large task, we need to do it in steps as much as possible.
The first is to run the register allocator _after_ the burg rules: this
requires a rewrite of the liveness code, too, to use linear indexes instead
of basic-block/tree number combinations. This can be done by:
*) allocating virtual regs to all the locals that can be register allocated
*) running the burg rules (some may require adjustments): the local virtual
   registers are assigned starting from global-virt-regs+1, instead of the
   current hardware-regs+1, so we can tell apart global and local virt regs
   (see the sketch after this list)
*) running the liveness/whatever code is needed to allocate the global registers
*) allocating the rest of the local variables to stack slots
*) continuing with the current local allocator
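
A sketch of that numbering convention (the helper and its arguments are
hypothetical):

	/* vregs up to num_global_vregs are global (assigned across basic
	   blocks); anything above is a per-block local virtual register */
	static int
	vreg_is_local (int vreg, int num_global_vregs)
	{
		return vreg > num_global_vregs;
	}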

This work could take 2-3 weeks.

The next step is to define the kind of declarative data an architecture needs,
assigning virtual regs to all the registers and making the allocator
assign from the volatile registers, too.
Note that some of the code that is currently emitted in the arch-specific
code will need to be emitted as instructions that the reg allocator
can inspect: think of a method that returns the first argument, which is
received in a register: the current code copies it to either a local slot or
to a global reg in the prolog and copies it back to the return register
in the basic block, but since neither the regallocator nor the peephole code
knows about the prolog code, the first store cannot be optimized away.
The gcc code has some examples of how to specify register classes in a
declarative way.

@@ -1,113 +0,0 @@

* How to handle complex IL opcodes in an arch-independent way

Many IL opcodes are very simple: add, ldind etc.
Such opcodes can be implemented with a single cpu instruction
on most architectures (on some, a group of IL instructions
can be converted to a single cpu op).
There are many IL opcodes, though, that are more complex, but
can be expressed as a series of trees or a single tree of
simple operations. Such simple operations are architecture-independent.
It makes sense to decompose such complex IL instructions into their
simpler equivalents so that we gain in several ways:
*) porting effort is easier, because only the simple instructions
   need to be implemented in arch-specific code
*) we could apply BURG rules to the trees and do pattern matching
   on them to optimize the expressions according to the host cpu

The issue is: where do we do such conversion from coarse opcodes to
simple expressions?

* Doing the conversion in method_to_ir ()

Some of these conversions can certainly be done in method_to_ir (),
but it's not always easy to decide which are better done there and
which in a different pass.
For example, let's take ldlen: in the mono implementation, ldlen
can be simply implemented with a load from a fixed position in the
array object:

	len = [reg + maxlen_offset]
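
In plain C the decomposed form is just a fixed-offset load (a sketch;
the helper name is made up, and maxlen_offset stands for the offset
used above):

	#include <stddef.h>
	#include <stdint.h>

	/* read the array length stored at a fixed offset in the object */
	static inline int32_t
	array_length (const char *array_obj, size_t maxlen_offset)
	{
		return *(const int32_t *) (array_obj + maxlen_offset);
	}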

However, ldlen also carries semantic information: the result is the
length of the array, and since in the CLR arrays are of fixed size,
this information can be useful for later doing bounds check removal.
If we convert this opcode in method_to_ir () we lose some useful
information for further optimizations.

In some other ways, decomposing an opcode in method_to_ir () may
allow for better optimizations later on (need to come up with an
example here ...).

* Doing the conversion in inssel.brg

Some conversions may be done inside the burg rules: this has the
disadvantage that the instruction selector is not run again on
the resulting expression tree and we could miss some optimizations
(this is what effectively happens with the coarse opcodes in the old
jit). This may also interfere with an efficient local register allocator.
It may be possible to add an extension in monoburg that allows a rule
such as:

	recheck: LDLEN (reg) {
		create an expression tree representing LDLEN
		and return it
	}

When the monoburg label process gets back a recheck, it will run
the labeling again on the resulting expression tree.
Whether this is possible at all (and in an efficient way) is a
question for dietmar:-)
It should be noted, though, that this may not always work, since
some complex IL opcodes may require a series of expression trees
and handling such cases in monoburg could become quite hairy.
For example, think of opcodes that need to do multiple actions on the
same object: this basically means a DUP...
On the other hand, if a complex opcode needs a DUP, monoburg doesn't
actually need to create trees if it emits the instructions in
the correct sequence and maintains the right values in the registers
(usually the values that need a DUP are not changed...). How
this integrates with the current register allocator is not clear, since
that assigns registers based on the rule, but the instructions emitted
by the rules may be different (this already happens with the current JIT
where a MULT is replaced with lea etc...).

* Doing it in a separate pass

Doing the conversion in a separate pass over the instructions
is another alternative. This can be done right after method_to_ir ()
or after the SSA pass (since the IR after the SSA pass should look
almost like the IR we get back from method_to_ir ()).

This has the following advantages:
*) monoburg will handle only the simple opcodes (makes porting easier)
*) the instruction selection will be run on all the additional trees
*) it's easier to support coarse opcodes that produce multiple expression
   trees (and apply the monoburg selector on all of them)
*) the SSA optimizer will see the original opcodes and will be able to use
   the semantic info associated with them

The disadvantage is that this is a separate pass on the code and
it takes time (how much has not been measured yet, though).

With this approach, we may also be able to have C implementations
of some of the opcodes: this pass would insert a function call to
the C implementation (for example, when first porting to a new arch,
implementing some stuff in asm may be too hard).

* Extended basic blocks

IL code needs a lot of checks: bounds checks, overflow checks,
type checks and so on. This potentially increases by a lot
the number of basic blocks in a control flow graph. However,
all such blocks end with a throw opcode that gives control to the
exception handling mechanism.
After method_to_ir () a MonoBasicBlock can be considered a sort
of extended basic block, where the additional exits don't point
to basic blocks in the same procedure (at least when the method
doesn't have exception tables).
We need to make sure the passes following method_to_ir () can cope
with such kinds of extended basic blocks (especially the passes
that we need to apply to all the methods: as a start, we could
skip SSA optimizations for methods with exception clauses...)

@@ -292,7 +292,6 @@ mono_free_method
mono_free_verify_list
mono_gc_collect
mono_gc_collection_count
mono_gc_enable_events
mono_gc_get_generation
mono_gc_get_heap_size
mono_gc_get_used_size
@@ -301,9 +300,7 @@ mono_gchandle_get_target
mono_gchandle_new
mono_gchandle_new_weakref
mono_gc_invoke_finalizers
mono_gc_is_finalizer_thread
mono_gc_max_generation
mono_gc_out_of_memory
mono_gc_wbarrier_arrayref_copy
mono_gc_wbarrier_generic_nostore
mono_gc_wbarrier_generic_store
@@ -604,7 +601,6 @@ mono_object_get_class
mono_object_get_domain
mono_object_get_size
mono_object_get_virtual_method
mono_object_is_alive
mono_object_isinst
mono_object_isinst_mbyref
mono_object_new

@@ -1,16 +0,0 @@
<h1>Mono 1.0 Release Notes</h1>

<h2>What does Mono Include</h2>

<h2>Missing functionality</h2>

<p>COM support.

<p>EnterpriseServices are non-existent.

<p>Windows.Forms is only available as a preview; it is not
completed nor stable.

<h3>Assembly: System.Drawing</h3>

<p>System.Drawing.Printing is not supported.
@@ -93,12 +93,9 @@

<h4><a name="api:mono_gc_disable">mono_gc_disable</a></h4>
<h4><a name="api:mono_gc_enable">mono_gc_enable</a></h4>
<h4><a name="api:mono_gc_is_finalizer_thread">mono_gc_is_finalizer_thread</a></h4>
<h4><a name="api:mono_gc_out_of_memory">mono_gc_out_of_memory</a></h4>
<h4><a name="api:mono_gc_start_world">mono_gc_start_world</a></h4>
<h4><a name="api:mono_gc_stop_world">mono_gc_stop_world</a></h4>
<h4><a name="api:mono_gc_alloc_fixed">mono_gc_alloc_fixed</a></h4>
<h4><a name="api:mono_gc_enable_events">mono_gc_enable_events</a></h4>
<h4><a name="api:mono_gc_free_fixed">mono_gc_free_fixed</a></h4>
<h4><a name="api:mono_gc_make_descr_from_bitmap">mono_gc_make_descr_from_bitmap</a></h4>

@@ -93,7 +93,6 @@ result = mono_object_new (mono_domain_get (), version_class);
<h4><a name="api:mono_object_isinst">mono_object_isinst</a></h4>
<h4><a name="api:mono_object_unbox">mono_object_unbox</a></h4>
<h4><a name="api:mono_object_castclass_mbyref">mono_object_castclass_mbyref</a></h4>
<h4><a name="api:mono_object_is_alive">mono_object_is_alive</a></h4>
<h4><a name="api:mono_object_get_size">mono_object_get_size</a></h4>

<a name="valuetypes"></a>

@@ -1,33 +0,0 @@
Size and alignment requirements of stack values
===============================================

All entries are size/alignment in bytes.

P  ... System.IntPtr
I1 ... System.Int8
I2 ... System.Int16
I4 ... System.Int32
I8 ... System.Int64
F  ... System.Single
D  ... System.Double
LD ... native long double

-----------------------------------------------------------
ARCH    |  P  | I1  | I2  | I4  | I8  |  F  |  D  | LD  |
-----------------------------------------------------------
X86     | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 |12/4 |
-----------------------------------------------------------
X86/W32 | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 |12/4 |
-----------------------------------------------------------
ARM     | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 | 8/4 |
-----------------------------------------------------------
M68K    | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 |12/4 |
-----------------------------------------------------------
ALPHA   | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 |
-----------------------------------------------------------
SPARC   | 4/4 | 4/4 | 4/4 | 4/4 | 8/8 | 4/4 | 8/8 |16/8 |
-----------------------------------------------------------
SPARC64 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 |16/16|
-----------------------------------------------------------
MIPS    | 4/4 | 4/4 | 4/4 | 4/4 | ?/? | 4/4 | 8/8 | 8/8 |
-----------------------------------------------------------
        |     |     |     |     |     |     |     |     |
-----------------------------------------------------------
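
Any row of this table can be reproduced with standard C: the struct
trick below measures the size and alignment the compiler actually uses
(a sketch; the type list maps to the tags above):

	#include <stdio.h>
	#include <stddef.h>

	/* offsetof (struct { char pad; T x; }, x) equals the alignment of T */
	#define REPORT(T) do { \
		struct probe { char pad; T x; }; \
		printf ("%-12s %u/%u\n", #T, \
		        (unsigned) sizeof (T), \
		        (unsigned) offsetof (struct probe, x)); \
	} while (0)

	int
	main (void)
	{
		REPORT (void *);      /* P  */
		REPORT (short);       /* I2 */
		REPORT (int);         /* I4 */
		REPORT (long long);   /* I8 */
		REPORT (float);       /* F  */
		REPORT (double);      /* D  */
		REPORT (long double); /* LD */
		return 0;
	}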

@@ -1,261 +0,0 @@

Purpose

Especially when inlining is active, it can happen that temporary
variables add pressure to the register allocator, producing bad
code.

The idea is that some of these temporaries can be totally eliminated
by moving the MonoInst tree that defines them directly to the use
point in the code (hence the name "tree mover").

Please note that this is *not* an optimization: it is mostly a
workaround to issues we have in the regalloc.
Actually, with the new linear IR this will not be possible at all
(there will be no more trees in the code!).
Anyway, this workaround turns out to be useful in the current state
of things...

-----------------------------------------------------------------------

Base logic

If a local is defined by a value which is a proper expression (a tree
of MonoInst, not just another local or a constant), and this definition
is used only once, the tree can be moved directly to the use location,
and the definition eliminated.
Of course, none of the variables used in the tree must be defined in
the code path between the definition and the use, and the tree must be
free of side effects.
We do not handle the cases when the tree is just a local or a constant,
because they are handled by copyprop and consprop, respectively.

To make things simpler, we restrict the tree move to the case when:
- the definition and the use are in the same BB, and
- the use is followed by another definition in the same BB (so it is not
  possible that the 1st value is used again), or alternatively there
  is no BB in the whole CFG that contains a use of this local before a
  definition (so, again, there is no code path that can lead to a
  subsequent use).

To handle this, we maintain an ACT array (Available Copy Tree, similar
to the ACP), where we store the "state" of every local.
Ideally, every local can be in the following states (a C sketch of the
state set follows below):
[E] Undefined (by a tree; it could be in the ACP but we don't care).
[D] Defined (by a tree), and waiting for a use.
[U] Used, with a tree definition available in the same BB, but still
    without a definition following the use (always in the same BB).
Of course state [E] (empty) is the initial one.

Besides, there are two sorts of "meta states", or flags:
[W] Still waiting for a use or definition in this BB (we have seen no
    occurrence of the local yet).
[X] Used without being previously defined in the same BB (note that if
    there is a definition that precedes the use in the same BB, even if
    the definition is not a tree or is not available because of side
    effects or because the tree value has changed, the local is not in
    state [X]).
Also note that state [X] is a sort of "global" condition, which if set
in one BB will stay valid for the whole CFG, even if the local will
otherwise change state. The idea of flagging a local as [X] is that if
there is a definition/use pair that reaches the end of a BB, it could
be that there is a CFG path that then leads to the BB flagging it as
[X] (which contains a use), so the tree cannot be moved.
So state [X] will always be set, and never examined, in all the state
transitions we will describe.
In practice, we use flag [W] to set state [X]: if, when traversing a
BB, we find a use for a local in state [W], then that local is flagged
[X].
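
A minimal C sketch of the state an ACT slot has to track (illustrative
only; the real type is MonoTreeMoverActSlot, described under
"Implementation notes" below):

	/* state of one local in the ACT during a BB traversal */
	typedef enum {
		ACT_EMPTY,    /* [E] no tree definition available */
		ACT_DEFINED,  /* [D] defined by a tree, waiting for a use */
		ACT_USED      /* [U] used once, move not yet committed */
	} ActState;

	typedef struct {
		ActState state;
		unsigned waiting : 1;     /* [W] no occurrence seen yet in this BB */
		unsigned used_first : 1;  /* [X] used before defined; sticky for the CFG */
	} ActSlotSketch;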

For each BB, we initialize all states to [E] and [W], and then we
traverse the code one inst at a time, and update the variable states
in the ACT in the following ways:

[Definition]
- Flag [W] is cleared.
- All "affected trees" are killed (go from state [D] to [E]).
  The "affected trees" are the trees which contain (use) the defined
  local, and the rationale is that the tree value changed, so the
  tree is no longer available.
- If the local was in state [U], *that* tree move is marked "safe"
  (because *this* definition makes us sure that the previous tree
  cannot be used again in any way).
  The idea is that "safe" moves can happen even if the local is
  flagged [X], because the second definition "covers" the use.
  The tree move is then saved in the "todo" list (and the affecting
  nodes are cleared).
- If the local was defined by a tree, it goes to state [D], the tree
  is recorded, and all the locals used in it are marked as "affecting
  this tree" (of course these markers are lists, because each local
  could affect more than one tree).

[IndirectDefinition]
- All potentially affected trees (in state [D]) are killed.

[Use]
- If the local is still [W], it is flagged [X] (the [W] goes away).
- If the local is in state [D], it goes to state [U].
  The tree move must not yet be recorded in the "todo" list; it still
  stays in the ACT slot belonging to this local.
  Anyway, the "affecting" nodes are updated, because now a definition
  of a local used in this tree will affect only "indirect" (or also
  "propagated") moves, but not *this* move (see below).
- If the local is in state [U], then the tree cannot be moved (it is
  used two times): the move is canceled, and the state goes to [E].
- If the local is in state [E], the use is ignored.

[IndirectUse]
- All potentially affected trees (in state [D] or [U]) are killed.

[SideEffect]
- The tree is marked as "unmovable".

Then, at the end of the BB, for each ACT slot:
- If the state is [U], the tree move is recorded in the "todo" list, but
  flagged "unsafe".
- Anyway, the state goes to [E], the [W] flag is set, and all "affecting"
  lists are cleared (we get ready to traverse the next BB).
Finally, when all BBs have been scanned, we traverse the "todo" list,
moving all "safe" entries, and moving "unsafe" ones only if their ACT
slot is not flagged [X].

So far, so good.
But there are two issues that make things harder :-(

The first is the concept of "indirect tree move".
It can happen that a tree is scheduled for moving, and its destination
is a use that is located in a second tree, which could also be moved.
The main issue is that a definition of a variable of the 1st tree on
the path between the definition and the use of the 2nd one must prevent
the move.
But which move? The 1st or the 2nd?
Well, any of the two!
The point is, the 2nd move must be prevented *only* if the 1st one
happens: if it is aborted (for an [X] flag or any other reason), the
2nd move is OK, and vice versa...
We must handle this in the following way:
- The ACT must still remember if a slot is scheduled for moving in
  this BB, and if it is, all the locals used in the tree.
  We say that the slot is in state [M].
  Note that [M] is (like [X] and [W]) a sort of "meta state": a local
  is flagged [M] when it goes to state [U], and the flag is cleared
  when the tree move is cancelled.
- A tree that uses a local whose slot is in state [M] is also using all
  the locals used by the tree in state [M], but the use is "indirect".
  These use nodes are also included in the "affecting" lists.
- The definition of a variable used in an "indirect" way has the
  effect of "linking" the two involved tree moves, saying that only one
  of the two can happen in practice, but not both.
- When the 2nd tree is scheduled for moving, the 1st one is *still* in
  state [M], because a third move could "carry it forward", and all
  *three* moves should be mutually exclusive (to be safe!).

The second tricky complication is the "tree forwarding" that can happen
when copyprop is involved.
It is conceptually similar to the "indirect tree move".
Only, the 2nd tree is not really a tree; it is just the local defined
in the 1st tree move.
It can happen that copyprop will propagate the definition.
We cannot make treeprop do the same job as copyprop, because copyprop
has fewer constraints, and is therefore more powerful in its scope.
The main issue is that treeprop cannot propagate a tree to *two* uses,
while copyprop is perfectly capable of propagating one definition to
two (or more) different places.
So we must let copyprop do its job, otherwise we'll miss optimizations,
but we must also make it play safe with treeprop.
Let's clarify with an example:

	a = v1 + v2; // a is defined by a tree, state [D], uses v1 and v2
	b = a;       // a is used, state [U] with move scheduled, and
	             // b is defined by a, ACP[b] is a, and b is in state [DC]
	c = b + v3;  // b is used, goes to state [U]

The real trouble is that copyprop happens *immediately*, while treeprop
is deferred to the end of the CFG traversal.
So, in the 3rd statement, the "b" is immediately turned into an "a" by
copyprop, regardless of what treeprop will do.
Anyway, if we are careful, this is not so bad.
First of all, we must "accept" the fact that in the 3rd statement the
"b" is in fact an "a", as treeprop must happen *after* copyprop.
The real problem is that "a" is used twice: in the 2nd and 3rd lines.
In our usual setup, the 2nd line would set it to [U], and the 3rd line
would kill the move (and set "a" to [E]).
I have tried to play tricks, and reason as if copyprop didn't happen,
but everything becomes really messy.
Instead, we should note that the 2nd line is very likely to be dead.
At least in this BB, copyprop will turn all "b"s into "a"s as long as
it can, and when it cannot, it will be because either "a" or "b" has
been redefined, which would be after the tree move anyway.
So, the reasoning gets different: let's pretend that "b" will be dead.
This will make the "a" use in the 2nd statement useless, so there we
can "reset" "a" to [D], but also take note that if "b" ends up
not being dead, the tree move associated with this [D] must be aborted.
We can detect this in the following way:
- Either "b" is used before being defined in this BB, or
- it will be flagged "unsafe".
Both things are very easy to check.
The only quirk is that the "affecting" lists must not be cleared when
a slot goes to state [U], because a "propagation" could put it back
to state [D] (where those lists are needed, because it can be killed
by a definition to a used slot).

-----------------------------------------------------------------------

Implementation notes

All the implementation runs inside the existing mono_local_cprop
function, and a separate memory pool is used to hold the temporary
data.

A struct, MonoTreeMover, contains the pointers to the pool, the ACT,
the list of scheduled moves and auxiliary things.
This struct is allocated if the tree move pass is requested, and is
then passed along to all the involved functions, which are therefore
aware of the tree mover state.

The ACT is an array of slots, obviously one per local.
Each slot is of type MonoTreeMoverActSlot, and contains the used and
affected locals, a pointer to the pending tree move and the "waiting"
and "unsafe" flags.

The "affecting" lists are built from "dependency nodes", of type
MonoTreeMoverDependencyNode.
Each of the nodes contains the used and affected local, and is in
two lists: the locals used by a slot, and the locals affected by a
slot (obviously a different one).
So, each node means: "variable x is used in tree t, so a definition
of x affects tree t".
The "affecting" lists are doubly linked, to allow for O(1) deletion.
The "used" lists are singly linked, but when they are maintained there
is always a pointer to the last element to allow for O(1) list moving.
When a used list is dismissed (which happens often, any time a node is
killed), its nodes are unlinked from their respective affecting lists
and are then put in a "free" list in the MonoTreeMover to be reused.

Each tree move is represented by a struct (MonoTreeMoverTreeMove),
which contains:
- the definition and use points,
- the "affected" moves (recall the concept of "indirect tree move"),
- the "must be dead" slots (recall "tree forwarding"), and
- a few utility flags.
The tree move stays in the relevant ACT slot until it is ready to be
scheduled for moving, at which point it is put in a list in the
MonoTreeMover.
The tree move structs are reused when they are killed, so there is
also a "free" list for them in the MonoTreeMover (a sketch of these
structures follows).

The tree mover code has been added to all the relevant functions that
participate in consprop and copyprop, particularly:
- mono_cprop_copy_values takes care of variable uses (transitions from
  states [D] to [U], and [U] to [E] because of killing),
- mono_cprop_invalidate_values takes care of side effects (indirect
  accesses, calls...),
- mono_local_cprop_bb sets up and cleans the traversals for each BB,
  and for each MonoInst it takes care of variable definitions.
To each of them a MonoTreeMover parameter has been added, which is not
NULL if the tree mover is running.
After mono_local_cprop_bb has run for all BBs, the MonoTreeMover has
the list of all the pending moves, which must be walked to actually
perform the moves (when possible, because "unsafe" flags, "affected"
moves and "must be dead" slots can still have their effects, which
must be handled now because they are fully known only at the end of
the CFG traversal).