Imported Upstream version 4.2.0.179

Former-commit-id: 0a113cb3a6feb7873f632839b1307cc6033cd595
This commit is contained in:
Xamarin Public Jenkins
2015-08-26 07:17:56 -04:00
committed by Jo Shields
parent 183bba2c9a
commit 6992685b86
7507 changed files with 90259 additions and 657307 deletions

View File

@@ -14,7 +14,6 @@ ASSEMBLED_DOCS = \
EXTRA_DIST = \
abc-removal.txt \
api-style.css \
assembly-bundle \
check-exports \
check-coverage \
convert.cs \
@@ -23,7 +22,6 @@ EXTRA_DIST = \
docs.make \
documented \
embedded-api \
exceptions \
exdoc \
file-share-modes \
gc-issues \
@@ -35,33 +33,25 @@ EXTRA_DIST = \
jit-imt \
jit-thoughts \
jit-trampolines \
local-regalloc.txt \
magic.diff \
mini-doc.txt \
mono-api-metadata.html \
mono-file-formats.config\
mono-file-formats.source\
mono_handle_d \
mono-tools.config \
mono-tools.source \
monoapi.source \
new-regalloc \
object-layout \
opcode-decomp.txt \
precise-gc \
produce-lists \
public \
public-api \
README \
release-notes-1.0.html \
remoting \
ssapre.txt \
stack-alignment \
stack-overflow.txt \
threading \
toc.xml \
TODO \
tree-mover.txt \
unmanaged-calls
dist-hook:

View File

@@ -370,7 +370,6 @@ ASSEMBLED_DOCS = \
EXTRA_DIST = \
abc-removal.txt \
api-style.css \
assembly-bundle \
check-exports \
check-coverage \
convert.cs \
@@ -379,7 +378,6 @@ EXTRA_DIST = \
docs.make \
documented \
embedded-api \
exceptions \
exdoc \
file-share-modes \
gc-issues \
@@ -391,33 +389,25 @@ EXTRA_DIST = \
jit-imt \
jit-thoughts \
jit-trampolines \
local-regalloc.txt \
magic.diff \
mini-doc.txt \
mono-api-metadata.html \
mono-file-formats.config\
mono-file-formats.source\
mono_handle_d \
mono-tools.config \
mono-tools.source \
monoapi.source \
new-regalloc \
object-layout \
opcode-decomp.txt \
precise-gc \
produce-lists \
public \
public-api \
README \
release-notes-1.0.html \
remoting \
ssapre.txt \
stack-alignment \
stack-overflow.txt \
threading \
toc.xml \
TODO \
tree-mover.txt \
unmanaged-calls
TOOL_MAKE = $(MAKE) -f $(srcdir)/docs.make topdir=$(srcdir)/../mcs srcdir=$(srcdir)

View File

@@ -1,57 +0,0 @@
HOWTO bundle assemblies inside the mono runtime.
Paolo Molaro (lupus@ximian.com)
* Intent
Bundling assemblies inside the mono runtime may be useful for a number
of reasons:
* creating a standalone complete runtime that can be more easily
distributed
* having an application run against a known set of assemblies
that has been tested
Of course, there are drawbacks, too: if there have been fixes
to the assemblies, replacing them means recompiling the
runtime as well, and if there are other mono apps, unless they
use the same mono binary, there will be fewer opportunities for
the operating system to optimize memory usage. So use this
feature only when really needed.
* Creating the Bundle
To bundle a set of assemblies, you need to create a file that
lists the assembly names and the corresponding files. Empty lines
and lines starting with # are ignored:
== cut cut ==
# Sample bundle template
mscorlib: /path/to/mscorlib/assembly.dll
myapp: /path/to/myapp.exe
== cut cut ==
Next you need to build the mono runtime using a special configure option:
./configure --with-bundle=/path/to/bundle/template
The path to the template should be an absolute path.
The script metadata/make-bundle.pl will take the specified
assemblies and embed them inside the runtime where the loading
routines can find them before searching for them on disk.
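The generated glue presumably looks something like the following C
sketch (the names here are hypothetical; the actual output of
make-bundle.pl may differ):

/* one entry per assembly listed in the bundle template */
typedef struct {
        const char *name;            /* e.g. "mscorlib" */
        const unsigned char *data;   /* the raw assembly image */
        unsigned int size;
} BundledAssembly;

static const unsigned char bundle_mscorlib [] = { 0x4d, 0x5a /* ... */ };

static const BundledAssembly bundled_assemblies [] = {
        { "mscorlib", bundle_mscorlib, sizeof (bundle_mscorlib) },
        { NULL, NULL, 0 }   /* terminator scanned by the loading routines */
};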
* Open Issues
There are still two issues to solve:
* config files: sometimes they are needed but they are
not yet bundled inside the library ()
* building with the included libgc makes it
impossible to build a mono binary statically linked to
libmono: this needs to be fixed to make bundles
really useful.

View File

@@ -292,7 +292,6 @@ mono_free_method
mono_free_verify_list
mono_gc_collect
mono_gc_collection_count
mono_gc_enable_events
mono_gc_get_generation
mono_gc_get_heap_size
mono_gc_get_used_size
@@ -301,9 +300,7 @@ mono_gchandle_get_target
mono_gchandle_new
mono_gchandle_new_weakref
mono_gc_invoke_finalizers
mono_gc_is_finalizer_thread
mono_gc_max_generation
mono_gc_out_of_memory
mono_gc_wbarrier_arrayref_copy
mono_gc_wbarrier_generic_nostore
mono_gc_wbarrier_generic_store
@@ -604,7 +601,6 @@ mono_object_get_class
mono_object_get_domain
mono_object_get_size
mono_object_get_virtual_method
mono_object_is_alive
mono_object_isinst
mono_object_isinst_mbyref
mono_object_new

View File

@@ -476,33 +476,6 @@ mono_loader_lock (void)
<div class="prototype">Prototype: mono_gc_enable</div>
<p />
</div> <a name="api:mono_gc_is_finalizer_thread"></a>
<div class="api">
<div class="api-entry">mono_gc_is_finalizer_thread</div>
<div class="prototype">gboolean
mono_gc_is_finalizer_thread (MonoThread *thread)
</div>
<p />
<b>Parameters</b>
<blockquote><dt><i>thread:</i></dt><dd> the thread to test.</dd></blockquote>
<b>Remarks</b>
<p />
In Mono objects are finalized asynchronously on a separate thread.
This routine tests whether the <i>thread</i> argument represents the
finalization thread.
<p />
Returns true if <i>thread</i> is the finalization thread.
</div> <a name="api:mono_gc_out_of_memory"></a>
<div class="api">
<div class="api-entry">mono_gc_out_of_memory</div>
<div class="prototype">Prototype: mono_gc_out_of_memory</div>
<p />
</div> <a name="api:mono_gc_start_world"></a>
<div class="api">
<div class="api-entry">mono_gc_start_world</div>
@@ -524,13 +497,6 @@ mono_gc_is_finalizer_thread (MonoThread *thread)
<div class="prototype">Prototype: mono_gc_alloc_fixed</div>
<p />
</div> <a name="api:mono_gc_enable_events"></a>
<div class="api">
<div class="api-entry">mono_gc_enable_events</div>
<div class="prototype">Prototype: mono_gc_enable_events</div>
<p />
</div> <a name="api:mono_gc_free_fixed"></a>
<div class="api">
<div class="api-entry">mono_gc_free_fixed</div>

View File

@@ -111,7 +111,7 @@ mono_print_method_from_ip (void *ip)
This prints the name of the method at address <i>ip</i> in the standard
output. Unlike mono_pmip which returns a string, this routine
prints the value on the standard output.
</div> <a name="api:mono_print_thread_dump"></a>
<div class="api">

View File

@@ -108,7 +108,6 @@ MonoObject* <a href="#api:mono_object_isinst">mono_object_isinst</a>
gpointer <a href="#api:mono_object_unbox">mono_object_unbox</a> (MonoObject *obj);
MonoObject* <a href="#api:mono_object_castclass_mbyref">mono_object_castclass_mbyref</a> (MonoObject *obj,
MonoClass *klass);
<a href="#api:mono_object_is_alive"></a>
guint <a href="#api:mono_object_get_size">mono_object_get_size</a> (MonoObject* o);
MonoObject* <a href="#api:mono_value_box">mono_value_box</a> (MonoDomain *domain,
MonoClass *class,
@@ -423,13 +422,6 @@ mono_object_castclass_mbyref (MonoObject *obj, MonoClass *klass)
<blockquote> <i>obj</i> if <i>obj</i> is derived from <i>klass</i>, throws an exception otherwise
</blockquote>
</div> <a name="api:mono_object_is_alive"></a>
<div class="api">
<div class="api-entry">mono_object_is_alive</div>
<div class="prototype">Prototype: mono_object_is_alive</div>
<p />
</div> <a name="api:mono_object_get_size"></a>
<div class="api">
<div class="api-entry">mono_object_get_size</div>

View File

@@ -269,12 +269,9 @@ mono_gc_weak_link_get
mono_gc_weak_link_remove
mono_gc_disable
mono_gc_enable
mono_gc_is_finalizer_thread
mono_gc_out_of_memory
mono_gc_start_world
mono_gc_stop_world
mono_gc_alloc_fixed
mono_gc_enable_events
mono_gc_free_fixed
mono_gc_make_descr_from_bitmap
mono_gc_base_init
@@ -526,7 +523,6 @@ mono_object_isinst
mono_object_register_finalizer
mono_object_unbox
mono_object_castclass_mbyref
mono_object_is_alive
mono_object_get_size
mono_value_box
mono_value_copy

View File

@@ -1,110 +0,0 @@
Exception Implementation in the Mono Runtime
Dietmar Maurer (dietmar@ximian.com)
(C) 2001 Ximian, Inc.
Exception implementation (jit):
===============================
Stack unwinding:
================
We record the code address (start_address, size) of all methods. That way it is
possible to map an instruction pointer (IP) to the method information needed
for unwinding the stack:
We also save a Last Managed Frame (LMF) structure at each call from managed to
unmanaged code. That way we can recover from exceptions inside unmanaged code.
void handle_exception (struct sigcontext *ctx, gpointer obj)
{
if (ctx->bp < mono_end_of_stack) {
/* unhandled exception */
abort ();
}
info = mono_jit_info_table_find (mono_jit_info_table, ctx->ip);
if (info) { // we are inside managed code
if (ch = find_catch_handler ())
execute_catch_handler (ch, ctx, obj);
execute_all_finally_handler ();
// restore registers, including IP and frame pointer
ctx = restore_caller_saved_registers_from_ctx (ji, ctx);
// continue unwinding
handle_exception (ctx, obj);
} else {
lmf = get_last_managed_frame ();
// restore registers, including IP and frame pointer
ctx = restore_caller_saved_registers_from_lmf (ji, lmf);
// continue unwinding
handle_exception (ctx, obj);
}
}
Code generation:
================
leave: is simply translated into a branch to the target. If the leave
instruction is inside a finally block (but not inside another handler)
we call the finally handler before we branch to the target.
finally/endfinally, filter/endfilter: these are translated into a subroutine
ending with a "return" statement. The subroutine does not save EBP, because we
need access to the local variables of the enclosing method. It is possible that
instructions inside those handlers modify the stack pointer, thus we save the
stack pointer at the start of the handler and restore it at the end. We have
to use a "call" instruction to execute such finally handlers. This makes it
also possible to execute them inside the stack unwinding code. The exception
object for filters is passed in a local variable (cfg->exvar).
throw: we first save all regs into a sigcontext struct and then call the stack
unwinding code.
catch handler: catch handlers are always called from the stack unwinding
code. The exception object is passed in a local variable (cfg->exvar).
gcc support for Exceptions
==========================
gcc supports exceptions in files compiled with the -fexceptions option. gcc
generates DWARF exceptions tables in that case, so it is possible to unwind the
stack. The method to read those exception tables is contained in libgcc.a, and
in newer versions of glibc (glibc 2.2.5 for example), and it is called
__frame_state_for(). Another usable glibc function is backtrace_symbols() which
returns the function name corresponding to a code address.
We dynamically check if those features are available using g_module_symbol(),
and we use them only when available. If not available we use the LMF as
fallback.
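The dynamic check might look like this sketch (g_module_open() and
g_module_symbol() are the real GLib APIs; the surrounding code is an
assumption):

#include <gmodule.h>

static gpointer frame_state_for = NULL;   /* __frame_state_for, if found */

static void
check_gcc_unwind_support (void)
{
        /* NULL opens the running program itself */
        GModule *module = g_module_open (NULL, 0);

        if (!module || !g_module_symbol (module, "__frame_state_for", &frame_state_for))
                frame_state_for = NULL;   /* not available: use the LMF fallback */
}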
Using gcc exception information prevents us from saving the LMF at each native
call, so this is a way to speed up native calls. This is especially valuable
for internal calls, because we can make sure that all internal calls are
compiled with -fexceptions (we compile the whole mono runtime with that
option).
All native functions are able to call functions without exception tables, so
we are unable to restore all caller-saved registers if an exception is raised
in such a function. Well, it is possible if the previous function already saved
all registers. So we only omit the LMF if a function has an exception table
able to restore all caller-saved registers.
One problem is that gcc almost never saves all caller-saved registers, because
it is just unnecessary in normal situations. But there is a trick to force gcc
to save all registers: we just need to call __builtin_unwind_init() at the
beginning of a function. That way gcc generates code to save all caller-saved
registers on the stack.
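For illustration, a minimal sketch (the function name is made up; only
__builtin_unwind_init() is the real gcc builtin):

/* Compiled with -fexceptions: force gcc to save all caller-saved
 * registers on the stack so the unwinder can restore them. */
void
sample_internal_call (void)
{
        __builtin_unwind_init ();
        /* ... code that may end up raising a managed exception ... */
}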

View File

@@ -1,208 +0,0 @@
* Proposal for the local register allocator
The local register allocator deals with allocating registers
for temporaries inside a single basic block, while the global
register allocator is concerned with method-wide allocation of
variables.
The global register allocator uses callee-saved registers for its
purposes so that there is no need to save and restore these registers
at call sites.
There are a number of issues the local allocator needs to deal with:
*) some instructions expect operands in specific registers (for example
the shl instruction on x86, or the call instruction with thiscall
convention, or the equivalent call instructions on other architectures,
such as the need to put output registers in %oX on sparc)
*) some instructions deliver results only in specific registers (for example
the div instruction on x86, or the call instructions on almost all
the architectures).
*) it needs to know what registers may be clobbered by an instruction
(such as in a method call)
*) it should avoid excessive reloads or stores to improve performance
While which specific instructions have limitations is architecture-dependent,
the problem should be solved in an arch-independent way to reduce code duplication.
The register allocator will be 'driven' by the arch-dependent code, but its
implementation should be arch-independent.
To improve the current local register allocator, we need to
keep more state in it than the current setup that only keeps busy/free info.
Possible state information is:
free: the register is free to use and it doesn't contain useful info
freeable: the register contains data loaded from a local (there is
also info about _which_ local it contains) as a result from previous
instructions (like, there was a store from the register to the local)
moveable: it contains live data that is needed in a following instruction, but
the contents may be moved to a different register
busy: the register contains live data and it is placed there because
the following instructions need it exactly in that register
allocated: the register is used by the global allocator
The local register allocator will have the following interfaces:
int get_register ();
Searches for a register in the free state. If it doesn't find it,
searches for a freeable register. Sets the status to moveable.
Looking for a 'free' register before a freeable one should allow for
removing a few redundant loads (though I'm still unsure if such
things should be delegated entirely to the peephole pass).
int get_register_force (int reg);
Returns 'reg' if it is free or freeable. If it is moveable, it moves it
to another free or freeable register.
Sets the status of 'reg' to busy.
void set_register_freeable (int reg);
Sets the status of 'reg' to freeable.
void set_register_free (int reg);
Sets the status of 'reg' to free.
void will_clobber (int reg);
Spills the register to the stack. Sets the status to freeable.
After the clobbering has occurred, set the status to free.
void register_unspill (int reg);
Un-spills register reg and sets the status to moveable.
FIXME: how is the 'local' information represented? Maybe a MonoInst* pointer.
Note: the register allocator will insert instructions in the basic block
during its operation.
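As a sketch, the states and interface described above could look like
this in C (the enum and any implementation details are assumptions; only
the function names come from this proposal):

/* possible states of a hard register during local allocation */
typedef enum {
        REG_FREE,       /* no useful contents                        */
        REG_FREEABLE,   /* mirrors a local, can be taken over        */
        REG_MOVEABLE,   /* live data that may be moved elsewhere     */
        REG_BUSY,       /* live data needed exactly in this register */
        REG_ALLOCATED   /* reserved by the global allocator          */
} RegState;

int  get_register (void);
int  get_register_force (int reg);
void set_register_freeable (int reg);
void set_register_free (int reg);
void will_clobber (int reg);
void register_unspill (int reg);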
* Examples
Given the tree (on x86 the right argument to shl needs to be in ecx):
store (local1, shl (local1, call (some_arg)))
At the start of the basic block, the registers are set to the free state.
The sequence of instructions may be:
instruction              register status -> [%eax %ecx %edx]
start                    free free free
eax = load local1        mov  free free
/* call clobbers eax, ecx, edx */
spill eax                free free free
call                     mov  free free
/* now eax contains the right operand of the shl */
mov %eax -> %ecx         free busy free
un-spill                 mov  busy free
shl %cl, %eax            mov  free free
The resulting x86 code is:
mov $fffc(%ebp), %eax
mov %eax, $fff0(%ebp)
push some_arg
call func
mov %eax, %ecx
mov $fff0(%ebp), %eax
shl %cl, %eax
Note that since shl could operate directly on memory, we could have:
push some_arg
call func
mov %eax, %ecx
shl %cl, $fffc(%ebp)
The above example with loading the operand in a register is just to complicate
the example and show that the algorithm should be able to handle it.
Let's take another example with the this-call call convention (the first argument
is passed in %ecx).
In this case, will_clobber() will be called only on %eax and %edx, while %ecx
will be allocated with get_register_force ().
Note: when a register is allocated with get_register_force(), it should be set
to a different state as soon as possible.
store (local1, shl (local1, this-call (local1)))
instruction              register status -> [%eax %ecx %edx]
start                    free free free
eax = load local1        mov  free free
/* force load in %ecx */
ecx = load local1        mov  busy free
spill eax                free busy free
call                     mov  free free
/* now eax contains the right operand of the shl */
mov %eax -> %ecx         free busy free
un-spill                 mov  busy free
shl %cl, %eax            mov  free free
What happens when a register that we need to allocate with get_register_force ()
contains an operand for the next instruction?
instruction              register status -> [%eax %ecx %edx]
eax = load local0        mov  free free
ecx = load local1        mov  mov  free
get_register_force (ecx) here.
We have two options:
mov %ecx, %edx
or:
spill %ecx
The first option is way better (and allows the peephole pass to
just load the value in %edx directly, instead of loading first to %ecx).
This doesn't work, though, if the instruction clobbers the %edx register
(like in a this-call). So, we first need to clobber the registers
(so the state of %ecx changes to freeable and there is no issue
with get_register_force ()).
What if an instruction both clobbers a register and requires it as
an operand? Let's take the x86 idiv instruction as an example: it
requires the dividend in edx:eax and returns the result in eax,
with the modulus in edx.
store (local1, div (local1, local2))
instruction              register status -> [%eax %ecx %edx]
eax = load local0        mov  free free
will_clobber eax, edx    free mov  free
force mov %ecx, %eax     busy free free
set %edx                 busy free busy
idiv                     mov  free free
Note: edx is set to free after idiv, because the modulus is not needed
(if it was a rem, eax would have been freed).
If we load the divisor before will_clobber(), we'll have to spill
eax and reload it later. If we load it after will_clobber(), just
before the idiv, there is no issue.
In any case, the algorithm should give the correct results and allow the operation.
Working recursively on the instructions there shouldn't be huge issues
with this algorithm (though, of course, it's not optimal and it may
introduce excessive spills or register moves). The advantage over the current
local reg allocator is that:
1) the number of spills/moves would be smaller anyway
2) a separate peephole pass could be able to eliminate reg moves
3) we'll be able to remove the 'forced' spills we currently do with
the return value of method calls
* Issues
How to best integrate such a reg allocator with the burg stuff.
Think about a call on sparc with two arguments: they go into %o0 and %o1
and each of them sets the register as busy. But what if the values to put there
are themselves the result of a call? %o0 is no problem, but for each
next argument n the above algorithm would spill all the 0...n-1 registers...
* Papers
More complex solutions to the local register allocator problem:
http://dimacs.rutgers.edu/TechnicalReports/abstracts/1997/97-33.html
Combining register allocation and instruction scheduling:
http://citeseer.nj.nec.com/motwani95combining.html
More on LRA heuristics:
http://citeseer.nj.nec.com/liberatore97hardness.html
Linear-time optimal code scheduling for delayed-load architectures
http://www.cs.wisc.edu/~fischer/cs701.f01/inst.sched.ps.gz
Precise Register Allocation for Irregular Architectures
http://citeseer.nj.nec.com/kong98precise.html
Allocate registers first to subtrees that need more of them.
http://www.upb.de/cs/ag-kastens/compii/folien/comment401-409.2.pdf

View File

@@ -1,15 +0,0 @@
This is a patch that can be applied to the magic file used by file(1) to
recognize mono assemblies.
Apply it to the magic file (usually in /usr/share/file/magic or
/usr/share/misc/magic) and recompile it with file -C.
--- magic.old 2006-03-24 21:12:25.000000000 +0100
+++ magic 2006-03-24 21:12:17.000000000 +0100
@@ -7205,6 +7205,7 @@
>>>>(0x3c.l+4) leshort 0x290 PA-RISC
>>>>(0x3c.l+22) leshort&0x0100 >0 32-bit
>>>>(0x3c.l+22) leshort&0x1000 >0 system file
+>>>>(0x3c.l+232) lelong >0 Mono/.Net assembly
>>>>(0x3c.l+0xf8) string UPX0 \b, UPX compressed
>>>>(0x3c.l+0xf8) search/0x140 PEC2 \b, PECompact2 compressed

View File

@@ -1,98 +0,0 @@
=pod
=head1 Internal design document for the mono_handle_d
This document is designed to hold the design of the mono_handle_d and
not as an api reference.
=head2 Primary goal and purpose
The mono_handle_d is a process which takes care of the (de)allocation
of scratch shared memory and handles (of files, threads, mutexes,
sockets etc., see L<WapiHandleType>) and refcounts of the
file handles. It is designed to be run by a user and to be fast, thus
minimal error checking on input is done and it will most likely crash if
given a faulty packet. No effort has been, or should be, made to have
the daemon talk to machines of different endianness/sizes of int.
=head2 How to start the daemon
To start the daemon you either run the mono_handle_d executable or try
to attach to the shared memory segment via L<_wapi_shm_attach> which
will start a daemon if one does not exist.
=head1 Internal details
The daemon works by opening a socket and listening to clients. These
clients send packets over the socket conforming to L<struct
WapiHandleRequest>.
=head2 Possible requests
=over
=item WapiHandleRequestType_New
Find a handle in the shared memory segment that is free and allocate
it to the specified type. To destroy it, use
L</WapiHandleRequestType_Close>. A L<WapiHandleResponse> with
.type=WapiHandleResponseType_New will be sent back with .u.new.handle
set to the handle that was allocated. .u.new.type is the type that was
requested.
=item WapiHandleRequestType_Open
Increase the ref count of an already created handle. A
L<WapiHandleResponse> with .type=WapiHandleResponseType_Open will be sent
back with .u.new.handle set to the handle, .u.new.type is set to the
type of handle this is.
=item WapiHandleRequestType_Close
Decrease the ref count of an already created handle. A
L<WapiHandleResponse> with .type=WapiHandleResponseType_Close will be
sent back with .u.close.destroy set to TRUE if ref count for this
client reached 0.
=item WapiHandleRequestType_Scratch
Allocate a shared memory area of size .u.scratch.length in bytes. A
L<WapiHandleResponse> with .type=WapiHandleResponseType_Scratch will be
sent back with .u.scratch.idx set to the index into the shared
memory's scratch area where the memory begins (works just like
malloc(3)).
=item WapiHandleRequestType_ScratchFree
Deallocate a shared memory area; it must have been allocated
previously. A L<WapiHandleResponse> with
.type=WapiHandleResponseType_ScratchFree will be sent back (works just
like free(3)).
=back
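As an illustration, a client interaction might look like the following
C sketch (the struct layout, the request constant and the socket path
here are assumptions modelled on the field names above; the real
definitions live in the wapi headers):

    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    /* hypothetical layout following the .type/.u.new.type fields */
    typedef struct {
            int type;                        /* WapiHandleRequestType_* */
            union {
                    struct { int type; } new;
                    struct { int handle; } open;
                    struct { int handle; } close;
                    struct { int length; } scratch;
            } u;
    } SampleHandleRequest;

    int
    request_new_handle (int handle_type)
    {
            struct sockaddr_un addr;
            SampleHandleRequest req;
            int fd = socket (AF_UNIX, SOCK_STREAM, 0);

            if (fd < 0)
                    return -1;
            memset (&addr, 0, sizeof (addr));
            addr.sun_family = AF_UNIX;
            /* hypothetical socket path */
            strncpy (addr.sun_path, "/tmp/mono_handle_d", sizeof (addr.sun_path) - 1);
            if (connect (fd, (struct sockaddr *) &addr, sizeof (addr)) < 0)
                    return -1;

            memset (&req, 0, sizeof (req));
            req.type = 0;   /* stands in for WapiHandleRequestType_New */
            req.u.new.type = handle_type;
            write (fd, &req, sizeof (req));
            /* ... then read the WapiHandleResponse back ... */
            close (fd);
            return 0;
    }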
=head1 Why a daemon
From an email:
Dennis: I just have one question about the daemon... Why does it
exist? Isn't it better performance-wise to just protect the shared area
with a mutex when allocating a new handle/shared mem segment or
changing the refcnt? It will however be less resilient to clients that
crash (the daemon cleans up ref'd handles if the socket closes).
Dick: It's precisely because with a mutex the shared memory segment
can be left in a locked state. Also, it's not so easy to clean up
shared memory without it (you can't just mark it deleted when creating
it, because you can't attach any more readers to the same segment
after that). I did some minimal performance testing, and I don't
think the daemon is particularly slow.
=head1 Authors
Documentation: Dennis Haney
Implementation: Dick Porter
=cut

View File

@@ -1,68 +0,0 @@
We need to switch to a new register allocator.
The current one is split in a global and a local register allocator.
The global one can assign only callee-saves registers and happens
on the tree-based internal representation: it assigns local variables
to hardware registers.
The local one happens on the linear representation on a per basic
block basis and assigns hard registers to virtual registers (which
hold temporary values during expression executions) and it deals also
with the platform-specific issues (fixed registers, call conventions).
Moving to a different register allocator will help solve some of the performance
issues introduced by the above split, make the allocator more easily
portable and solve some of the issues generated by dealing with trees.
The general design ideas are below.
The new allocator should have a global view of the whole method, so it can
assign variables also to some of the volatile registers if possible,
even across basic blocks (this would improve performance).
The allocator would be driven by per-arch declarative data, so porting
should be easier: an architecture needs to specify register classes,
call conventions and instruction requirements (similar to the gcc code).
The allocator should operate on the linear representation; this way it's
easier and faster to track usages more correctly. We need to assign virtual
registers on a per-method basis instead of per basic block. We can assign
virtual registers to variables, too. Note that since we fix the stack offset
of local vars only after this step (which happens after the burg rules are run),
some of the burg rules that try to optimize the code won't apply anymore:
the peephole code may need to be enhanced to do the optimizations instead.
We need to handle floating point registers in the global allocator, too.
The new allocator also needs to keep track precisely of which registers
contain references or managed pointers to allow us to move to a precise GC.
It may be worthwhile to use a single increasing set of integers for the virtual
registers, with the class of the register stored separately (unlike the
current local allocator, which keeps integer and fp registers separate).
Since this is a large task, we need to do it in steps as much as possible.
The first is to run the register allocator _after_ the burg rules: this
requires a rewrite of the liveness code, too, to use linear indexes instead
of basic-block/tree number combinations. This can be done by:
*) allocating virtual regs to all the locals that can be register allocated
*) running the burg rules (some may require adjustments): the local virtual
registers are assigned starting from global-virt-regs+1, instead of the current
hardware-regs+1, so we can tell apart global and local virt regs (see the
sketch after this list)
*) running the liveness/whatever code is needed to allocate the global registers
*) allocate the rest of the local variables to stack slots
*) continue with the current local allocator
This work could take 2-3 weeks.
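For illustration, telling apart the two kinds of virtual registers could
be as simple as this sketch (the names and constants are assumptions
based on the numbering described in the list above):

#define MAX_HREGS 16          /* hypothetical hard register count */

/* hard regs occupy [0, MAX_HREGS); global virt regs follow them;
 * local virt regs are handed out starting from global_vreg_max + 1 */
static int global_vreg_max;   /* set before the burg rules run */

static inline int
vreg_is_global (int vreg)
{
        return vreg > MAX_HREGS && vreg <= global_vreg_max;
}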
The next step is to define the kind of declarative data an architecture needs,
to assign virtual regs to all the registers, and to make the allocator
assign from the volatile registers, too.
Note that some of the code that is currently emitted in the arch-specific
code will need to be emitted as instructions that the reg allocator
can inspect: think of a method that returns the first argument, which is
received in a register: the current code copies it to either a local slot or
to a global reg in the prolog and copies it back to the return register
in the basic block, but since neither the regallocator nor the peephole code
knows about the prolog code, the first store cannot be optimized away.
The gcc code has some example of how to specify register classes in a
declarative way.

View File

@@ -1,113 +0,0 @@
* How to handle complex IL opcodes in an arch-independent way
Many IL opcodes are very simple: add, ldind etc.
Such opcodes can be implemented with a single cpu instruction
in most architectures (on some, a group of IL instructions
can be converted to a single cpu op).
There are many IL opcodes, though, that are more complex, but
can be expressed as a series of trees or a single tree of
simple operations. Such simple operations are architecture-independent.
It makes sense to decompose such complex IL instructions in their
simpler equivalent so that we gain in several ways:
*) porting effort is easier, because only the simple instructions
need to be implemented in arch-specific code
*) we could apply BURG rules to the trees and do pattern matching
on them to optimize the expressions according to the host cpu
The issue is: where do we do such conversion from coarse opcodes to
simple expressions?
* Doing the conversion in method_to_ir ()
Some of these conversions can certainly be done in method_to_ir (),
but it's not always easy to decide which are better done there and
which in a different pass.
For example, let's take ldlen: in the mono implementation, ldlen
can be simply implemented with a load from a fixed position in the
array object:
len = [reg + maxlen_offset]
However, ldlen carries also semantics information: the result is the
length of the array, and since in the CLR arrays are of fixed size,
this information can be useful to later do bounds check removal.
If we convert this opcode in method_to_ir () we lose some useful
information for further optimizations.
In some other cases, decomposing an opcode in method_to_ir() may
allow for better optimizations later on (need to come up with an
example here ...).
* Doing the conversion in inssel.brg
Some conversions may be done inside the burg rules: this has the
disadvantage that the instruction selector is not run again on
the resulting expression tree and we could miss some optimization
(this is what effectively happens with the coarse opcodes in the old
jit). This may also interfere with an efficient local register allocator.
It may be possible to add an extension in monoburg that allows a rule
such as:
recheck: LDLEN (reg) {
create an expression tree representing LDLEN
and return it
}
When the monoburg label process gets back a recheck, it will run
the labeling again on the resulting expression tree.
Whether this is possible at all (and in an efficient way) is a
question for dietmar :-)
It should be noted, though, that this may not always work, since
some complex IL opcodes may require a series of expression trees
and handling such cases in monoburg could become quite hairy.
For example, think of opcodes that need to do multiple actions on the
same object: this basically means a DUP...
On the other hand, if a complex opcode needs a DUP, monoburg doesn't
actually need to create trees if it emits the instructions in
the correct sequence and maintains the right values in the registers
(usually the values that need a DUP are not changed...). How
this integrates with the current register allocator is not clear, since
that assigns registers based on the rule, but the instructions emitted
by the rules may be different (this already happens with the current JIT
where a MULT is replaced with lea etc...).
* Doing it in a separate pass.
Doing the conversion in a separate pass over the instructions
is another alternative. This can be done right after method_to_ir ()
or after the SSA pass (since the IR after the SSA pass should look
almost like the IR we get back from method_to_ir ()).
This has the following advantages:
*) monoburg will handle only the simple opcodes (makes porting easier)
*) the instruction selection will be run on all the additional trees
*) it's easier to support coarse opcodes that produce multiple expression
trees (and apply the monoburg selector on all of them)
*) the SSA optimizer will see the original opcodes and will be able to use
the semantic info associated with them
The disadvantage is that this is a separate pass on the code and
it takes time (how much has not been measured yet, though).
With this approach, we may also be able to have C implementations
of some of the opcodes: this pass would insert a function call to
the C implementation (for example when first porting
to a new arch, implementing some stuff in asm may be too hard).
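A sketch of what such a pass could look like for ldlen (the opcode and
field names below are approximations of the JIT internals, not the
actual code):

static void
decompose_pass (MonoCompile *cfg, MonoBasicBlock *bb)
{
        MonoInst *ins;

        for (ins = bb->code; ins; ins = ins->next) {
                switch (ins->opcode) {
                case CEE_LDLEN:
                        /* len = [reg + maxlen_offset] */
                        ins->opcode = OP_LOADI4_MEMBASE;   /* hypothetical simple opcode */
                        ins->inst_offset = G_STRUCT_OFFSET (MonoArray, max_length);
                        break;
                /* ... other coarse opcodes decomposed here ... */
                default:
                        break;
                }
        }
}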
* Extended basic blocks
IL code needs a lot of checks, bounds checks, overflow checks,
type checks and so on. This potentially increases by a lot
the number of basic blocks in a control flow graph. However,
all such blocks end up with a throw opcode that gives control to the
exception handling mechanism.
After method_to_ir () a MonoBasicBlock can be considered a sort
of extended basic block where the additional exits don't point
to basic blocks in the same procedure (at least when the method
doesn't have exception tables).
We need to make sure the passes following method_to_ir () can cope
with such kinds of extended basic blocks (especially the passes
that we need to apply to all the methods: as a start, we could
skip SSA optimizations for methods with exception clauses...)

View File

@@ -292,7 +292,6 @@ mono_free_method
mono_free_verify_list
mono_gc_collect
mono_gc_collection_count
mono_gc_enable_events
mono_gc_get_generation
mono_gc_get_heap_size
mono_gc_get_used_size
@@ -301,9 +300,7 @@ mono_gchandle_get_target
mono_gchandle_new
mono_gchandle_new_weakref
mono_gc_invoke_finalizers
mono_gc_is_finalizer_thread
mono_gc_max_generation
mono_gc_out_of_memory
mono_gc_wbarrier_arrayref_copy
mono_gc_wbarrier_generic_nostore
mono_gc_wbarrier_generic_store
@@ -604,7 +601,6 @@ mono_object_get_class
mono_object_get_domain
mono_object_get_size
mono_object_get_virtual_method
mono_object_is_alive
mono_object_isinst
mono_object_isinst_mbyref
mono_object_new

View File

@@ -1,16 +0,0 @@
<h1>Mono 1.0 Release Notes</h1>
<h2>What does Mono Include</h2>
<h2>Missing functionality</h2>
<p>COM support.
<p>EnterpriseServices are non-existent.
<p>Windows.Forms is only available as a preview; it is
neither complete nor stable.
<h3>Assembly: System.Drawing</h3>
<p>System.Drawing.Printing is not supported.

View File

@@ -93,12 +93,9 @@
<h4><a name="api:mono_gc_disable">mono_gc_disable</a></h4>
<h4><a name="api:mono_gc_enable">mono_gc_enable</a></h4>
<h4><a name="api:mono_gc_is_finalizer_thread">mono_gc_is_finalizer_thread</a></h4>
<h4><a name="api:mono_gc_out_of_memory">mono_gc_out_of_memory</a></h4>
<h4><a name="api:mono_gc_start_world">mono_gc_start_world</a></h4>
<h4><a name="api:mono_gc_stop_world">mono_gc_stop_world</a></h4>
<h4><a name="api:mono_gc_alloc_fixed">mono_gc_alloc_fixed</a></h4>
<h4><a name="api:mono_gc_enable_events">mono_gc_enable_events</a></h4>
<h4><a name="api:mono_gc_free_fixed">mono_gc_free_fixed</a></h4>
<h4><a name="api:mono_gc_make_descr_from_bitmap">mono_gc_make_descr_from_bitmap</a></h4>

View File

@@ -93,7 +93,6 @@ result = mono_object_new (mono_domain_get (), version_class);
<h4><a name="api:mono_object_isinst">mono_object_isinst</a></h4>
<h4><a name="api:mono_object_unbox">mono_object_unbox</a></h4>
<h4><a name="api:mono_object_castclass_mbyref">mono_object_castclass_mbyref</a></h4>
<h4><a name="api:mono_object_is_alive">mono_object_is_alive</a></h4>
<h4><a name="api:mono_object_get_size">mono_object_get_size</a></h4>
<a name="valuetypes"></a>

View File

@@ -1,33 +0,0 @@
Size and alignment requirements of stack values
===============================================
P ... System.IntPtr
I1 ... System.Int8
I2 ... System.Int16
I4 ... System.Int32
I8 ... System.Int64
F ... System.Single
D ... System.Double
LD ... native long double
-----------------------------------------------------------
ARCH | P | I1 | I2 | I4 | I8 | F | D | LD |
-----------------------------------------------------------
X86 | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 |12/4 |
-----------------------------------------------------------
X86/W32 | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 |12/4 |
-----------------------------------------------------------
ARM | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 | 8/4 |
-----------------------------------------------------------
M68K | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 |12/4 |
-----------------------------------------------------------
ALPHA | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 |
-----------------------------------------------------------
SPARC | 4/4 | 4/4 | 4/4 | 4/4 | 8/8 | 4/4 | 8/8 |16/8 |
-----------------------------------------------------------
SPARC64 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 |16/16|
-----------------------------------------------------------
MIPS | 4/4 | 4/4 | 4/4 | 4/4 | ?/? | 4/4 | 8/8 | 8/8 |
-----------------------------------------------------------
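These numbers can be double-checked on a given platform with a small C
probe like this one (a sketch, not part of the original table):

#include <stdio.h>
#include <stddef.h>

struct i8_probe { char c; long long x; };
struct d_probe  { char c; double x; };
struct ld_probe { char c; long double x; };

int
main (void)
{
        /* size/alignment, where the alignment is the padding the
         * compiler inserts after a single leading char */
        printf ("I8 %u/%u\n", (unsigned) sizeof (long long),
                (unsigned) offsetof (struct i8_probe, x));
        printf ("D  %u/%u\n", (unsigned) sizeof (double),
                (unsigned) offsetof (struct d_probe, x));
        printf ("LD %u/%u\n", (unsigned) sizeof (long double),
                (unsigned) offsetof (struct ld_probe, x));
        return 0;
}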

View File

@@ -1,261 +0,0 @@
Purpose
Especially when inlining is active, it can happen that temporary
variables add pressure to the register allocator, producing bad
code.
The idea is that some of these temporaries can be totally eliminated
by moving the MonoInst tree that defines them directly to the use
point in the code (so the name "tree mover").
Please note that this is *not* an optimization: it is mostly a
workaround to issues we have in the regalloc.
Actually, with the new linear IR this will not be possible at all
(there will be no more trees in the code!).
Anyway, this workaround turns out to be useful in the current state
of things...
-----------------------------------------------------------------------
Base logic
If a local is defined by a value which is a proper expression (a tree
of MonoInst, not just another local or a constant), and this definition
is used only once, the tree can be moved directly to the use location,
and the definition eliminated.
Of course, none of the variables used in the tree must be defined in
the code path between the definition and the use, and the tree must be
free of side effects.
We do not handle the cases when the tree is just a local or a constant
because they are handled by copyprop and consprop, respectively.
To make things simpler, we restrict the tree move to the case when:
- the definition and the use are in the same BB, and
- the use is followed by another definition in the same BB (it is not
possible that the 1st value is used again), or alternatively there
is no BB in the whole CFG that contains a use of this local before a
definition (so, again, there is no code path that can lead to a
subsequent use).
To handle this, we maintain an ACT array (Available Copy Tree, similar
to the ACP), where we store the "state" of every local.
Ideally, every local can be in the following state:
[E] Undefined (by a tree, it could be in the ACP but we don't care).
[D] Defined (by a tree), and waiting for a use.
[U] Used, with a tree definition available in the same BB, but still
without a definition following the use (always in the same BB).
Of course state [E] (empty) is the initial one.
Besides, there are two sort of "meta states", or flags:
[W] Still waiting for a use or definition in this BB (we have seen no
occurrence of the local yet).
[X] Used without being previously defined in the same BB (note that if
there is a definition that precedes the use in the same BB, even if
the definition is not a tree or is not available because of side
effects or because the tree value has changed the local is not in
state [X]).
Also note that state [X] is a sort of "global" condition, which if set
in one BB will stay valid for the whole CFG, even if the local will
otherwise change state. The idea of flagging a local as [X] is that if
there is a definition/use pair that reaches the end of a BB, it could
be that there is a CFG path that then leads to the BB flagging it as
[X] (which contains a use), so the tree cannot be moved.
So state [X] will always be set, and never examined in all the state
transitions we will describe.
In practice, we use flag [W] to set state [X]: if, when traversing a
BB, we find a use for a local in state [W], then that local is flagged
[X].
For each BB, we initialize all states to [E] and [W], and then we
traverse the code one inst at a time, and update the variable states
in the ACT in the following ways:
[Definition]
- Flag [W] is cleared.
- All "affected trees" are killed (go from state [D] to [E]).
The "affected trees" are the trees which contain (use) the defined
local, and the rationale is that the tree value changed, so the
tree is no longer available.
- If the local was in state [U], *that* tree move is marked "safe"
(because *this* definition makes us sure that the previous tree
cannot be used again in any way).
The idea is that "safe" moves can happen even if the local is
flagged [X], because the second definition "covers" the use.
The tree move is then saved in the "todo" list (and the affecting
nodes are cleared).
- If the local was defined by a tree, it goes to state [D], the tree
is recorded, and all the locals used in it are marked as "affecting
this tree" (of course these markers are lists, because each local
could affect more than one tree).
[IndirectDefinition]
- All potentially affected trees (in state [D]) are killed.
[Use]
- If the local is still [W], it is flagged [X] (the [W] goes away).
- If the local is in state [D], it goes to state [U].
The tree move must not yet be recorded in the "todo" list, it still
stays in the ACT slot belonging to this local.
Anyway, the "affecting" nodes are updated, because now a definition
of a local used in this tree will affect only "indirect" (or also
"propagated") moves, but not *this* move (see below).
- If the local is in state [U], then the tree cannot be moved (it is
used two times): the move is canceled, and the state goes [E].
- If the local is in state [E], the use is ignored.
[IndirectUse]
- All potentially affected trees (in state [D] or [U]) are killed.
[SideEffect]
- Tree is marked as "unmovable".
Then, at the end of the BB, for each ACT slot:
- If state is [U], the tree move is recorded in the "todo" list, but
flagged "unsafe".
- Anyway, state goes to [E], the [W] flag is set, and all "affecting"
lists are cleared (we get ready to traverse the next BB).
Finally, when all BBs have been scanned, we traverse the "todo" list,
moving all "safe" entries, and moving "unsafe" ones only if their ACT
slot is not flagged [X].
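As an illustration, the per-slot bookkeeping could look like the
following C sketch (the names are hypothetical, not the actual mini
types):

typedef enum {
        ACT_EMPTY,    /* [E]: no tree definition available       */
        ACT_DEFINED,  /* [D]: defined by a tree, waiting for use */
        ACT_USED      /* [U]: used once, move still pending      */
} ActState;

typedef struct {
        ActState state;
        gboolean waiting;    /* [W]: no occurrence seen yet in this BB */
        gboolean excluded;   /* [X]: used before defined in some BB    */
        MonoInst *def_tree;  /* the candidate tree while in [D]/[U]    */
} ActSlot;

/* transition applied when 'slot' sees a use of its local */
static void
act_slot_on_use (ActSlot *slot)
{
        if (slot->waiting) {
                slot->excluded = TRUE;    /* flag [X] */
                slot->waiting = FALSE;
        }
        if (slot->state == ACT_DEFINED)
                slot->state = ACT_USED;   /* [D] -> [U] */
        else if (slot->state == ACT_USED)
                slot->state = ACT_EMPTY;  /* used twice: cancel the move */
}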
So far, so good.
But there are two issues that make things harder :-(
The first is the concept of "indirect tree move".
It can happen that a tree is scheduled for moving, and its destination
is a use that is located in a second tree, which could also be moved.
The main issue is that a definition of a variable of the 1st tree on
the path between the definition and the use of the 2nd one must prevent
the move.
But which move? The 1st or the 2nd?
Well, any of the two!
The point is, the 2nd move must be prevented *only* if the 1st one
happens: if it is aborted (for an [X] flag or any other reason), the
2nd move is OK, and vice versa...
We must handle this in the following way:
- The ACT must still remember if a slot is scheduled for moving in
this BB, and if it is, all the locals used in the tree.
We say that the slot is in state [M].
Note that [M] is (like [X] and [W]) a sort of "meta state": a local
is flagged [M] when it goes to state [U], and the flag is cleared
when the tree move is cancelled.
- A tree that uses a local whose slot is in state [M] is also using all
the locals used by the tree in state [M], but the use is "indirect".
These use nodes are also included in the "affecting" lists.
- The definition of a variable used in an "indirect" way has the
effect of "linking" the two involved tree moves, saying that only one
of the two can happen in practice, but not both.
- When the 2nd tree is scheduled for moving, the 1st one is *still* in
state [M], because a third move could "carry it forward", and all the
*three* moves should be mutually exclusive (to be safe!).
The second tricky complication is the "tree forwarding" that can happen
when copyprop is involved.
It is conceptually similar to the "indirect tree move".
Only, the 2nd tree is not really a tree, it is just the local defined
in the 1st tree move.
It can happen that copyprop will propagate the definition.
We cannot make treeprop do the same job of copyprop, because copyprop
has less constraints, and is therefore more powerful in its scope.
The main issue is that treeprop cannot propagate a tree to *two* uses,
while copyprop is perfectly capable of propagating one definition to
two (or more) different places.
So we must let copyprop do its job otherwise we'll miss optimizations,
but we must also make it play safe with treeprop.
Let's clarify with an example:
a = v1 + v2; //a is defined by a tree, state [D], uses v1 and v2
b = a; //a is used, state [U] with move scheduled, and
//b is defined by a, ACP[b] is a, and b is in state [DC]
c = b + v3; // b is used, goes to state [U]
The real trouble is that copyprop happens *immediately*, while treeprop
is deferred to the end of the CFG traversal.
So, in the 3rd statement, the "b" is immediately turned into an "a" by
copyprop, regardless of what treeprop will do.
Anyway, if we are careful, this is not so bad.
First of all, we must "accept" the fact that in the 3rd statement the
"b" is in fact an "a", as treeprop must happen *after* copyprop.
The real problem is that "a" is used twice: in the 2nd and 3rd lines.
In our usual setup, the 2nd line would set it to [U], and the 3rd line
would kill the move (and set "a" to [E]).
I have tried to play tricks, and reason as if copyprop didn't happen,
but everything becomes really messy.
Instead, we should note that the 2nd line is very likely to be dead.
At least in this BB, copyprop will turn all "b"s into "a"s as long as
it can, and when it cannot, it will be because either "a" or "b" have
been redefined, which would be after the tree move anyway.
So, the reasoning gets different: let's pretend that "b" will be dead.
This will make the "a" use in the 2nd statement useless, so there we
can "reset" "a" to [D], but also take note that if "b" will end up
not being dead, the tree move associated to this [D] must be aborted.
We can detect this in the following way:
- Either "b" is used before being defined in this BB, or
- It will be flagged "unsafe".
Both things are very easy to check.
The only quirk is that the "affecting" lists must not be cleared when
a slot goes to state [U], because a "propagation" could put it back
to state [D] (where those lists are needed, because it can be killed
by a definition to a used slot).
-----------------------------------------------------------------------
Implementation notes
All the implementation runs inside the existing mono_local_cprop
function, and a separate memory pool is used to hold the temporary
data.
A struct, MonoTreeMover, contains the pointers to the pool, the ACT,
the list of scheduled moves and auxiliary things.
This struct is allocated if the tree move pass is requested, and is
then passed along to all the involved functions, which are therefore
aware of the tree mover state.
The ACT is an array of slots, obviously one per local.
Each slot is of type MonoTreeMoverActSlot, and contains the used and
affected locals, a pointer to the pending tree move and the "waiting"
and "unsafe" flags.
The "affecting" lists a built from "dependency nodes", of type
MonoTreeMoverDependencyNode.
Each of the nodes contains the used and affected local, and is in
two lists: the locals used by a slot, and the locals affected by a
slot (obviously a different one).
So, each node means: "variable x is used in tree t, so a definition
of x affects tree t".
The "affecting" lists are doubly linked, to allow for O(1) deletion.
The "used" lists are simply linked, but when they are mantained there
is always a pointer to the last element to allow for O(1) list moving.
When a used list is dismissed (which happens often, any time a node is
killed), its nodes are unlinked from their respective affecting lists
and are then put in a "free" list in the MonoTreeMover to be reused.
Each tree move is represented by a struct (MonoTreeMoverTreeMove),
which contains:
- the definition and use points,
- the "affected" moves (recall the concept of "indirect tree move"),
- the "must be dead" slots (recall "tree forwarding"). and
- a few utility flags.
Each tree move stays in the relevant ACT slot until it is ready to be
scheduled for moving, at which point it is put in a list in the
MonoTreeMover.
The tree move structs are reused when they are killed, so there is
also a "free" list for them in the MonoTreeMover.
The tree mover code has been added to all the relevant functions that
participate in consprop and copyprop, particularly:
- mono_cprop_copy_values takes care of variable uses (transitions from
states [D] to [U] and [U] to [E] because of killing),
- mono_cprop_invalidate_values takes care of side effects (indirect
accesses, calls...),
- mono_local_cprop_bb sets up and cleans the traversals for each BB,
and for each MonoInst it takes care of variable definitions.
To each of them has been added a MonoTreeMover parameter, which is not
NULL if the tree mover is running.
After mono_local_cprop_bb has run for all BBs, the MonoTreeMover has
the list of all the pending moves, which must be walked to actually
perform the moves (when possible, because "unsafe" flags, "affected"
moves and "must be dead" slots can still have their effects, which
must be handled now because they are fully known only at the end of
the CFG traversal).
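Pieced together from the description above, the data structures might
look roughly like this (an inferred sketch, not the actual definitions):

typedef struct MonoTreeMoverDependencyNode {
        /* "variable x is used in tree t, so a definition of x affects tree t" */
        int used_local;
        int affected_local;
        /* doubly linked "affecting" list, for O(1) deletion */
        struct MonoTreeMoverDependencyNode *prev_affecting, *next_affecting;
        /* singly linked "used" list */
        struct MonoTreeMoverDependencyNode *next_used;
} MonoTreeMoverDependencyNode;

typedef struct MonoTreeMoverTreeMove {
        MonoInst *definition;       /* where the tree is defined      */
        MonoInst *use;              /* where it should be moved       */
        GSList   *affected_moves;   /* mutually exclusive moves       */
        GSList   *must_be_dead;     /* slots from "tree forwarding"   */
        guint     unsafe : 1;
        struct MonoTreeMoverTreeMove *next;   /* "todo"/"free" lists  */
} MonoTreeMoverTreeMove;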