IMT-based interface invocation support

The Mono JIT can use an IMT-style (interface method table) invocation system
to call interface methods.
This considerably reduces the runtime memory usage when many interface types
are loaded, because the old system required an array in MonoVTable indexed
by the interface id, which grows linearly as more interfaces are loaded.
In some cases there are also speedups, since an interface call can reduce to
a virtual call automatically.

IMT instead uses a fixed-size table and hashes each method in the implemented
interfaces to a slot in that table. To resolve collisions, each call site
stores the interface MonoMethod to be called in a well-known register, and the
IMT slot will contain a snippet of code that uses it to jump to the proper
vtable slot. The interface invocation sequence becomes (in pseudo-code):

	mov magic_reg, interface_monomethod
	call vtable [imt_slot]
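
As a rough illustration of how methods end up sharing a slot, the C sketch
below hashes a few methods into a fixed-size table and counts how many land in
a given slot. The table size of 19, the modulo hash and every name in it
(ExampleMethod, example_imt_slot, example_count_in_slot) are assumptions made
for the example, not the runtime's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_IMT_SIZE 19	/* assumed fixed table size */

/* Hypothetical stand-in for a MonoMethod: only what the hash needs. */
typedef struct {
	uint32_t hash_id;	/* assumed precomputed per-method hash */
	const char *name;
} ExampleMethod;

/* Map an interface method to one of the fixed number of IMT slots. */
static unsigned
example_imt_slot (const ExampleMethod *method)
{
	assert (method != NULL);
	return method->hash_id % EXAMPLE_IMT_SIZE;
}

/* Count how many of the given methods land in the given slot. */
static int
example_count_in_slot (const ExampleMethod *methods, size_t count, unsigned slot)
{
	size_t i;
	int hits = 0;

	for (i = 0; i < count; ++i) {
		if (example_imt_slot (&methods [i]) == slot)
			++hits;
	}
	return hits;
}
```

A slot with a count of 1 can hold a plain trampoline, while a count above 1
is the colliding case that needs the method identifier in magic_reg.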

The IMT table is stored at negative offsets from the vtable, like the old
interface array used to be.

A small note on the choice of magic_reg for different JIT backends: the IMT
method identifier doesn't necessarily need to be stored in a register, though
doing so is fast and the JIT code already has the infrastructure to handle this
case in an arch-independent way. A JIT porter just needs to #define
MONO_ARCH_IMT_REG to the chosen register. Note that this register should be
part of the MONO_ARCH_CALLEE_REGS set, as it will be handled by the local
register allocator (see mini/inssel.brg), and it must not be one of the
registers used for argument passing, as you'd overwrite an argument in that
case. Also note that the method-specific trampoline code should make sure to
preserve this register (but it should already do so if it's in
MONO_ARCH_CALLEE_REGS, as it could have been used for a vtable indirect call).
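
The two constraints on the register can be written down as a small check. The
sketch below is for an imaginary architecture: the register numbering, both
bitmasks and the helper example_imt_reg_is_valid are hypothetical, and it
assumes the register sets are expressed as bitmasks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register numbering for an imaginary architecture. */
enum { EX_R0, EX_R1, EX_R2, EX_R3, EX_R4, EX_R5, EX_R6, EX_R7 };

/* Assumed bitmasks, in the spirit of MONO_ARCH_CALLEE_REGS. */
#define EX_CALLEE_REGS   ((1u << EX_R0) | (1u << EX_R1) | (1u << EX_R2) | (1u << EX_R3))
#define EX_ARGUMENT_REGS ((1u << EX_R0) | (1u << EX_R1))

/* The register a port of this imaginary architecture would pick. */
#define EX_IMT_REG EX_R2

/* Non-zero if reg is allocatable by the local register allocator
 * and does not carry a call argument. */
static int
example_imt_reg_is_valid (int reg, uint32_t callee_regs, uint32_t argument_regs)
{
	uint32_t bit;

	assert (reg >= 0 && reg < 32);
	bit = (uint32_t)1 << reg;
	return (callee_regs & bit) != 0 && (argument_regs & bit) == 0;
}
```

With these made-up masks EX_R2 passes, EX_R0 fails because it carries an
argument, and EX_R5 fails because it is outside the callee set.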

Note that in the case of a non-colliding IMT slot, the interface call
instruction sequence becomes equivalent to a virtual call, as the IMT slot
will contain the direct trampoline for the method and the magic trampoline will
set the slot to the method's native code address once it is compiled.

In case of collisions in the IMT slot, the JIT performs a linear search if
the colliding methods are few or a binary search otherwise.
To make this easier for each JIT port, a sort of internal representation
of the code is created: this is an array of MonoIMTCheckItem structures
built in a way that allows easy generation of a bsearch when the list of
colliding methods becomes large.

Each item in the array represents either a direct check for a method to be invoked
or a bisection check in the bsearch algorithm.
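
	struct _MonoIMTCheckItem {
		MonoMethod *method;	/* interface method this item compares against */
		int check_target_idx;	/* index of the item to jump to, 0 if none */
		int vtable_slot;	/* vtable slot to jump to on a match */
		guint8 *jmp_code;	/* address of the conditional branch to patch */
		guint8 *code_target;	/* start of the code emitted for this item */
		guint8 is_equals;	/* non-zero: direct check, zero: bisect check */
		guint8 compare_done;	/* the compare against method was already done */
		guint8 chunk_size;	/* size of the code generated for this item */
		guint8 short_branch;	/* info for short/long branch selection */
	};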

For a direct check, the is_equals value is non-zero and the emitted code
should be equivalent to:
	if (magic_reg != item->method)
		jump_to_item (array [item->check_target_idx]);
	jump_to_vtable (item->vtable_slot);

Note that if item->check_target_idx is 0, the jump should be omitted, since
this is the end of a linear sequence. For debugging you might want to insert a
jump to a breakpoint instead: reaching it would mean that we have an error,
because the IMT slot was asked to execute an interface method that the type
doesn't implement. In the future we might want to handle this case not with a
breakpoint or assert, but by either throwing an InvalidCastException or by
going into the runtime and adding support for the interface automagically to
the type/vtable: this could be used both for transparent proxies and for the
implicit interfaces that vectors in 2.0 provide.

For a bisect check the code is even simpler:

	if (magic_reg >= item->method)
		jump_to_item (array [item->check_target_idx]);

In this case item->check_target_idx is always non-zero.
Note that in both cases item->method becomes an immediate constant in the
jitted code.
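
To make the control flow of the two kinds of check concrete, the sketch below
walks an array of items in plain C the way the emitted code would, and returns
the vtable slot that would be jumped to. ExampleCheckItem is a reduced,
hypothetical version of the structure in which the method is an integer key,
and example_resolve is an illustrative helper, not part of the runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced, hypothetical check item: only the fields that decide
 * where a call goes. */
typedef struct {
	uintptr_t method;
	int check_target_idx;
	int vtable_slot;
	uint8_t is_equals;
} ExampleCheckItem;

/* Follow the checks starting at item 0 and return the vtable slot. */
static int
example_resolve (const ExampleCheckItem *items, int count, uintptr_t magic_reg)
{
	int idx = 0;

	while (idx < count) {
		const ExampleCheckItem *item = &items [idx];

		if (item->is_equals) {
			/* Direct check; the compare is skipped when
			 * check_target_idx is 0 (end of a linear run). */
			if (item->check_target_idx != 0 && magic_reg != item->method) {
				idx = item->check_target_idx;
				continue;
			}
			return item->vtable_slot;
		}
		/* Bisect check: branch to the target or fall through. */
		assert (item->check_target_idx != 0);
		if (magic_reg >= item->method)
			idx = item->check_target_idx;
		else
			idx = idx + 1;
	}
	return -1;	/* only reached for a malformed array */
}
```

Because index 0 is always the entry point of the array, it can never be a
branch target, which is what lets 0 double as the "no target" marker.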

The other fields in the structure are there to provide the backend with
common storage for data needed during emission.
As each item's code is emitted, its start address is stored in the code_target
field. Likewise, when a conditional branch is inserted, its address
is stored in jmp_code: this way, with a single forward pass over the array at
the end of the emission phase, the branches can be patched to point to the
proper target item's code (this process patches the jump_to_item pseudo
instructions described above).
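
The emit-then-patch scheme can be sketched with a toy byte encoding. Everything
below is made up for illustration: the three opcodes, the one-byte method and
displacement, and the names ExamplePatchItem and example_emit_thunk. The branch
condition is not encoded, and compare_done is ignored, so every branching item
emits its own compare.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { EX_OP_CMP = 1, EX_OP_BRANCH = 2, EX_OP_JMP_SLOT = 3 };	/* made-up opcodes */

typedef struct {
	uint8_t method;		/* tiny stand-in for the MonoMethod pointer */
	int check_target_idx;
	int vtable_slot;
	uint8_t *jmp_code;	/* address of the branch displacement to patch */
	uint8_t *code_target;	/* start of this item's code */
	uint8_t is_equals;
} ExamplePatchItem;

/* Emit all items into buf, then patch the branches.
 * Returns the number of bytes written. */
static size_t
example_emit_thunk (ExamplePatchItem *items, int count, uint8_t *buf, size_t buf_size)
{
	uint8_t *code = buf;
	int i;

	/* Worst case is 6 bytes per item in this toy encoding. */
	assert (buf_size >= (size_t)count * 6);

	for (i = 0; i < count; ++i) {
		ExamplePatchItem *item = &items [i];

		item->code_target = code;
		item->jmp_code = NULL;
		if (item->check_target_idx != 0) {
			*code++ = EX_OP_CMP;
			*code++ = item->method;
			*code++ = EX_OP_BRANCH;
			item->jmp_code = code;	/* displacement lives here */
			*code++ = 0;		/* placeholder, target unknown yet */
		}
		if (item->is_equals) {
			*code++ = EX_OP_JMP_SLOT;
			*code++ = (uint8_t)item->vtable_slot;
		}
	}

	/* Single forward pass: every code_target is known by now. */
	for (i = 0; i < count; ++i) {
		ExamplePatchItem *item = &items [i];

		if (item->jmp_code != NULL) {
			uint8_t *target = items [item->check_target_idx].code_target;

			/* Displacement relative to the byte after the branch. */
			*item->jmp_code = (uint8_t)(target - (item->jmp_code + 1));
		}
	}
	return (size_t)(code - buf);
}
```

A real backend would also use the recorded sizes to choose between short and
long branch encodings; the toy encoding always uses a one-byte displacement.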

chunk_size can be used to store the size of the code generated for the item: this
can be used to optimize the short/long branch instructions, together with
info stored in short_branch. It is also used to calculate the size of the
code to allocate for the whole IMT thunk.

The compare_done field can be used to avoid doing an additional compare
in an is_equals item for the same MonoMethod that was just compared in a
bisecting item. Suppose we have 4 methods colliding in a slot, A, B, C and D.
The arch-independent code already took care of sorting them, so that:
	A < B < C < D

The generated code will look like (M is the method to call):

	compare (M, C)
	goto upper_sequence if greater_or_equal
	/* linear sequence */
	compare (M, A)
	goto B_found if not_equals
	jump to A's slot
B_found:
	jump to B's slot

upper_sequence:
	/* we just did a compare against C, no need to compare again */
	goto D_found if not_equals
	jump to C's slot
D_found:
	jump to D's slot

This optimization is of course only valid for architectures with a flags register.
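
One way the array for the A, B, C, D case could be produced from the sorted
methods is sketched below. It is only an illustration: the cut-off of 4 between
linear and binary search, the point at which compare_done is set, and the names
ExampleBuildItem and example_build are all assumptions. The caller is assumed
to provide room for twice as many items as there are methods.

```c
#include <assert.h>
#include <stdint.h>

#define EX_LINEAR_LIMIT 4	/* assumed: fewer methods than this are searched linearly */

typedef struct {
	uintptr_t method;
	int check_target_idx;
	uint8_t is_equals;
	uint8_t compare_done;
} ExampleBuildItem;

/* Append the items for sorted [start, end) to out.
 * Returns the index of the first item appended. */
static int
example_build (const uintptr_t *sorted, int start, int end,
	       ExampleBuildItem *out, int *out_len)
{
	int count = end - start;
	int chunk_start = *out_len;
	int i;

	assert (count > 0);
	if (count < EX_LINEAR_LIMIT) {
		for (i = start; i < end; ++i) {
			ExampleBuildItem *item = &out [(*out_len)++];

			item->method = sorted [i];
			item->is_equals = 1;
			item->compare_done = 0;
			/* The last item of a linear run has nothing to jump to. */
			item->check_target_idx = (i < end - 1) ? *out_len : 0;
		}
	} else {
		int middle = start + count / 2;
		ExampleBuildItem *item = &out [(*out_len)++];

		item->method = sorted [middle];
		item->is_equals = 0;
		item->compare_done = 0;
		/* Lower half follows directly, upper half is the branch target. */
		example_build (sorted, start, middle, out, out_len);
		item->check_target_idx = example_build (sorted, middle, end, out, out_len);
		/* A direct check at the target tests the method that was
		 * just compared, so its compare can be skipped. */
		if (out [item->check_target_idx].is_equals)
			out [item->check_target_idx].compare_done = 1;
	}
	return chunk_start;
}
```

For four sorted methods this yields five items: a bisect check on C, direct
checks for A and B, then direct checks for C and D, with compare_done set only
on the item for C.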

As a further optimization to reduce memory usage, the Mono runtime initially
sets the IMT slots to a single-instance magic trampoline, so that initially no
memory is used up by the thunks, even for colliding slots. When an interface
method is called, the magic trampoline fills in the IMT slot with the proper
thunk or trampoline, so later calls will use the fast path.
This single-instance trampoline uses MONO_FAKE_IMT_METHOD as the method
it's asking to be compiled and executed: the trampoline code recognizes
this special value and retrieves the interface method to call from the usual
MONO_ARCH_IMT_REG saved by the trampoline code.
Given that only the IMT slots that are actually used will be initialized, this saves
quite a bit of memory, as it's unlikely that all the interface methods are called on
all the different types.
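
The lazy initialization can be modelled with function pointers. In the sketch
below every slot starts out pointing at one shared function, which replaces the
slot on first use. The names, the table size of 19 and the explicit slot and
method_id arguments are assumptions made for the example; the real trampoline
works on machine code and registers rather than C arguments.

```c
#include <assert.h>
#include <stddef.h>

#define EX_IMT_SIZE 19	/* assumed table size */

struct ExampleVTable;
typedef int (*ExampleSlotCode) (struct ExampleVTable *vt, int slot, int method_id);

typedef struct ExampleVTable {
	ExampleSlotCode slots [EX_IMT_SIZE];
	int thunks_built;	/* how many slots were actually initialized */
} ExampleVTable;

/* Stand-in for a per-slot thunk: it just encodes slot and method. */
static int
example_thunk (ExampleVTable *vt, int slot, int method_id)
{
	(void)vt;
	return slot * 1000 + method_id;
}

/* The single shared trampoline: fill in the slot, then forward the call. */
static int
example_magic_trampoline (ExampleVTable *vt, int slot, int method_id)
{
	assert (vt != NULL && slot >= 0 && slot < EX_IMT_SIZE);
	vt->slots [slot] = example_thunk;
	vt->thunks_built++;
	return vt->slots [slot] (vt, slot, method_id);
}

/* Point every slot at the shared trampoline: no per-slot code yet. */
static void
example_vtable_init (ExampleVTable *vt)
{
	int i;

	for (i = 0; i < EX_IMT_SIZE; ++i)
		vt->slots [i] = example_magic_trampoline;
	vt->thunks_built = 0;
}

static int
example_interface_call (ExampleVTable *vt, int slot, int method_id)
{
	return vt->slots [slot] (vt, slot, method_id);
}
```

Only the first call through a slot pays for building its thunk; slots that are
never called keep pointing at the shared trampoline and cost nothing extra.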