a575963da9
Former-commit-id: da6be194a6b1221998fc28233f2503bd61dd9d14
114 lines
5.2 KiB
Plaintext
114 lines
5.2 KiB
Plaintext
|
|
* How to handle complex IL opcodes in an arch-independent way
|
|
|
|
Many IL opcodes are very simple: add, ldind etc.
|
|
Such opcodes can be implemented with a single cpu instruction
|
|
in most architectures (on some, a group of IL instructions
|
|
can be converted to a single cpu op).
|
|
There are many IL opcodes, though, that are more complex, but
|
|
can be expressed as a series of trees or a single tree of
|
|
simple operations. Such simple operations are architecture-independent.
|
|
It makes sense to decompose such complex IL instructions in their
|
|
simpler equivalent so that we gain in several ways:
|
|
*) porting effort is easier, because only the simple instructions
|
|
need to be implemented in arch-specific code
|
|
*) we could apply BURG rules to the trees and do pattern matching
|
|
on them to optimize the expressions according to the host cpu
|
|
|
|
The issue is: where do we do such conversion from coarse opcodes to
|
|
simple expressions?
|
|
|
|
* Doing the conversion in method_to_ir ()
|
|
|
|
Some of these conversions can certainly be done in method_to_ir (),
|
|
but it's not always easy to decide which are better done there and
|
|
which in a different pass.
|
|
For example, let's take ldlen: in the mono implementation, ldlen
|
|
can be simply implemented with a load from a fixed position in the
|
|
array object:
|
|
|
|
len = [reg + maxlen_offset]
|
|
|
|
However, ldlen carries also semantics information: the result is the
|
|
length of the array, and since in the CLR arrays are of fixed size,
|
|
this information can be useful to later do bounds check removal.
|
|
If we convert this opcode in method_to_ir () we lost some useful
|
|
information for further optimizations.
|
|
|
|
In some other ways, decomposing an opcode in method_to_ir() may
|
|
allow for better optimizations later on (need to come up with an
|
|
example here ...).
|
|
|
|
* Doing the conversion in inssel.brg
|
|
|
|
Some conversion may be done inside the burg rules: this has the
|
|
disadvantage that the instruction selector is not run again on
|
|
the resulting expression tree and we could miss some optimization
|
|
(this is what effectively happens with the coarse opcodes in the old
|
|
jit). This may also interfere with an efficient local register allocator.
|
|
It may be possible to add an extension in monoburg that allows a rule
|
|
such as:
|
|
|
|
recheck: LDLEN (reg) {
|
|
create an expression tree representing LDLEN
|
|
and return it
|
|
}
|
|
|
|
When the monoburg label process gets back a recheck, it will run
|
|
the labeling again on the resulting expression tree.
|
|
If this is possible at all (and in an efficient way) is a
|
|
question for dietmar:-)
|
|
It should be noted, though, that this may not always work, since
|
|
some complex IL opcodes may require a series of expression trees
|
|
and handling such cases in monoburg could become quite hairy.
|
|
For example, think of opcode that need to do multiple actions on the
|
|
same object: this basically means a DUP...
|
|
On the other end, if a complex opcode needs a DUP, monoburg doesn't
|
|
actually need to create trees if it emits the instructions in
|
|
the correct sequence and maintains the right values in the registers
|
|
(usually the values that need a DUP are not changed...). How
|
|
this integrates with the current register allocator is not clear, since
|
|
that assigns registers based on the rule, but the instructions emitted
|
|
by the rules may be different (this already happens with the current JIT
|
|
where a MULT is replaced with lea etc...).
|
|
|
|
* Doing it in a separate pass.
|
|
|
|
Doing the conversion in a separate pass over the instructions
|
|
is another alternative. This can be done right after method_to_ir ()
|
|
or after the SSA pass (since the IR after the SSA pass should look
|
|
almost like the IR we get back from method_to_ir ()).
|
|
|
|
This has the following advantages:
|
|
*) monoburg will handle only the simple opcodes (makes porting easier)
|
|
*) the instruction selection will be run on all the additional trees
|
|
*) it's easier to support coarse opcodes that produce multiple expression
|
|
trees (and apply the monoburg selector on all of them)
|
|
*) the SSA optimizer will see the original opcodes and will be able to use
|
|
the semantic info associated with them
|
|
|
|
The disadvantage is that this is a separate pass on the code and
|
|
it takes time (how much has not been measured yet, though).
|
|
|
|
With this approach, we may also be able to have C implementations
|
|
of some of the opcodes: this pass would insert a function call to
|
|
the C implementation (for example in the cases when first porting
|
|
to a new arch and implemenating some stuff may be too hard in asm).
|
|
|
|
* Extended basic blocks
|
|
|
|
IL code needs a lot of checks, bounds checks, overflow checks,
|
|
type checks and so on. This potentially increases by a lot
|
|
the number of basic blocks in a control flow graph. However,
|
|
all such blocks end up with a throw opcode that gives control to the
|
|
exception handling mechanism.
|
|
After method_to_ir () a MonoBasicBlock can be considered a sort
|
|
of extended basic block where the additional exits don't point
|
|
to basic blocks in the same procedure (at least when the method
|
|
doesn't have exception tables).
|
|
We need to make sure the passes following method_to_ir () can cope
|
|
with such kinds of extended basic blocks (especially the passes
|
|
that we need to apply to all the methods: as a start, we could
|
|
skip SSA optimizations for methods with exception clauses...)
|
|
|