Just some thoughts for the JITer:

General issues:
===============

We are designing a JIT compiler, so we have to consider two things:

- the quality of the generated code
- the time needed to generate that code

The current approach is to keep the JITer as simple as possible, and thus as
fast as possible. The generated code quality will suffer from that.

Register Allocation:
====================

With lcc you can assign a fixed register to a tree before register
allocation. For example, this is needed for calls, which always return their
value in EAX on x86. The current implementation works without such a system,
due to the special forest generation.

Different Register Sets:
========================

Most processors have more than one register set, at least one for floating
point values, and one for integers. Should we support architectures with more
than two sets? Does anyone know of such an architecture?

64bit Integer Values:
=====================

I can imagine two different implementations. One possibility would be to treat
long (64bit) values simply like any other value type. This implies that we
call class methods for ALU operations like add or sub. Of course, this approach
will be somewhat inefficient.

The more performant solution is to allocate two 32bit registers for each 64bit
value. We add a new non-terminal to the monoburg grammar called "lreg". The
register allocation routines take care of this non-terminal and allocate two
32bit registers for it.

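The two-register approach can be sketched in C as follows. This only
illustrates what the code emitted for an "lreg" has to compute; it is not the
JITer's implementation, and the LReg type and the lreg_* helpers are
hypothetical names. The low words are combined first, then the high words
together with the carry (or borrow), which is what ADD followed by ADC (or SUB
followed by SBB) does on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an "lreg": one 64bit value in two 32bit registers. */
typedef struct {
    uint32_t lo;   /* low word  */
    uint32_t hi;   /* high word */
} LReg;

static LReg lreg_from_u64(uint64_t v)
{
    LReg r = { (uint32_t)v, (uint32_t)(v >> 32) };
    return r;
}

static uint64_t lreg_to_u64(LReg r)
{
    return ((uint64_t)r.hi << 32) | r.lo;
}

/* 64bit add from 32bit operations: low words first, then high words + carry. */
static LReg lreg_add(LReg a, LReg b)
{
    LReg r;
    uint32_t carry;

    r.lo = a.lo + b.lo;
    carry = r.lo < a.lo;            /* unsigned wrap-around signals a carry */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* 64bit sub from 32bit operations: low words first, then high words - borrow. */
static LReg lreg_sub(LReg a, LReg b)
{
    LReg r;
    uint32_t borrow = a.lo < b.lo;  /* low word subtraction will wrap */

    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - borrow;
    return r;
}

/* Convenience wrapper used to compare against native 64bit arithmetic. */
static uint64_t add64_via_lreg(uint64_t x, uint64_t y)
{
    uint64_t sum = lreg_to_u64(lreg_add(lreg_from_u64(x), lreg_from_u64(y)));
    assert(sum == x + y);
    return sum;
}
```

For example, add64_via_lreg(0xFFFFFFFF, 1) carries from the low word into the
high word and yields 0x100000000.
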
Forest generation:
==================

Consider the following code:

        OPCODE:         STACK   LOCALS
        LDLOC.0         (5)     [5,0]
        LDC.1           (5,1)   [5,0]
        STLOC.0         (5)     [1,0]
        STLOC.1         ()      [1,5]

A simple forest generation produces:

        STLOC.0(LDC.1)
        STLOC.1(LDLOC.0)

This is wrong, since STLOC.1 then stores the wrong value (1 instead of 5):
LDLOC.0 is evaluated only after STLOC.0 has overwritten local 0. Instead we
must generate something like:

        STLOC.TMP(LDLOC.0)
        STLOC.0(LDC.1)
        STLOC.1(LDLOC.TMP)

where STLOC.TMP saves the value into a new temporary variable.

We also need a similar solution for basic block boundaries when the stack depth
is not zero. We can simply save those values to new temporary variables.
Consider the following basic block with one instruction:

        LDLOC.1

This should generate a tree like:

        STLOC.TMP(LDLOC.1)

Please note that an intelligent register allocator can still allocate
registers for those new variables.

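Both rules above can be sketched with a small stack simulation in C. This is
a minimal illustration under simplifying assumptions, not the JITer's code:
trees are kept as text, only LDLOC, LDC and STLOC are handled, and all names
(ILInst, Slot, spill, build_forest) are hypothetical. Before a STLOC is
emitted, every pending load of the same local is moved to a temporary; at the
end of the block, whatever is still on the stack is moved to temporaries too.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MAX_STACK = 16, MAX_OUT = 512 };

typedef enum { IL_LDLOC, IL_LDC, IL_STLOC } ILOp;
typedef struct { ILOp op; int arg; } ILInst;

/* One evaluation stack slot. Trees are kept as text to keep the sketch short;
 * local is the local read by an LDLOC entry, or -1 for anything else. */
typedef struct { char expr[32]; int local; } Slot;

/* Appends the tree "STLOC.<dst>(<src>)" to the forest text in out. */
static void emit_store(char *out, const char *dst, const char *src)
{
    size_t len = strlen(out);
    int w = snprintf(out + len, MAX_OUT - len, "STLOC.%s(%s)\n", dst, src);
    assert(w > 0 && len + (size_t)w < MAX_OUT);
}

/* Moves the value in slot s into a fresh temporary and makes s read it back. */
static void spill(Slot *s, int *ntmp, char *out)
{
    char tmp[16];
    snprintf(tmp, sizeof tmp, "TMP%d", (*ntmp)++);
    emit_store(out, tmp, s->expr);
    snprintf(s->expr, sizeof s->expr, "LDLOC.%s", tmp);
    s->local = -1;
}

/* Builds the forest for one basic block; out must hold MAX_OUT bytes. */
static void build_forest(const ILInst *code, int n, char *out)
{
    Slot stack[MAX_STACK];
    int sp = 0, ntmp = 0;
    char dst[16];

    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        switch (code[i].op) {
        case IL_LDLOC:
        case IL_LDC:
            assert(sp < MAX_STACK);
            snprintf(stack[sp].expr, sizeof stack[sp].expr, "%s.%d",
                     code[i].op == IL_LDLOC ? "LDLOC" : "LDC", code[i].arg);
            stack[sp].local = code[i].op == IL_LDLOC ? code[i].arg : -1;
            sp++;
            break;
        case IL_STLOC:
            assert(sp > 0);
            sp--;                   /* stack[sp] is the value being stored */
            /* Pending loads of this local must still see the old value. */
            for (int j = 0; j < sp; j++)
                if (stack[j].local == code[i].arg)
                    spill(&stack[j], &ntmp, out);
            snprintf(dst, sizeof dst, "%d", code[i].arg);
            emit_store(out, dst, stack[sp].expr);
            break;
        }
    }
    /* Values still on the stack at the end of the block go to temporaries. */
    for (int j = 0; j < sp; j++)
        spill(&stack[j], &ntmp, out);
}
```

Feeding the four instructions from the example (LDLOC.0, LDC.1, STLOC.0,
STLOC.1) to build_forest yields the three trees shown above, with the
temporary numbered as TMP0.
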
DAG handling:
=============

Monoburg can't handle DAGs; instead we need real trees as input for
the code generator. So we have two problems:

1.) DUP instruction: This one is obvious - we need to store the value
into a temporary variable to solve the problem.

2.) function calls: Chapter 12.8, page 343 of "A retargetable C compiler"
explains that: "because listing a call node will give it a hidden reference
from the code list". I don't understand that (can someone explain that?), but
there is another reason to save return values to temporaries: Consider the
following code:

        x = f(y) + g(z); // all functions return integers

We could generate such a tree for this expression: STLOC(ADD(CALL,CALL))

The problem is that both calls return their value in the same register,
so it is non-trivial to generate code for that tree. We must copy one
register into another one, which makes register allocation more complex.
The easier solution is to store the results of function calls to
temporaries. This leads to the following forest:

        STLOC(CALL)
        STLOC(CALL)
        STLOC(ADD (LDLOC, LDLOC))

This is what lcc does, if I understood 12.8 (pages 342 and 343) correctly.

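The rewrite for calls can be sketched in C as a tree walk that hoists every
CALL node into its own STLOC tree. This is only an illustration under
simplifying assumptions: the forest is produced as text, the Tree, Ctx and
lower names are hypothetical, and a call whose result is stored directly
still goes through a temporary, which a real implementation would likely
avoid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MAX_TEXT = 256 };

typedef enum { T_STLOC, T_LDLOC, T_ADD, T_CALL } Kind;

typedef struct Tree {
    Kind kind;
    const char *name;           /* local or callee name; unused for T_ADD */
    struct Tree *left, *right;  /* operands, NULL when unused */
} Tree;

typedef struct {
    char forest[MAX_TEXT];      /* finished trees, one per line */
    int ntmp;                   /* next temporary number */
} Ctx;

static void append(char *dst, const char *src)
{
    assert(strlen(dst) + strlen(src) < MAX_TEXT);
    strcat(dst, src);
}

/* Writes the text of tree t into expr. Every CALL is emitted as a separate
 * STLOC.TMPn(...) tree and replaced by LDLOC.TMPn in the enclosing tree. */
static void lower(const Tree *t, Ctx *ctx, char *expr)
{
    char l[MAX_TEXT] = "", r[MAX_TEXT] = "", line[MAX_TEXT];

    /* Left before right, so calls keep their original order. */
    if (t->left)
        lower(t->left, ctx, l);
    if (t->right)
        lower(t->right, ctx, r);

    switch (t->kind) {
    case T_LDLOC:
        snprintf(expr, MAX_TEXT, "LDLOC.%s", t->name);
        break;
    case T_ADD:
        snprintf(expr, MAX_TEXT, "ADD(%s,%s)", l, r);
        break;
    case T_CALL:
        snprintf(line, MAX_TEXT, "STLOC.TMP%d(CALL.%s(%s))\n",
                 ctx->ntmp, t->name, l);
        append(ctx->forest, line);
        snprintf(expr, MAX_TEXT, "LDLOC.TMP%d", ctx->ntmp++);
        break;
    case T_STLOC:
        snprintf(line, MAX_TEXT, "STLOC.%s(%s)\n", t->name, l);
        append(ctx->forest, line);
        expr[0] = '\0';         /* a store produces no value */
        break;
    }
}
```

Lowering the tree for x = f(y) + g(z) gives three trees: one store per call
result, followed by the store of the ADD of the two temporaries, so no tree
contains more than one value that lives in the return register.
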
Possible Optimisations:
=======================

Miguel said ORP does some optimisation at the IL level, for example moving
array bounds checking out of loops:

        for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; }

is transformed to:

        if (in_range (a, 0, N)) { for (i = 0; i < N; i++) a [i] = X; }
        else for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; }

The "else" is only to keep original semantics (exception handling).

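The effect of the transformation can be tried out with the following C
sketch. It is an illustration only: the Array type, the checks_done counter
and the fill_* functions are hypothetical, and a failed check_range simply
stops the loop where a managed runtime would throw an exception.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a managed array with a known length. */
typedef struct { size_t length; int *data; } Array;

static int checks_done;  /* number of per-element checks executed */

/* Returns 1 if index i is valid for a. */
static int check_range(const Array *a, size_t i)
{
    checks_done++;
    return i < a->length;
}

/* Returns 1 if all indices lo .. n-1 are valid for a. */
static int in_range(const Array *a, size_t lo, size_t n)
{
    return lo <= n && n <= a->length;
}

/* Original loop: one check per iteration. Returns the elements written. */
static size_t fill_naive(Array *a, size_t n, int x)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (!check_range(a, i))
            break;              /* the exception would be raised here */
        a->data[i] = x;
    }
    return i;
}

/* Transformed loop: one check up front, checked loop kept as fallback. */
static size_t fill_hoisted(Array *a, size_t n, int x)
{
    size_t i;
    if (in_range(a, 0, n)) {
        for (i = 0; i < n; i++)
            a->data[i] = x;     /* no per-iteration check needed */
    } else {
        for (i = 0; i < n; i++) {
            if (!check_range(a, i))
                break;          /* same failure point as the original */
            a->data[i] = x;
        }
    }
    return i;
}
```

With a four-element array, fill_hoisted (&a, 4, X) performs no per-element
checks at all, while a request for six elements falls back to the checked
loop and stops at the same index as fill_naive.
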
We need loop detection logic in order to implement this (dominator tree).

AFAIK CACAO also implements this.

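As a rough idea of what the loop detection involves, here is a minimal C
sketch. It computes dominator sets with the simple iterative algorithm
(rather than building an explicit dominator tree) and reports every block
that is the target of a back edge, i.e. an edge whose target dominates its
source. The CFG layout and all names are assumptions for illustration: at
most 32 blocks, block 0 is the entry, and every block is reachable.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_BLOCKS = 32 };

/* Hypothetical control flow graph: bit s of succ[b] means an edge b -> s. */
typedef struct {
    int nblocks;
    uint32_t succ[MAX_BLOCKS];
} CFG;

/* dom[b] becomes the bit set of blocks that dominate block b. */
static void compute_dominators(const CFG *g, uint32_t dom[MAX_BLOCKS])
{
    uint32_t all;
    int changed = 1;

    assert(g->nblocks > 0 && g->nblocks <= MAX_BLOCKS);
    all = g->nblocks == MAX_BLOCKS ? UINT32_MAX : (1u << g->nblocks) - 1u;

    dom[0] = 1u;                        /* the entry dominates only itself */
    for (int b = 1; b < g->nblocks; b++)
        dom[b] = all;                   /* start from "everything" */

    while (changed) {
        changed = 0;
        for (int b = 1; b < g->nblocks; b++) {
            uint32_t meet = all;
            /* Intersect the dominator sets of all predecessors of b. */
            for (int p = 0; p < g->nblocks; p++)
                if (g->succ[p] & (1u << b))
                    meet &= dom[p];
            meet |= 1u << b;            /* every block dominates itself */
            if (meet != dom[b]) {
                dom[b] = meet;
                changed = 1;
            }
        }
    }
}

/* Returns the bit set of loop headers (targets of back edges). */
static uint32_t loop_headers(const CFG *g)
{
    uint32_t dom[MAX_BLOCKS], headers = 0;

    compute_dominators(g, dom);
    for (int b = 0; b < g->nblocks; b++)
        for (int s = 0; s < g->nblocks; s++)
            if ((g->succ[b] & (1u << s)) && (dom[b] & (1u << s)))
                headers |= 1u << s;     /* b -> s and s dominates b */
    return headers;
}
```

For a simple for loop with entry block 0, header 1, body 2 and exit 3 (edges
0->1, 1->2, 2->1, 1->3), loop_headers reports block 1 as the only loop
header.
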