linux-packaging-mono/docs/new-regalloc
Jo Shields a575963da9 Imported Upstream version 3.6.0
Former-commit-id: da6be194a6b1221998fc28233f2503bd61dd9d14
2014-08-13 10:39:27 +01:00


We need to switch to a new register allocator.
The current one is split into a global and a local register allocator.
The global one can assign only callee-saved registers and operates
on the tree-based internal representation: it assigns local variables
to hardware registers.
The local one operates on the linear representation on a per basic
block basis and assigns hard registers to virtual registers (which
hold temporary values during expression evaluation); it also deals
with the platform-specific issues (fixed registers, calling conventions).
Moving to a different register allocator will help solve some of the performance
issues introduced by the above split, make the register allocator more easily
portable and solve some of the issues generated by dealing with trees.
The general design ideas are below.
The new allocator should have a global view of the whole method, so that it can
also assign variables to some of the volatile registers where possible,
even across basic blocks (this would improve performance).
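
One way to read "if possible" here: a variable whose live range does not span a call can stay in a volatile (caller-saved) register without any save/restore code. The sketch below only illustrates that test. The function name, the interval format and the call-site list are assumptions made for this example, not part of the current JIT.

```python
def register_kind(interval, call_sites):
    """Pick the kind of hardware register a variable could live in.

    interval   -- (first_def, last_use) as linear instruction indexes
    call_sites -- linear indexes of the call instructions in the method
    """
    start, end = interval
    # A value defined by the call (start == c) or last consumed by it
    # (end == c) does not have to survive the call itself.
    live_across_call = any(start < c < end for c in call_sites)
    return "callee-saved" if live_across_call else "volatile"


calls = [5]
print(register_kind((1, 4), calls))   # range ends before the call
print(register_kind((2, 9), calls))   # range spans the call
```

A real allocator would also weigh register pressure and spill costs; this only shows the basic eligibility check.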
The allocator would be driven by per-arch declarative data, so porting
should be easier: an architecture needs to specify register classes,
calling convention and instruction requirements (similar to the gcc code).
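
As a rough idea of what such per-arch declarative data might contain, here is a minimal sketch in Python. The `RegClass` and `ArchDesc` types, their fields and the keys of the `fixed` table are all hypothetical names invented for this example. The x86-32 values are only meant to show the shape of the data, not a complete or authoritative description.

```python
from dataclasses import dataclass, field


@dataclass
class RegClass:
    name: str
    regs: tuple            # every allocatable register in the class
    callee_saved: tuple    # subset preserved across calls

    @property
    def volatile(self):
        # Caller-saved registers are whatever is not callee-saved.
        return tuple(r for r in self.regs if r not in self.callee_saved)


@dataclass
class ArchDesc:
    classes: dict                               # class name -> RegClass
    return_reg: dict                            # class name -> return register
    fixed: dict = field(default_factory=dict)   # opcode -> {operand: register}


# Illustrative x86-32 description; ebp/esp are left out as frame registers.
X86 = ArchDesc(
    classes={
        "int": RegClass("int",
                        regs=("eax", "ecx", "edx", "ebx", "esi", "edi"),
                        callee_saved=("ebx", "esi", "edi")),
    },
    return_reg={"int": "eax"},
    fixed={
        "shl": {"count": "ecx"},                       # variable shift count in cl
        "idiv": {"dividend": "eax", "clobber": "edx"},
    },
)

print(X86.classes["int"].volatile)
```

With data of this kind the allocator itself stays arch-independent: a port only has to fill in a new description.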
The allocator should operate on the linear representation: this way it's
easier and faster to track usages correctly. We need to assign virtual
registers on a per-method basis instead of per basic block. We can assign
virtual registers to variables, too. Note that since we fix the stack offset
of local vars only after this step (which happens after the burg rules are run),
some of the burg rules that try to optimize the code won't apply anymore:
the peephole code may need to be enhanced to do the optimizations instead.
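
To show why the linear representation makes tracking usages simpler, the sketch below computes a live interval per virtual register from a single pass over a linear instruction list. The instruction format (a `(dest, sources)` pair) is an assumption made for the example, and the code only handles straight-line code: branches and loops would need the intervals to be extended using per-block liveness information.

```python
def live_intervals(code):
    """Map each vreg to (first_index, last_index) in the linear code.

    code -- list of (dest, sources) pairs, one per instruction;
            dest is a vreg number or None, sources is a list of vregs.
    """
    intervals = {}
    for idx, (dest, srcs) in enumerate(code):
        touched = ([dest] if dest is not None else []) + list(srcs)
        for v in touched:
            start, _ = intervals.get(v, (idx, idx))
            intervals[v] = (start, idx)
    return intervals


code = [
    (1, []),        # v1 = const
    (2, []),        # v2 = const
    (3, [1, 2]),    # v3 = v1 + v2
    (None, [3]),    # return v3
]
print(live_intervals(code))
```

Each interval is a pair of linear indexes, so no basic-block/tree number combination is needed to compare two positions.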
We need to handle floating point registers in the global allocator, too.
The new allocator also needs to keep track precisely of which registers
contain references or managed pointers to allow us to move to a precise GC.
It may be worth using a single increasing set of integers for the virtual
registers, with the class of the register stored separately (unlike the
current local allocator, which keeps integer and fp registers separate).
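
A minimal sketch of that idea: one counter hands out virtual register numbers, and a side table records the class of each one. The same table is a natural place for the "holds a reference or managed pointer" bit that a precise GC would need. The `VRegTable` class and its method names are hypothetical, made up for this illustration.

```python
class VRegTable:
    """Single increasing vreg numbering with per-vreg side information."""

    def __init__(self):
        self.reg_class = []   # reg_class[v] is "int" or "fp"
        self.is_ref = []      # is_ref[v] is True if v holds a GC reference

    def new(self, reg_class, is_ref=False):
        # The new vreg number is simply the next free index.
        self.reg_class.append(reg_class)
        self.is_ref.append(is_ref)
        return len(self.reg_class) - 1

    def refs(self):
        # Vregs the GC would have to treat as roots.
        return [v for v, r in enumerate(self.is_ref) if r]


table = VRegTable()
obj = table.new("int", is_ref=True)   # an object reference
idx = table.new("int")                # a plain integer
val = table.new("fp")                 # a floating point temporary
print(obj, idx, val, table.reg_class[val], table.refs())
```

Since the numbers are shared between classes, an integer and an fp vreg can never be confused by having the same index.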
Since this is a large task, we need to do it in steps as much as possible.
The first is to run the register allocator _after_ the burg rules: this
requires a rewrite of the liveness code, too, to use linear indexes instead
of basic-block/tree number combinations. This can be done by:
*) allocating virtual regs to all the locals that can be register allocated
*) running the burg rules (some may require adjustments): the local virtual
registers are assigned starting from global-virt-regs+1, instead of the current
hardware-regs+1, so we can tell apart global and local virt regs.
*) running the liveness analysis (or whatever other code is needed) to allocate the global registers
*) allocating the rest of the local variables to stack slots
*) continuing with the current local allocator
This work could take 2-3 weeks.
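
The numbering scheme from the second step can be sketched as follows. It assumes, as the "hardware-regs+1" wording suggests, that hardware registers occupy the lowest numbers; the class and method names are hypothetical. Global virtual registers are numbered first, and everything handed out afterwards is a local virtual register, so a single comparison tells the kinds apart.

```python
class VRegNumbering:
    def __init__(self, num_hard_regs):
        self.num_hard_regs = num_hard_regs
        self.next = num_hard_regs     # first number after the hard regs
        self.first_local = None       # set once all globals are numbered

    def new_global(self):
        assert self.first_local is None, "globals must be numbered first"
        v = self.next
        self.next += 1
        return v

    def new_local(self):
        if self.first_local is None:
            # The first local vreg starts right after the last global one.
            self.first_local = self.next
        v = self.next
        self.next += 1
        return v

    def kind(self, reg):
        if reg < self.num_hard_regs:
            return "hard"
        if self.first_local is None or reg < self.first_local:
            return "global"
        return "local"


n = VRegNumbering(num_hard_regs=8)
g = n.new_global()    # a register-allocatable local variable
t = n.new_local()     # a temporary created by the burg rules
print(g, n.kind(g), t, n.kind(t), n.kind(3))
```

The current local allocator can then keep treating every number at or above `first_local` exactly as it treats local virtual registers today.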
The next step is to define the kind of declarative data an architecture needs,
to assign virtual regs to all the registers and to make the allocator
assign from the volatile registers, too.
Note that some of the code that is currently emitted in the arch-specific
code will need to be emitted as instructions that the reg allocator
can inspect. Think of a method that returns its first argument, which is
received in a register: the current code copies it to either a local slot or
to a global reg in the prolog and copies it back to the return register
in the basic block, but since neither the reg allocator nor the peephole code
knows about the prolog code, the first store cannot be optimized away.
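
The return-the-first-argument case can be sketched like this: once the prolog copy is an ordinary instruction in the linear code, a simple copy propagation pass followed by dead move removal gets rid of it. The instruction format `(op, dst, srcs...)` and the register names `arg0` and `ret` are assumptions made for the example; the passes only handle straight-line code and are not the existing peephole code.

```python
def propagate_copies(code):
    """Replace uses of a copied register with the original source."""
    copy_of = {}
    out = []
    for op, dst, *srcs in code:
        srcs = [copy_of.get(s, s) for s in srcs]
        # dst is redefined here: forget every copy that involves it.
        copy_of = {d: s for d, s in copy_of.items() if d != dst and s != dst}
        if op == "mov" and srcs[0] != dst:
            copy_of[dst] = srcs[0]
        out.append((op, dst, *srcs))
    return out


def remove_dead_moves(code, live_out):
    """Drop moves whose destination is never read afterwards."""
    live = set(live_out)
    kept = []
    for op, dst, *srcs in reversed(code):
        if op == "mov" and (dst not in live or srcs[0] == dst):
            continue
        live.discard(dst)
        live.update(srcs)
        kept.append((op, dst, *srcs))
    kept.reverse()
    return kept


method = [
    ("mov", "v1", "arg0"),   # prolog: copy the incoming argument to a vreg
    ("mov", "ret", "v1"),    # body: copy the vreg to the return register
]
print(remove_dead_moves(propagate_copies(method), live_out={"ret"}))
```

If the prolog copy were emitted directly as machine code instead, the first `mov` would be invisible to both passes and would stay in the method.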
The gcc code has some examples of how to specify register classes in a
declarative way.