Purpose

Especially when inlining is active, it can happen that temporary
variables add pressure to the register allocator, producing bad
code.

The idea is that some of these temporaries can be totally eliminated
by moving the MonoInst tree that defines them directly to the use
point in the code (hence the name "tree mover").

Please note that this is *not* an optimization: it is mostly a
workaround to issues we have in the regalloc.
Actually, with the new linear IR this will not be possible at all
(there will be no more trees in the code!).
Anyway, this workaround turns out to be useful in the current state
of things...

-----------------------------------------------------------------------

Base logic

If a local is defined by a value which is a proper expression (a tree
of MonoInst, not just another local or a constant), and this definition
is used only once, the tree can be moved directly to the use location,
and the definition eliminated.
Of course, none of the variables used in the tree may be redefined on
the code path between the definition and the use, and the tree must be
free of side effects.
We do not handle the cases when the tree is just a local or a constant
because they are handled by copyprop and consprop, respectively.

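The conditions above can be checked on a single straight-line block with
a few lines of Python. This is only a sketch under simplifying
assumptions: the Inst record and the single_use_target function are
hypothetical names, not part of the JIT, and what happens after the end
of the block is ignored here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Inst:
    dest: str                       # local written by this instruction
    uses: Tuple[str, ...]           # locals read by the right-hand side
    is_tree: bool = True            # False for a plain copy or a constant
    has_side_effects: bool = False


def single_use_target(block: Sequence[Inst], d: int) -> Optional[int]:
    """Return the index of the only use of block[d].dest, or None."""
    definition = block[d]
    if not definition.is_tree or definition.has_side_effects:
        return None
    use_at = None
    for j in range(d + 1, len(block)):
        inst = block[j]
        n = inst.uses.count(definition.dest)
        if n:
            if use_at is not None or n > 1:
                return None         # the value is used more than once
            use_at = j
        if inst.dest == definition.dest:
            break                   # redefined: the old value is dead from here
        if use_at is None and inst.dest in definition.uses:
            return None             # an operand changed before the use
    return use_at
```

For "a = v1 + v2; t = v4; c = a + t" the function returns the index of
the third instruction; if "v1" is redefined in between, it returns None.
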
To make things simpler, we restrict the tree move to the case when:
- the definition and the use are in the same BB, and
- the use is followed by another definition in the same BB (it is not
  possible that the 1st value is used again), or alternatively there
  is no BB in the whole CFG that contains a use of this local before a
  definition (so, again, there is no code path that can lead to a
  subsequent use).

To handle this, we maintain an ACT array (Available Copy Tree, similar
to the ACP), where we store the "state" of every local.
Ideally, every local can be in one of the following states:
[E] Undefined (by a tree, it could be in the ACP but we don't care).
[D] Defined (by a tree), and waiting for a use.
[U] Used, with a tree definition available in the same BB, but still
    without a definition following the use (always in the same BB).
Of course state [E] (empty) is the initial one.

Besides, there are two sorts of "meta states", or flags:
[W] Still waiting for a use or definition in this BB (we have seen no
    occurrence of the local yet).
[X] Used without being previously defined in the same BB (note that if
    there is a definition that precedes the use in the same BB, even if
    the definition is not a tree or is not available because of side
    effects or because the tree value has changed, the local is not in
    state [X]).
Also note that state [X] is a sort of "global" condition, which if set
in one BB will stay valid for the whole CFG, even if the local will
otherwise change state. The idea of flagging a local as [X] is that if
there is a definition/use pair that reaches the end of a BB, it could
be that there is a CFG path that then leads to the BB flagging it as
[X] (which contains a use), so the tree cannot be moved.
So state [X] is only ever set, and never examined, in all the state
transitions we will describe.
In practice, we use flag [W] to set state [X]: if, when traversing a
BB, we find a use for a local in state [W], then that local is flagged
[X].


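The way [W] feeds [X] can be sketched in isolation. The function name
and the (dest, uses) pair encoding below are assumptions made for this
example only; each basic block is a list of such pairs in program
order, with dest set to None for instructions that define no local.

```python
def used_before_defined(cfg):
    """Return the set of locals that end up flagged [X]."""
    x_flagged = set()                   # [X] is global to the whole CFG
    for block in cfg:
        seen = set()                    # locals whose [W] flag is already cleared
        for dest, uses in block:
            for local in uses:          # reads happen before the write
                if local not in seen:
                    x_flagged.add(local)    # a use while still [W] sets [X]
                    seen.add(local)
            if dest is not None:
                seen.add(dest)          # a definition clears [W]
    return x_flagged
```

Note that a local flagged in one BB stays flagged even if every other
BB defines it before using it.
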
For each BB, we initialize all states to [E] and [W], and then we
traverse the code one inst at a time, and update the variable states
in the ACT in the following ways:

[Definition]
  - Flag [W] is cleared.
  - All "affected trees" are killed (go from state [D] to [E]).
    The "affected trees" are the trees which contain (use) the defined
    local, and the rationale is that the tree value changed, so the
    tree is no longer available.
  - If the local was in state [U], *that* tree move is marked "safe"
    (because *this* definition makes us sure that the previous tree
    cannot be used again in any way).
    The idea is that "safe" moves can happen even if the local is
    flagged [X], because the second definition "covers" the use.
    The tree move is then saved in the "todo" list (and the affecting
    nodes are cleared).
  - If the local was defined by a tree, it goes to state [D], the tree
    is recorded, and all the locals used in it are marked as "affecting
    this tree" (of course these markers are lists, because each local
    could affect more than one tree).

[IndirectDefinition]
  - All potentially affected trees (in state [D]) are killed.

[Use]
  - If the local is still [W], it is flagged [X] (the [W] goes away).
  - If the local is in state [D], it goes to state [U].
    The tree move must not yet be recorded in the "todo" list: it still
    stays in the ACT slot belonging to this local.
    Anyway, the "affecting" nodes are updated, because now a definition
    of a local used in this tree will affect only "indirect" (or also
    "propagated") moves, but not *this* move (see below).
  - If the local is in state [U], then the tree cannot be moved (it is
    used two times): the move is canceled, and the state goes to [E].
  - If the local is in state [E], the use is ignored.

[IndirectUse]
  - All potentially affected trees (in state [D] or [U]) are killed.

[SideEffect]
  - Tree is marked as "unmovable".

Then, at the end of the BB, for each ACT slot:
  - If state is [U], the tree move is recorded in the "todo" list, but
    flagged "unsafe".
  - Anyway, state goes to [E], the [W] flag is set, and all "affecting"
    lists are cleared (we get ready to traverse the next BB).
Finally, when all BBs have been scanned, we traverse the "todo" list,
moving all "safe" entries, and moving "unsafe" ones only if their ACT
slot is not flagged [X].

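The base transitions can be simulated in a few lines of Python. This is
a sketch of the base logic only, under stated assumptions: indirect
definitions and uses, side effects, indirect moves and tree forwarding
are all left out, the "affecting" entries of a tree are simply dropped
when it goes to [U], and plan_tree_moves with its (dest, uses, is_tree)
encoding is a hypothetical name and layout chosen for the example.

```python
from collections import defaultdict


def plan_tree_moves(cfg):
    """cfg: list of BBs, each a list of (dest, uses, is_tree) tuples.
    Returns the (bb, def_index, use_index) moves that may be performed."""
    x_flag = set()      # locals flagged [X]; global to the whole CFG
    todo = []           # entries: (local, bb, def_index, use_index, safe)

    for bb, block in enumerate(cfg):
        state = defaultdict(lambda: "E")       # every slot starts as [E] ...
        waiting = defaultdict(lambda: True)    # ... with flag [W] set
        pending = {}                           # local -> [def_index, use_index]
        affecting = defaultdict(set)           # operand -> locals whose tree reads it

        def forget(local):
            pending.pop(local, None)
            for trees in affecting.values():
                trees.discard(local)

        for idx, (dest, uses, is_tree) in enumerate(block):
            for v in uses:                     # [Use]: reads come before the write
                if waiting[v]:
                    waiting[v] = False
                    x_flag.add(v)
                if state[v] == "D":
                    state[v] = "U"
                    pending[v][1] = idx
                    for trees in affecting.values():
                        trees.discard(v)       # operand changes no longer matter
                elif state[v] == "U":
                    state[v] = "E"             # second use: cancel the move
                    forget(v)
            if dest is None:
                continue
            waiting[dest] = False              # [Definition]
            for t in list(affecting[dest]):    # kill the trees that read dest
                if state[t] == "D":
                    state[t] = "E"
                    forget(t)
            if state[dest] == "U":             # the use is now covered: "safe"
                todo.append((dest, bb, *pending[dest], True))
            state[dest] = "E"
            forget(dest)
            if is_tree:
                state[dest] = "D"
                pending[dest] = [idx, None]
                for v in uses:
                    affecting[v].add(dest)

        for v, s in state.items():             # end of the BB
            if s == "U":
                todo.append((v, bb, *pending[v], False))   # "unsafe"

    return [(bb, d, u) for v, bb, d, u, safe in todo
            if safe or v not in x_flag]
```

With a single block "a = v1 + v2; c = a + 1" the move of "a" is
reported; adding a second block that reads "a" before defining it
flags "a" as [X] and the same (unsafe) move is dropped.
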
So far, so good.
But there are two issues that make things harder :-(

The first is the concept of "indirect tree move".
It can happen that a tree is scheduled for moving, and its destination
is a use that is located in a second tree, which could also be moved.
The main issue is that a definition of a variable of the 1st tree on
the path between the definition and the use of the 2nd one must prevent
the move.
But which move? The 1st or the 2nd?
Well, either of the two!
The point is, the 2nd move must be prevented *only* if the 1st one
happens: if it is aborted (for an [X] flag or any other reason), the
2nd move is OK, and vice versa...
We must handle this in the following way:
- The ACT must still remember if a slot is scheduled for moving in
  this BB, and if it is, all the locals used in the tree.
  We say that the slot is in state [M].
  Note that [M] is (like [X] and [W]) a sort of "meta state": a local
  is flagged [M] when it goes to state [U], and the flag is cleared
  when the tree move is cancelled.
- A tree that uses a local whose slot is in state [M] is also using all
  the locals used by the tree in state [M], but the use is "indirect".
  These use nodes are also included in the "affecting" lists.
- The definition of a variable used in an "indirect" way has the
  effect of "linking" the two involved tree moves, saying that only one
  of the two can happen in practice, but not both.
- When the 2nd tree is scheduled for moving, the 1st one is *still* in
  state [M], because a third move could "carry it forward", and all the
  *three* moves should be mutually exclusive (to be safe!).

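One possible way to honour the "linked" moves when the scheduled list
is finally walked is sketched below. The text does not say which of
two linked moves should win, so taking them in list order is purely an
assumption of this example, as are the function name and the use of
plain strings to identify moves.

```python
def select_moves(todo, exclusive_pairs, aborted=()):
    """Pick moves from todo in order, never taking two linked moves."""
    linked = {}
    for a, b in exclusive_pairs:
        linked.setdefault(a, set()).add(b)
        linked.setdefault(b, set()).add(a)
    done = []
    for move in todo:
        if move in aborted:
            continue                    # e.g. unsafe and flagged [X]
        if linked.get(move, set()).isdisjoint(done):
            done.append(move)           # no linked move has been performed
    return done
```

If the 1st move is aborted the 2nd one goes through, and the other way
round; with three mutually linked moves only one of them is performed.
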
The second tricky complication is the "tree forwarding" that can happen
when copyprop is involved.
It is conceptually similar to the "indirect tree move".
Only, the 2nd tree is not really a tree, it is just the local defined
in the 1st tree move.
It can happen that copyprop will propagate the definition.
We cannot make treeprop do the same job of copyprop, because copyprop
has fewer constraints, and is therefore more powerful in its scope.
The main issue is that treeprop cannot propagate a tree to *two* uses,
while copyprop is perfectly capable of propagating one definition to
two (or more) different places.
So we must let copyprop do its job otherwise we'll miss optimizations,
but we must also make it play safe with treeprop.
Let's clarify with an example:
  a = v1 + v2; //a is defined by a tree, state [D], uses v1 and v2
  b = a;       //a is used, state [U] with move scheduled, and
               //b is defined by a, ACP[b] is a, and b is in state [DC]
  c = b + v3;  // b is used, goes to state [U]
The real trouble is that copyprop happens *immediately*, while treeprop
is deferred to the end of the CFG traversal.
So, in the 3rd statement, the "b" is immediately turned into an "a" by
copyprop, regardless of what treeprop will do.
Anyway, if we are careful, this is not so bad.
First of all, we must "accept" the fact that in the 3rd statement the
"b" is in fact an "a", as treeprop must happen *after* copyprop.
The real problem is that "a" is used twice: in the 2nd and 3rd lines.
In our usual setup, the 2nd line would set it to [U], and the 3rd line
would kill the move (and set "a" to [E]).
I have tried to play tricks, and reason as if copyprop didn't happen,
but everything becomes really messy.
Instead, we should note that the 2nd line is very likely to be dead.
At least in this BB, copyprop will turn all "b"s into "a"s as long as
it can, and when it cannot, it will be because either "a" or "b" has
been redefined, which would be after the tree move anyway.
So, the reasoning gets different: let's pretend that "b" will be dead.
This will make the "a" use in the 2nd statement useless, so there we
can "reset" "a" to [D], but also take note that if "b" ends up
not being dead, the tree move associated to this [D] must be aborted.
We can detect this in the following way:
- Either "b" is used before being defined in this BB, or
- It will be flagged "unsafe".
Both things are very easy to check.
The only quirk is that the "affecting" lists must not be cleared when
a slot goes to state [U], because a "propagation" could put it back
to state [D] (where those lists are needed, because it can be killed
by a definition to a used slot).

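The effect of immediate copyprop on the three statements of the
example can be reproduced with a toy pass. This is only an
illustration: local_copyprop and count_uses are hypothetical helpers,
statements are (dest, operands) pairs, and any statement with exactly
one operand is assumed to be a plain copy.

```python
def local_copyprop(block):
    """Rewrite operands through the available copies (ACP) of one BB."""
    acp = {}                            # copy target -> copy source
    out = []
    for dest, operands in block:
        operands = tuple(acp.get(v, v) for v in operands)
        # a definition of dest invalidates every copy from or to dest
        acp = {k: s for k, s in acp.items() if k != dest and s != dest}
        if len(operands) == 1 and operands[0] != dest:
            acp[dest] = operands[0]     # plain copy "dest = src"
        out.append((dest, operands))
    return out


def count_uses(block, local):
    return sum(operands.count(local) for _, operands in block)


example = [("a", ("v1", "v2")),         # a = v1 + v2
           ("b", ("a",)),               # b = a
           ("c", ("b", "v3"))]          # c = b + v3
after = local_copyprop(example)
```

After the pass the 3rd statement reads "a" instead of "b", so "a" has
two uses and "b" has none: once the dead copy "b = a" is discounted,
"a" is used exactly once and its tree can still be moved.
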
-----------------------------------------------------------------------

Implementation notes

All the implementation runs inside the existing mono_local_cprop
function, and a separate memory pool is used to hold the temporary
data.

A struct, MonoTreeMover, contains the pointers to the pool, the ACT,
the list of scheduled moves and auxiliary things.
This struct is allocated if the tree move pass is requested, and is
then passed along to all the involved functions, which are therefore
aware of the tree mover state.

The ACT is an array of slots, obviously one per local.
Each slot is of type MonoTreeMoverActSlot, and contains the used and
affected locals, a pointer to the pending tree move and the "waiting"
and "unsafe" flags.

The "affecting" lists are built from "dependency nodes", of type
MonoTreeMoverDependencyNode.
Each of the nodes contains the used and affected local, and is in
two lists: the locals used by a slot, and the locals affected by a
slot (obviously a different one).
So, each node means: "variable x is used in tree t, so a definition
of x affects tree t".
The "affecting" lists are doubly linked, to allow for O(1) deletion.
The "used" lists are singly linked, but when they are maintained there
is always a pointer to the last element to allow for O(1) list moving.
When a used list is dismissed (which happens often, any time a node is
killed), its nodes are unlinked from their respective affecting lists
and are then put in a "free" list in the MonoTreeMover to be reused.

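The two-list arrangement can be modelled as follows. This is a sketch,
not the actual C structs: the class, field and method names are made
up for the example, and Python references stand in for pointers.

```python
class Node:
    """Means: local `used` is read by the tree of slot `affected`."""
    def __init__(self):
        self.used = self.affected = None
        self.next_used = None                   # singly linked "used" list
        self.prev_aff = self.next_aff = None    # doubly linked "affecting" list


class Slot:
    def __init__(self, name):
        self.name = name
        self.used_head = self.used_tail = None  # locals read by this slot's tree
        self.affecting_head = None              # trees that read this local


class Mover:
    def __init__(self):
        self.free = None                        # free list, chained via next_used

    def add_use(self, tree, operand):
        node = self.free
        if node is None:
            node = Node()
        else:
            self.free = node.next_used          # reuse a dismissed node
        node.used, node.affected = operand, tree
        node.next_used = None                   # append to the tree's "used" list
        if tree.used_tail is None:
            tree.used_head = node
        else:
            tree.used_tail.next_used = node
        tree.used_tail = node
        node.prev_aff = None                    # push on the operand's "affecting" list
        node.next_aff = operand.affecting_head
        if operand.affecting_head is not None:
            operand.affecting_head.prev_aff = node
        operand.affecting_head = node

    def dismiss_used(self, tree):
        if tree.used_head is None:
            return
        node = tree.used_head
        while node is not None:                 # O(1) unlink per node
            if node.prev_aff is None:
                node.used.affecting_head = node.next_aff
            else:
                node.prev_aff.next_aff = node.next_aff
            if node.next_aff is not None:
                node.next_aff.prev_aff = node.prev_aff
            node = node.next_used
        tree.used_tail.next_used = self.free    # O(1) splice thanks to the tail
        self.free = tree.used_head
        tree.used_head = tree.used_tail = None

    def free_count(self):
        n, node = 0, self.free
        while node is not None:
            n, node = n + 1, node.next_used
        return n


def affected_by(operand):
    names, node = [], operand.affecting_head
    while node is not None:
        names.append(node.affected.name)
        node = node.next_aff
    return names
```

The tail pointer is what lets a whole "used" list be handed to the free
list without walking it a second time.
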
Each tree move is represented by a struct (MonoTreeMoverTreeMove),
which contains:
- the definition and use points,
- the "affected" moves (recall the concept of "indirect tree move"),
- the "must be dead" slots (recall "tree forwarding"), and
- a few utility flags.
The tree move stays in the relevant ACT slot until it is ready to be
scheduled for moving, at which point it is put in a list in the
MonoTreeMover.
The tree move structs are reused when they are killed, so there is
also a "free" list for them in the MonoTreeMover.

The tree mover code has been added to all the relevant functions that
participate in consprop and copyprop, particularly:
- mono_cprop_copy_values takes care of variable uses (transitions from
  states [D] to [U] and [U] to [E] because of killing),
- mono_cprop_invalidate_values takes care of side effects (indirect
  accesses, calls...),
- mono_local_cprop_bb sets up and cleans the traversals for each BB,
  and for each MonoInst it takes care of variable definitions.
To each of them has been added a MonoTreeMover parameter, which is not
NULL if the tree mover is running.
After mono_local_cprop_bb has run for all BBs, the MonoTreeMover has
the list of all the pending moves, which must be walked to actually
perform the moves (when possible, because "unsafe" flags, "affected"
moves and "must be dead" slots can still have their effects, which
must be handled now because they are fully known only at the end of
the CFG traversal).