linux-packaging-mono/docs/tree-mover.txt


Purpose

Especially when inlining is active, it can happen that temporary
variables add pressure to the register allocator, producing bad
code.

The idea is that some of these temporaries can be totally eliminated
my moving the MonoInst tree that defines them directly to the use
point in the code (so the name "tree mover").

Please note that this is *not* an optimization: it is mostly a
workaround to issues we have in the regalloc.
Actually, with the new linear IR this will not be possible at all
(there will be no more trees in the code!).
Anyway, this workaround turns out to be useful in the current state
of things...

-----------------------------------------------------------------------

Base logic

If a local is defined by a value which is a proper expression (a tree
of MonoInst, not just another local or a constant), and this definition
is used only once, the tree can be moved directly to the use location,
and the definition eliminated.
Of course, none of the variables used in the tree must be defined in
the code path between the definition and the use, and the tree must be
free of side effects.
We do not handle the cases when the tree is just a local or a constant
because they are handled by copyprop and consprop, respectively.

To make things simpler, we restrict the tree move to the case when:
- the definition and the use are in the same BB, and
- the use is followed by another definition in the same BB (it is not
  possible that the 1st value is used again), or alternatively there
  is no BB in the whole CFG that contains a use of this local before a
  definition (so, again, there is no code path that can lead to a
  subsequent use).

To handle this, we maintain an ACT array (Available Copy Tree, similar
to the ACP), where we store the "state" of every local.
Ideally, every local can be in the following state:
[E] Undefined (by a tree, it could be in the ACP but we don't care).
[D] Defined (by a tree), and waiting for a use.
[U] Used, with a tree definition available in the same BB, but still
    without a definition following the use (always in the same BB).
Of course state [E] (empty) is the initial one.

Besides, there are two sort of "meta states", or flags:
[W] Still waiting for a use or definition in this BB (we have seen no
    occurrence of the local yet).
[X] Used without being previously defined in the same BB (note that if
    there is a definition that precedes the use in the same BB, even if
    the definition is not a tree or is not available because of side
    effects or because the tree value has changed the local is not in
    state [X]).
Also note that state [X] is a sort of "global" condition, which if set
in one BB will stay valid for the whole CFG, even if the local will
otherwise change state. The idea of flagging a local as [X] is that if
there is a definition/use pair that reaches the end of a BB, it could
be that there is a CFG path that then leads to the BB flagging it as
[X] (which contains a use), so the tree cannot be moved.
So state [X] will always be set, and never examined in all the state
transitions we will describe.
In practice, we use flag [W] to set state [X]: if, when traversing a
BB, we find a use for a local in state [W], then that local is flagged
[X].


For each BB, we initialize all states to [E] and [W], and then we
traverse the code one inst at a time, and update the variable states
in the ACT in the following ways:

[Definition]
  - Flag [W] is cleared.
  - All "affected trees" are killed (go from state [D] to [E]).
    The "affected trees" are the trees which contain (use) the defined
    local, and the rationale is that the tree value changed, so the
    tree is no longer available.
  - If the local was in state [U], *that* tree move is marked "safe"
    (because *this* definition makes us sure that the previous tree
    cannot be used again in any way).
    The idea is that "safe" moves can happen even if the local is
    flagged [X], because the second definition "covers" the use.
    The tree move is then saved in the "todo" list (and the affecting
    nodes are cleared).
  - If the local was defined by a tree, it goes to state [D], the tree
    is recorded, and all the locals used in it are marked as "affecting
    this tree" (of course these markers are lists, because each local
    could affect more than one tree).

[IndirectDefinition]
  - All potentially affected trees (in state [D]) are killed.

[Use]
  - If the local is still [W], it is flagged [X] (the [W] goes away).
  - If the local is in state [D], it goes to state [U].
    The tree move must not yet be recorded in the "todo" list, it still
    stays in the ACT slot belonging to this local.
    Anyway, the "affecting" nodes are updated, because now a definition
    of a local used in this tree will affect only "indirect" (or also
    "propagated") moves, but not *this* move (see below).
  - If the local is in state [U], then the tree cannot be moved (it is
    used two times): the move is canceled, and the state goes [E].
  - If the local is in state [E], the use is ignored.

[IndirectUse]
  - All potentially affected trees (in state [D] or [U]) are killed.

[SideEffect]
  - Tree is marked as "unmovable".

Then, at the end of the BB, for each ACT slot:
  - If state is [U], the tree move is recorded in the "todo" list, but
    flagged "unsafe".
  - Anyway, state goes to [E], the [W] flag is set, and all "affecting"
    lists are cleared (we get ready to traverse the next BB).
Finally, when all BBs has been scanned, we traverse the "todo" list,
moving all "safe" entries, and moving "unsafe" ones only if their ACT
slot is not flagged [X].

So far, so good.
But there are two issues that make things harder :-(

The first is the concept of "indirect tree move".
It can happen that a tree is scheduled for moving, and its destination
is a use that is located in a second tree, which could also be moved.
The main issue is that a definition of a variable of the 1st tree on
the path between the definition and the use of the 2nd one must prevent
the move.
But which move? The 1st or the 2nd?
Well, any of the two!
The point is, the 2nd move must be prevented *only* if the 1st one
happens: if it is aborted (for an [X] flag or any other reason), the
2nd move is OK, and vice versa...
We must handle this in the following way:
- The ACT must still remember if a slot is scheduled for moving in
  this BB, and if it is, all the locals used in the tree.
  We say that the slot is in state [M].
  Note that [M] is (like [X] and [W]) a sort of "meta state": a local
  is flagged [M] when it goes to state [U], and the flag is cleared
  when the tree move is cancelled
- A tree that uses a local whose slot is in state [M] is also using all
  the locals used by the tree in state [M], but the use is "indirect".
  These use nodes are also included in the "affecting" lists.
- The definition of a variable used in an "indirect" way has the
  effect of "linking" the two involved tree moves, saying that only one
  of the two can happen in practice, but not both.
- When the 2nd tree is scheduled for moving, the 1st one is *still* in
  state [M], because a third move could "carry it forward", and all the
  *three* moves should be mutually exclusive (to be safe!).

The second tricky complication is the "tree forwarding" that can happen
when copyprop is involved.
It is conceptually similar to the "indirect tree move".
Only, the 2nd tree is not really a tree, it is just the local defined
in the 1st tree move.
It can happen that copyprop will propagate the definition.
We cannot make treeprop do the same job of copyprop, because copyprop
has less constraints, and is therefore more powerful in its scope.
The main issue is that treeprop cannot propagate a tree to *two* uses,
while copyprop is perfectly capable of propagating one definition to
two (or more) different places.
So we must let copyprop do its job otherwise we'll miss optimizations,
but we must also make it play safe with treeprop.
Let's clarify with an example:
  a = v1 + v2; //a is defined by a tree, state [D], uses v2 and v2
  b = a; //a is used, state [U] with move scheduled, and
         //b is defined by a, ACP[b] is a, and b is in state [DC]
  c = b + v3; // b is used, goes to state [U]
The real trouble is that copyprop happens *immediately*, while treeprop
is deferred to the end of the CFG traversal.
So, in the 3rd statement, the "b" is immediately turned into an "a" by
copyprop, regardless of what treeprop will do.
Anyway, if we are careful, this is not so bad.
First of all, we must "accept" the fact that in the 3rd statement the
"b" is in fact an "a", as treeprop must happen *after* copyprop.
The real problem is that "a" is used twice: in the 2nd and 3rd lines.
In our usual setup, the 2nd line would set it to [U], and the 3rd line
would kill the move (and set "a" to [E]).
I have tried to play tricks, and reason as of copyprop didn't happen,
but everything becomes really messy.
Instead, we should note that the 2nd line is very likely to be dead.
At least in this BB, copyprop will turn all "b"s into "a"s as long as
it can, and when it cannot, it will be because either "a" or "b" have
been redefined, which would be after the tree move anyway.
So, the reasoning gets different: let's pretend that "b" will be dead.
This will make the "a" use in the 2nd statement useless, so there we
can "reset" "a" to [D], but also take note that if "b" will end up
not being dead, the tree move associated to this [D] must be aborted.
We can detect this in the following way:
- Either "b" is used before being defined in this BB, or
- It will be flagged "unsafe".
Both things are very easy to check.
The only quirk is that the "affecting" lists must not be cleared when
a slot goes to state [U], because a "propagation" could put it back
to state [D] (where those lists are needed, because it can be killed
by a definition to a used slot).

-----------------------------------------------------------------------

Implementation notes

All the implementation runs inside the existing mono_local_cprop
function, and a separate memory pool is used to hold the temporary
data.

A struct, MonoTreeMover, contains the pointers to the pool, the ACT,
the list of scheduled moves and auxiliary things.
This struct is allocated if the tree move pass is requested, and is
then passed along to all the involved functions, which are therefore
aware of the tree mover state.

The ACT is an array of slots, obviously one per local.
Each slot is of type MonoTreeMoverActSlot, and contains the used and
affected locals, a pointer to the pending tree move and the "waiting"
and "unsafe" flags.

The "affecting" lists a built from "dependency nodes", of type
MonoTreeMoverDependencyNode.
Each of the nodes contains the used and affected local, and is in
two lists: the locals used by a slot, and the locals affected by a
slot (obviously a different one).
So, each node means: "variable x is used in tree t, so a definition
of x affects tree t".
The "affecting" lists are doubly linked, to allow for O(1) deletion.
The "used" lists are simply linked, but when they are mantained there
is always a pointer to the last element to allow for O(1) list moving.
When a used list is dismissed (which happens often, any time a node is
killed), its nodes are unlinked from their respective affecting lists
and are then put in a "free" list in the MonoTreeMover to be reused.

Each tree move is represented by a struct (MonoTreeMoverTreeMove),
which contains:
- the definition and use points,
- the "affected" moves (recall the concept of "indirect tree move"),
- the "must be dead" slots (recall "tree forwarding"). and
- a few utility flags.
The tree moves stays in the relevant ACT slot until it is ready to be
scheduled for moving, at which point it is put in a list in the
MonoTreeMover.
The tree moves structs are reused when they are killed, so there is
also a "free" list for them in the MonoTreeMover.

The tree mover code has been added to all the relevant functions that
participate in consprop and copyprop, particularly:
- mono_cprop_copy_values takes care of variable uses (transitions from
  states [D] to [U] and [U] to [E] because of killing),
- mono_cprop_invalidate_values takes care of side effects (indirect
  accesses, calls...),
- mono_local_cprop_bb sets up and cleans the traversals for each BB,
  and for each MonoInst it takes care of variable definitions.
To each of them has been added a MonoTreeMover parameter, which is not
NULL if the tree mover is running.
After mono_local_cprop_bb has run for all BBs, the MonoTreeMover has
the list of all the pending moves, which must be walked to actually
perform the moves (when possible, because "unsafe" flags, "affected"
moves and "must be dead" slots can still have their effects, which
must be handled now because they are fully known only at the end of
the CFG traversal).