374 lines
13 KiB
Plaintext
374 lines
13 KiB
Plaintext
|
|
||
|
Arrays Bounds Check Elimination (ABC)
|
||
|
in the Mono Runtime
|
||
|
|
||
|
Massimiliano Mantione (mass@ximian.com)
|
||
|
|
||
|
Here "abc" stays for "array bounds check", or "array bound checks", or
|
||
|
some combination of the two.
|
||
|
|
||
|
* Usage
|
||
|
|
||
|
Simply use the "abcrem" optimization invoking mono.
|
||
|
|
||
|
To see if bound checks are actually removed, use "mono -v" and
|
||
|
grep for "ARRAY-ACCESS" in the output, there should be a
|
||
|
message for each check that has been removed.
|
||
|
|
||
|
To trace the algorithm execution, use "-v -v -v", and be
|
||
|
prepared to be totally submersed by debugging messages...
|
||
|
|
||
|
* Effectiveness
|
||
|
|
||
|
The abc removal code can now always remove bound checks from
|
||
|
"clean" array scans in loops, and generally anyway there are
|
||
|
clear conditions that already state that the index is "safe".
|
||
|
|
||
|
To be clearer, and give an idea of what the algorithm can and
|
||
|
cannot do without describing it in detail... keep in mind that
|
||
|
only "redunant" checks cannot be removed. By "redundant", I
|
||
|
mean "already explicitly checked" in the method code.
|
||
|
|
||
|
Unfortunately, analyzing complex expressions is not so easy
|
||
|
(see below for details), so the capabilities of this "abc
|
||
|
remover" are limited.
|
||
|
|
||
|
These are simple guidelines:
|
||
|
|
||
|
- Only expressions of the following kinds are handled:
|
||
|
- constant
|
||
|
- variable [+/- constant]
|
||
|
- Only comparisons between "handled" expressions are understood.
|
||
|
- "switch" statements are not yet handled.
|
||
|
|
||
|
This means that code like this will be handled well:
|
||
|
|
||
|
for (int i = 0; i < a.Length; i++) {
|
||
|
a [i] = .....
|
||
|
}
|
||
|
|
||
|
The "i" variable could be declared out of the "for", the "for"
|
||
|
could be a "while", and maybe even implemented with "goto",
|
||
|
the array could be scanned in reverse order, and everything
|
||
|
would still work.
|
||
|
|
||
|
I could have a temporary variable storing the array length,
|
||
|
and check on it inside the loop, and the abc removal would
|
||
|
still occurr, like this:
|
||
|
|
||
|
int i = 0;
|
||
|
int l = a.Length;
|
||
|
while ( i < l ) {
|
||
|
a [i] ......
|
||
|
}
|
||
|
|
||
|
or this:
|
||
|
|
||
|
int l = a.Length;
|
||
|
for (int i = l; i > 0; i--) {
|
||
|
a [i] = .....
|
||
|
}
|
||
|
|
||
|
The following two examples would work:
|
||
|
|
||
|
for (int i = 0; i < (a.Length -1); i++) .....
|
||
|
for (int i = 0; i < a.Length; i += 2) .....
|
||
|
|
||
|
But just something like this
|
||
|
|
||
|
int delta = -1;
|
||
|
for (int i = 0; i < (a.Length + delta); i++) .....
|
||
|
|
||
|
or like this
|
||
|
|
||
|
int delta = +2;
|
||
|
for (int i = 0; i < a.Length; i += delta) .....
|
||
|
|
||
|
would not work, the check would stay there. (unless
|
||
|
a combination of cfold, consprop and copyprop is used, too,
|
||
|
which would make the constant value of "delta" explicit).
|
||
|
|
||
|
Just to make you understand how things are tricky... this would work!
|
||
|
|
||
|
int limit = a.Length - 1;
|
||
|
for (int i = 0; i < limit; i++) {
|
||
|
a [i] = .....
|
||
|
}
|
||
|
|
||
|
A detailed explanation of the reason why things are done like
|
||
|
this is given below.
|
||
|
|
||
|
* The Algorithm
|
||
|
|
||
|
This array bound check removal (abc removal) algorithm is
|
||
|
based on symbolic execution concepts. I handle the removal
|
||
|
like an "unreachable code elimination" (and in fact the
|
||
|
optimization could be extended to remove also other
|
||
|
unreachable sections of code, due to branches that "always go
|
||
|
the same way").
|
||
|
|
||
|
In symbolic execution, variables do not have actual (numeric)
|
||
|
values, but instead symbolic expressions (like "a", or "x+y").
|
||
|
Also, branch conditions are handled like symbolic conditions
|
||
|
(like "i<k"), which state relations between variable values.
|
||
|
|
||
|
The SSA representation inside mini is somewhat close to a
|
||
|
symbolic representation of the execution of the compiled
|
||
|
method.
|
||
|
|
||
|
Particularly, the representation of variable values is exactly
|
||
|
a symbolic one. It is enough to find all CEE_STIND_*
|
||
|
instructions which store to a local variable, and their second
|
||
|
argument is exactly the variable value. Actually, "cfg->vars
|
||
|
[<variable-index>]->def" should contain exactly those store
|
||
|
instructions, and the "get_variable_value_from_ssa_store"
|
||
|
function extracts the variable value from there.
|
||
|
|
||
|
On the other hand, the conditions under which each basic block
|
||
|
is executed are not made fully explicit.
|
||
|
|
||
|
However, it is not difficult to make them so.
|
||
|
|
||
|
Each BB that has more than one exit BB, in practice must end
|
||
|
either with a conditional branch instruction or with a switch
|
||
|
instruction.
|
||
|
|
||
|
In the first case, the BB has exactly two exit BBs, and their
|
||
|
execution conditions are easy to get from the condition of the
|
||
|
branch (see the "get_relation_from_branch_instruction"
|
||
|
function, and expecially the end of "analyze_block" in
|
||
|
abcremoval.c.
|
||
|
|
||
|
If there is a switch, the jump condition of every exit BB is
|
||
|
the equality of the switch argument with the particular index
|
||
|
associated with its case (but the current implementation does
|
||
|
not handle switch statements yet).
|
||
|
|
||
|
These individual conditions are in practice associated to each
|
||
|
arc that connects BBs in the CFG (with the simple case that
|
||
|
unconditional branches have a "TRUE" condition, because they
|
||
|
always happen).
|
||
|
|
||
|
So, for each BB, its *proper* entry condition is the union of
|
||
|
all the conditions associated to arcs that enter the BB. The
|
||
|
"union" is like a logical "or", in the sense that either of
|
||
|
the condition could be true, they are not necessarily all
|
||
|
true. This means that if I can enter a BB in two ways, and in
|
||
|
one case I know that "x>0", and in the other that "x==0",
|
||
|
actually in the BB I know that "x>=0", which is a weaker
|
||
|
condition (the union of the two).
|
||
|
|
||
|
Also, the *complete* entry condition for a BB is the
|
||
|
"intersection" of all the entry conditions of its
|
||
|
dominators. This is true because each dominator is the only
|
||
|
way to reach the BB, so the entry condition of each dominator
|
||
|
must be true if the control flow reached the BB. This
|
||
|
translates to the logical "and" of all the "proper" conditions
|
||
|
of the BBs met walking up in the dominator tree. So, if one
|
||
|
says "x>0", and another "x==1", then I know that "x==1", which
|
||
|
is a stronger condition (the intersection of the two).
|
||
|
|
||
|
Note that, if the two conditions were "x>0" and "x==0", then
|
||
|
the block would be unreachable (the intersection is empty),
|
||
|
because some branch is impossible.
|
||
|
|
||
|
Another observation is that, inside each BB, every variable is
|
||
|
subject to the complete entry condition of that very same BB,
|
||
|
and not the one in which it is defined (with the "complete
|
||
|
entry condition" being the thing I defined before, sorry if
|
||
|
these terms "proper" and "complete" are strange, I found
|
||
|
nothing better).
|
||
|
|
||
|
This happens because the branch conditions are related to the
|
||
|
control flow. I can define "i=a", and if I am in a BB where
|
||
|
"a>0", then "i>0", but not otherwise.
|
||
|
|
||
|
So, intuitively, if the available conditions say "i>=0", and i
|
||
|
is used as an index in an array access, then the lower bound
|
||
|
check can be omitted. If the condition also says
|
||
|
"(i>=0)&&(i<array.length)", the abc removal can occur.
|
||
|
|
||
|
So, a complete solution to the problem of abc removal would be
|
||
|
the following: for each array access, build a system of
|
||
|
equations containing:
|
||
|
|
||
|
[1] all the symbolic variable definitions
|
||
|
|
||
|
[2] the complete entry condition of the BB in which
|
||
|
the array access occurs
|
||
|
|
||
|
[3] the two "goal functions" ("index >=0" and "index <
|
||
|
array.length")
|
||
|
|
||
|
If the system is valid for *each possible* variable value, then the goal
|
||
|
functions are always true, and the abc can be removed.
|
||
|
|
||
|
All this discussion is useful to give a precise specification
|
||
|
to the problem we are trying to solve.
|
||
|
|
||
|
The trouble is that, in the general case, the resulting system
|
||
|
of equations is like a predicate in first order logic, which
|
||
|
is semi-decidable, and its general solution is anyway too
|
||
|
complex to be attempted in a JIT compiler (which should not
|
||
|
contain a full fledged theorem prover).
|
||
|
|
||
|
Therefore, we must cut some corner.
|
||
|
|
||
|
There is also another big problem, which is caused by
|
||
|
"recursive" symbolic definitions. These definition can (and
|
||
|
generally do) happen every time there is a loop. For instance,
|
||
|
in the following piece of code:
|
||
|
|
||
|
for ( int i = 0; i < array.length; i++ ) {
|
||
|
Console.WriteLine( "array [i] = " + array [i] );
|
||
|
}
|
||
|
|
||
|
one of the definitions of i is a PHI that can be either 0 or
|
||
|
"i + 1".
|
||
|
|
||
|
Now, we know that mathematically "i = i + 1" does not make
|
||
|
sense, and in fact symbolic values are not "equations", they
|
||
|
are "symbolic definitions".
|
||
|
|
||
|
The actual symbolic value of i is a generic "n", where "n" is
|
||
|
the number of iterations of the loop, but this is terrible to
|
||
|
handle (and in more complex examples the symbolic value of i
|
||
|
simply cannot be written, because i is calculated in an
|
||
|
iterative way).
|
||
|
|
||
|
However, the definition "i = i + 1" tells us something about
|
||
|
i: it tells us that i "grows". So (from the PHI definition) we
|
||
|
know that i is either 0, or "grows". This is enough to tell
|
||
|
that "i>=0", which is what we want! It is important to note
|
||
|
that recursive definitions can only occurr inside PHI
|
||
|
definitions, because actually a variable cannot be defined
|
||
|
*only* in terms of itself!
|
||
|
|
||
|
At this point, I can explain which corners I want to cut to
|
||
|
make the problem solvable. It will not remove all the abc that
|
||
|
could theoretically be removed, but at least it will work.
|
||
|
|
||
|
The easiest way to cut corners is to only handle expressions
|
||
|
which are "reasonably simple", and ignore the rest.
|
||
|
|
||
|
Keep in mind that ignoring an expression is not harmful in
|
||
|
itself. The algorithm will be simply "less powerful", because
|
||
|
it will ignore conditions that could have caused to the
|
||
|
removal of an abc, but will not remove checks "by mistake" (so
|
||
|
the resulting code will be in any case correct).
|
||
|
|
||
|
The expressions we handle are the following (all of integer
|
||
|
type):
|
||
|
|
||
|
- constant
|
||
|
- variable
|
||
|
- variable + constant
|
||
|
- constant + variable
|
||
|
- variable - constant
|
||
|
|
||
|
And, of course, PHI definitions.
|
||
|
|
||
|
Any other expression causes the introduction of an "any" value
|
||
|
in the evaluation, which makes all values that depend from it
|
||
|
unknown as well.
|
||
|
|
||
|
We will call these kind of definitions "summarizable"
|
||
|
definitions.
|
||
|
|
||
|
In a first attempt, we can consider only branch conditions
|
||
|
that have the simplest possible form (the comparison of two
|
||
|
summarizable expressions).
|
||
|
|
||
|
We can also simplify the effect of variable definitions,
|
||
|
keeping only what is relevant to know: their value range with
|
||
|
respect to zero and with respect to the length of the array we
|
||
|
are currently handling.
|
||
|
|
||
|
One particular note on PHI functions: they work (obviously)
|
||
|
like the logical "or" of their definitions, and therefore are
|
||
|
equivalent to the "logical or" of the summarization of their
|
||
|
definitions.
|
||
|
|
||
|
About recursive definitions (which, believe me, are the worst
|
||
|
thing in all this mess), we handle only "monotonic" ones. That
|
||
|
is, we try to understand if the recursive definition (which,
|
||
|
as we said above, must happen because of a loop) always
|
||
|
"grows" or "gets smaller". In all other cases, we decide we
|
||
|
cannot handle it.
|
||
|
|
||
|
One critical thing, once we have defined all these data
|
||
|
structures, is how the evaluation is actually performed.
|
||
|
|
||
|
In a first attempt I coded a "brute force" approach, which for
|
||
|
each BB tried to examine all possible conditions between all
|
||
|
variables, filling a sort of "evaluation matrix". The problem
|
||
|
was that the complexity of this evaluation was quadratic (or
|
||
|
worse) on the number of variables, and that many wariables
|
||
|
were examined even if they were not involved in any array
|
||
|
access.
|
||
|
|
||
|
Following the ABCD paper:
|
||
|
|
||
|
http://citeseer.ist.psu.edu/bodik00abcd.html
|
||
|
|
||
|
I rewrote the algorithm in a more "sparse" way.
|
||
|
|
||
|
Now, the main data structure is a graph of relations between
|
||
|
variables, and each attempt to remove a check performs a
|
||
|
traversal of the graph, looking for a path from the index to
|
||
|
the array length that satisfies the properties "index >= 0"
|
||
|
and "index < length". If such a path is found, the check is
|
||
|
removed. It is true that in theory *each* traversal has a
|
||
|
complexity which is exponential on the number of variables,
|
||
|
but in practice the graph is not very connected, so the
|
||
|
traversal terminates quickly.
|
||
|
|
||
|
|
||
|
Then, the algorithm to optimize one method looks like this:
|
||
|
|
||
|
[1] Preparation:
|
||
|
|
||
|
[1a] Build the SSA representation.
|
||
|
|
||
|
[1b] Prepare the evaluation graph (empty)
|
||
|
|
||
|
[1b] Summarize each varible definition, and put
|
||
|
the resulting relations in the evaluation
|
||
|
graph
|
||
|
|
||
|
[2] Analyze each BB, starting from the entry point and
|
||
|
following the dominator tree:
|
||
|
|
||
|
[2a] Summarize its entry condition, and put the resulting relations
|
||
|
in the evaluation graph (this is the reason
|
||
|
why the BBs are examined following the
|
||
|
dominator tree, so that conditions are added
|
||
|
to the graph in a "cumulative" way)
|
||
|
|
||
|
[2b] Scan the BB instructions, and for each array
|
||
|
access perform step [3]
|
||
|
|
||
|
[2c] Process children BBs following the dominator
|
||
|
tree (step [2])
|
||
|
|
||
|
[2d] Remove from the evaluation area the conditions added at step [2a]
|
||
|
(so that backtracking along the tree the area
|
||
|
is properly cleared)
|
||
|
|
||
|
[3] Attempt the removal:
|
||
|
|
||
|
[3a] Summarize the index expression, to see if we can handle it; there
|
||
|
are three cases: the index is either a
|
||
|
constant, or a variable (with an optional
|
||
|
delta) or cannot be handled (is a "any")
|
||
|
|
||
|
[3b] If the index can be handled, traverse the evaluation area searching
|
||
|
a path from the index variable to the array
|
||
|
length (if the index is a constant, just
|
||
|
examine the array length to see if it has
|
||
|
some relation with this constant)
|
||
|
|
||
|
[3c] Use the results of step [3b] to decide if the check is redundant
|
||
|
|
||
|
|