SSAPRE stands for "SSA based Partial Redundancy Elimination".

The algorithm is explained in this paper:

Partial Redundancy Elimination in SSA Form (1999)
Robert Kennedy, Sun Chan, Shin-Ming Liu, Raymond Lo, Peng Tu, Fred Chow
ACM Transactions on Programming Languages and Systems

http://citeseer.ist.psu.edu/kennedy99partial.html

In this document I give a gentle introduction to the concept of "partial"
redundancies, and I explain the basic design decisions I took in implementing
SSAPRE, but the paper is essential to understand the code.

Partial Redundancy Elimination (or PRE) is an optimization that (guess what?)
tries to remove redundant computations.
It achieves this by saving the result of "not redundant" evaluations of
expressions into specially created temporary variables, so that "redundant"
evaluations can be replaced by a load from the appropriate variable.

Of course, on register starved architectures (x86) a temporary could cost more
than the evaluation itself... PRE guarantees that the live range of the
introduced variables is the minimum possible, but the added pressure on the
register allocator can be an issue.

The nice thing about PRE is that it not only removes "full" redundancies, but
also "partial" ones.
A full redundancy is easy to spot, and straightforward to handle, like in the
following example (in every example here, the "expression" is "a + b"):

int FullRedundancy1 (int a, int b) {
        int v1 = a + b;
        int v2 = a + b;
        return v1 + v2;
}

PRE would transform it like this:

int FullRedundancy1 (int a, int b) {
        int t = a + b;
        int v1 = t;
        int v2 = t;
        return v1 + v2;
}

Of course, either a copy propagation pass or a register allocator smart enough
to remove unneeded variables would be necessary afterwards.

Another example of full redundancy is the following:

int FullRedundancy2 (int a, int b) {
        int v1;

        if (a >= 0) {
                v1 = a + b; // BB1
        } else {
                a = -a; // BB2
                v1 = a + b;
        }

        int v2 = a + b; // BB3
        return v1 + v2;
}

Here the two expressions in BB1 and BB2 are *not* the same thing (a is
modified in BB2), but both are redundant with the expression in BB3, so the
code can be transformed like this:

int FullRedundancy2 (int a, int b) {
        int v1;
        int t;

        if (a >= 0) {
                t = a + b; // BB1
                v1 = t;
        } else {
                a = -a; // BB2
                t = a + b;
                v1 = t;
        }

        int v2 = t; // BB3
        return v1 + v2;
}

Note that there are still two occurrences of the expression, while it can
easily be seen that one (at the beginning of BB3) would suffice.
This, however, is not a redundancy for PRE, because there is no path in the
CFG where the expression is evaluated twice.
Maybe this other kind of redundancy (which affects code size, not the
computations that are actually performed) could be eliminated by code hoisting,
but I should check that; in any case, it is not a PRE related thing.

An example of partial redundancy, on the other hand, is the following:

int PartialRedundancy (int a, int b) {
        int v1;

        if (a >= 0) {
                v1 = a + b; // BB1
        } else {
                v1 = 0; // BB2
        }

        int v2 = a + b; // BB3
        return v1 + v2;
}

The redundancy is partial because the expression is computed more than once
along some paths in the CFG, but not along all of them.
In fact, on the path BB1 - BB3 the expression is computed twice, but on the
path BB2 - BB3 it is computed only once.
In this case, PRE must insert new occurrences of the expression in order to
obtain a full redundancy, and then use temporary variables as before.
Adding a computation in BB2 would do the job, as shown below.

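This is, roughly, the resulting code; note that the temporary is now
initialized on every path, so the load in BB3 is always correct:

int PartialRedundancy (int a, int b) {
        int v1;
        int t;

        if (a >= 0) {
                t = a + b; // BB1
                v1 = t;
        } else {
                v1 = 0; // BB2
                t = a + b; // inserted computation
        }

        int v2 = t; // BB3
        return v1 + v2;
}
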
One nice thing about PRE is that loop invariants can be seen as partial
redundancies.
The idea is that you can get into the loop from two locations: from before the
loop (at the 1st iteration), and from inside the loop itself (at any other
iteration). If there is a computation inside the loop that is in fact a loop
invariant, PRE will spot this, and will handle the BB before the loop as a
place where to insert a new computation to get a full redundancy.
At this point, the computation inside the loop would be replaced by a use of
the temporary stored before the loop, effectively performing "loop invariant
code motion".

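As a small sketch of my own (not an example taken from the paper), consider a
loop where "a + b" is invariant:

int LoopInvariant (int a, int b, int n) {
        int sum = 0;
        int i;

        for (i = 0; i < n; i++) {
                // "a + b" is recomputed at every iteration, but its
                // operands never change inside the loop
                sum = sum + (a + b);
        }

        return sum;
}

The occurrence is redundant along the back edge path but not along the loop
entry path, so PRE inserts the computation in the BB before the loop and
replaces the occurrence inside it:

int LoopInvariant (int a, int b, int n) {
        int sum = 0;
        int i;
        int t = a + b; // inserted before the loop

        for (i = 0; i < n; i++) {
                sum = sum + t;
        }

        return sum;
}
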
Now, this is what PRE does to the code.

But how does it work?

In "classic" solutions, PRE is formulated as a data flow analysis problem.
|
|
The Muchnick provides a detailed description of the algorithm in this way (it
|
|
is by far the most complex problem of this kind in the whole book).
|
|
The point is that this algorithm does not exploit the SSA form.
|
|
In fact, it has to perform all that amount of data flow analysis exactly
|
|
because it does not take advantage of the work already done if you have put
|
|
the program into SSA form.
|
|
|
|
The SSAPRE algorithm, on the other hand, is designed with SSA in mind.
It fully exploits the properties of SSA variables, it explicitly reuses some
data structures that must have been computed when building the SSA form, and
it takes great care to emit its output already in that form (which means that
the temporaries it introduces are already "versioned", with all the phi
variables correctly placed).

The main concept used in this algorithm is the "Factored Redundancy Graph" (or
FRG for short). Basically, for each given expression, this graph shows all its
occurrences in the program, and redundant ones are linked to their
"representative occurrence" (the 1st occurrence met in a CFG traversal).
The central observation is that the FRG is "factored" because each expression
occurrence has exactly one representative occurrence, in the same way as the
SSA form is a "factored" use-definition graph (each use is related to exactly
one definition).
And in fact building the FRG is much like building an SSA representation, with
PHI nodes and all.
By the way, I use "PHI" for "PHI expression occurrences", and "phi" for the
usual phi definitions in SSA, because the paper uses an uppercase Greek phi
letter for PHI occurrences (while the lowercase one is standard in SSA).

The really interesting point is that the FRG for a given expression has exactly
the same "shape" as the use-definition graph of the temporary variable that
must be introduced to remove the redundancy, and this is why SSAPRE can easily
emit its output code in correct SSA form.

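To make this concrete, here is an illustration of mine (pseudo code, with
explicit variable versions and phi functions written out) of the
PartialRedundancy example in SSA form; the phi of the temporary t sits
exactly where the FRG has its PHI node, at the join point in BB3:

int PartialRedundancy (int a, int b) {
        if (a >= 0) {
                t1 = a + b;        // BB1: real occurrence
                v1_1 = t1;
        } else {
                v1_2 = 0;          // BB2
                t2 = a + b;        // inserted occurrence
        }

        // BB3: the join point where the FRG placed its PHI
        t3 = phi (t1, t2);
        v1_3 = phi (v1_1, v1_2);
        v2 = t3;
        return v1_3 + v2;
}
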
One particular characteristic of the SSAPRE algorithm is that it is "sparse",
in the sense that it operates on each expression individually, looking only at
the specific nodes it needs. This is in contrast to the classical way of
solving data flow analysis problems, which is to operate globally, using data
structures that work "in parallel" on all the entities involved (in practice
bit sets, with, for instance, one bit per variable, or one bit per expression
occurrence, you get the idea). This is handy (it exploits the parallelism of
bit operations), but in general setting up those data structures is time
consuming, and if the number of things represented by the bits is not fixed
in advance the code can get quite messy (the bit sets must become growable).
Moreover, applying special handling to individual expressions becomes very
tricky.

SSAPRE, on the other hand, looks at the whole program (method, in our case)
only once, when it scans the code to collect (and classify) all expression
occurrences.
From here on, it operates on one expression at a time, looking only at its
specific data structures (which, for each expression, are much smaller than
the whole program, so the operations are fast).

This approach has another advantage: the data structures used to operate on
one expression can be recycled when operating on other expressions, lowering
the memory usage of the compiler and (also important) avoiding time wasted in
memory allocations.
This reflects directly on the design of those data structures.
These advantages are easier to see by following which data structures are
used during the application of SSAPRE to a method.

The steps that are performed are the following:

* Initialization: scan the whole program, collect all the occurrences, and
  build a worklist of expressions to be processed (each worklist entry
  describes all the occurrences of the given expression).
  Here the data structures are the following:
  - One struct (the working area), containing the worklist and other
    "global" data. The worklist itself contains an entry for each expression,
    which in turn has an entry for each occurrence.
  - One "info" struct for each BB, containing interesting dominance and CFG
    related properties of the BB.

Then, for each entry in the worklist, these operations are performed:

* PHI placement: find where the PHI nodes of the FRG must be placed.
* Renaming: assign appropriate "redundancy classes" to all occurrences (it
  is like assigning variable versions when building an SSA form).
* Analyze: compute various flags on the PHI nodes (which are the only places
  that define where additional computations may be added).
  Conceptually this is composed of two data flow analysis passes, which in
  practice only scan the PHI nodes in the FRG, not the whole code, so they
  are not that heavy.
* Finalize: make the FRG exactly like the use-def graph of the temporary
  that will be introduced (it generally must be reshaped a bit according to
  the flags computed in the previous step).
  This is also made of two steps, but more for implementation reasons than
  for conceptual ones.
* Code motion: actually update the code using the FRG.
  Here, what's needed is the following:
  - PHI occurrences (and some flags about them)
  - PHI argument occurrences (and some flags about them)
  - The renaming stack

In practice, one can observe that each BB can have at most one PHI (we work
on one expression at a time), and also at most one PHI argument (which we
consider as occurring at the end of the BB). Therefore, we do not have
separate structures for these, but store them directly in the BB infos
(which are kept for the whole SSAPRE invocation), as sketched below.
The renaming stack is managed directly as a list of occurrences, with
special handling for PHI nodes (which, being represented directly by their
BB, are not "occurrences").

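As a rough sketch of these BB infos (the names here are made up for
illustration; they are not the identifiers used in the actual code):

typedef struct BBInfo {
        /* Dominance and CFG related properties of the BB */
        int dfs_index;
        struct BBInfo *idominator;

        /* The (at most one) PHI of the current expression in this BB */
        int has_phi;
        int phi_redundancy_class;  /* assigned during renaming */
        int phi_flags;             /* computed by the "analyze" step */

        /* The (at most one) PHI argument, considered as occurring
         * at the end of the BB */
        int has_phi_argument;
        int phi_argument_redundancy_class;
} BBInfo;
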
So far, the only two missing things (with respect to SSAPRE as described in
the paper) are the elimination of unneeded PHIs and the handling of
"composite" expressions.
Otherwise, the implementation is complete.

Other interesting issues are:
- SSAPRE assumes that:
  - each SSA variable is related to one "original" (not SSA) variable, and
  - no more than one version of each original variable is live at the same
    time in the CFG.
  It would be better to relax these assumptions.
- SSAPRE operates on "syntactic" redundancies, not on "values".
  GVNPRE (or some other way of merging GVN into PRE) would be a nice
  alternative, see
  "http://www.cs.purdue.edu/homes/vandrutj/papers/thesis.pdf".