SSAPRE stands for "SSA based Partial Redundancy Elimination".

The algorithm is explained in this paper:

    Partial Redundancy Elimination in SSA Form (1999)
    Robert Kennedy, Sun Chan, Shin-Ming Liu, Raymond Lo, Peng Tu, Fred Chow
    ACM Transactions on Programming Languages and Systems

    http://citeseer.ist.psu.edu/kennedy99partial.html

In this document I give a gentle introduction to the concept of "partial" redundancies, and I explain the basic design decisions I took in implementing SSAPRE, but the paper is essential to understand the code.

Partial Redundancy Elimination (or PRE) is an optimization that (guess what?) tries to remove redundant computations. It does this by saving the result of "non-redundant" evaluations of expressions into specially created temporary variables, so that "redundant" evaluations can be replaced by a load from the appropriate variable.

Of course, on register-starved architectures (like x86) a temporary could cost more than the evaluation itself... PRE guarantees that the live ranges of the introduced variables are as short as possible, but the added pressure on the register allocator can still be an issue.

The nice thing about PRE is that it removes not only "full" redundancies, but also "partial" ones. A full redundancy is easy to spot and straightforward to handle, as in the following example (in every example here, the "expression" is "a + b"):

    int FullRedundancy1 (int a, int b) {
        int v1 = a + b;
        int v2 = a + b;
        return v1 + v2;
    }

PRE would transform it like this:

    int FullRedundancy1 (int a, int b) {
        int t = a + b;
        int v1 = t;
        int v2 = t;
        return v1 + v2;
    }

Of course, either a copy propagation pass or a register allocator smart enough to remove unneeded variables would be necessary afterwards.
Another example of full redundancy is the following:

    int FullRedundancy2 (int a, int b) {
        int v1;
        if (a >= 0) {
            v1 = a + b; // BB1
        } else {
            a = -a;     // BB2
            v1 = a + b;
        }
        int v2 = a + b; // BB3
        return v1 + v2;
    }

Here the two expressions in BB1 and BB2 are *not* the same thing (a is modified in BB2), but both are redundant with the expression in BB3, so the code can be transformed like this:

    int FullRedundancy2 (int a, int b) {
        int v1;
        int t;
        if (a >= 0) {
            t = a + b;  // BB1
            v1 = t;
        } else {
            a = -a;     // BB2
            t = a + b;
            v1 = t;
        }
        int v2 = t;     // BB3
        return v1 + v2;
    }

Note that there are still two occurrences of the expression, while it can easily be seen that one (at the beginning of BB3) would suffice. This, however, is not a redundancy for PRE, because there is no path in the CFG along which the expression is evaluated twice. This other kind of redundancy (which affects code size, not the computations that are actually performed) might be eliminated by code hoisting (or rather sinking, since the single remaining occurrence would be at the start of BB3), but I should check it; anyway, it is not a PRE related thing.

An example of partial redundancy, on the other hand, is the following:

    int PartialRedundancy (int a, int b) {
        int v1;
        if (a >= 0) {
            v1 = a + b; // BB1
        } else {
            v1 = 0;     // BB2
        }
        int v2 = a + b; // BB3
        return v1 + v2;
    }

The redundancy is partial because the expression is computed more than once along some paths in the CFG, but not along all of them. In fact, on the path BB1 - BB3 the expression is computed twice, but on the path BB2 - BB3 it is computed only once. In this case, PRE must insert new occurrences of the expression in order to obtain a full redundancy, and then use temporary variables as before. Adding a computation in BB2 would do the job.

One nice thing about PRE is that loop invariants can be seen as partial redundancies. The idea is that you can enter the loop from two places: from before the loop (at the first iteration), and from inside the loop itself (at every other iteration).
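To make the partial redundancy case concrete, here is a hand-written sketch of what PRE could produce for PartialRedundancy; it is not output of the actual implementation. The names PartialRedundancyAfterPRE and same_results are hypothetical. The new computation is placed at the end of BB2, so that every path reaching BB3 has already computed a + b into t, and the occurrence in BB3 becomes a plain use of t.

```c
#include <assert.h>

/* PartialRedundancy, as in the example above. */
int PartialRedundancy (int a, int b) {
    int v1;
    if (a >= 0) {
        v1 = a + b; // BB1
    } else {
        v1 = 0;     // BB2
    }
    int v2 = a + b; // BB3
    return v1 + v2;
}

/* Hypothetical hand-written result of PRE: a computation of a + b is
   inserted at the end of BB2, which makes the occurrence in BB3 fully
   redundant, so it is replaced by a use of the temporary t. */
int PartialRedundancyAfterPRE (int a, int b) {
    int v1;
    int t;
    if (a >= 0) {
        t = a + b;  // BB1
        v1 = t;
    } else {
        v1 = 0;     // BB2
        t = a + b;  // inserted computation
    }
    int v2 = t;     // BB3: a + b is no longer evaluated here
    return v1 + v2;
}

/* Returns 1 if both versions agree for every (a, b) in [lo, hi] x [lo, hi]. */
int same_results (int lo, int hi) {
    assert (lo <= hi);
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            if (PartialRedundancy (a, b) != PartialRedundancyAfterPRE (a, b))
                return 0;
    return 1;
}
```

After the transformation each path evaluates a + b exactly once: BB1 - BB3 went from two evaluations to one, while BB2 - BB3 still has one (it just moved from BB3 into BB2).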
If there is a computation inside the loop that is in fact a loop invariant, PRE will spot this, and will treat the BB before the loop as the place where a new computation can be inserted to get a full redundancy. At this point, the computation inside the loop is replaced by a use of the temporary stored before the loop, effectively performing "loop invariant code motion".

Now, this is what PRE does to the code. But how does it work?

In "classic" solutions, PRE is formulated as a data flow analysis problem. Muchnick's book (Advanced Compiler Design and Implementation) provides a detailed description of the algorithm in this form (it is by far the most complex problem of this kind in the whole book). The point is that this algorithm does not exploit the SSA form. In fact, it has to perform all that data flow analysis exactly because it does not take advantage of the work already done when the program was put into SSA form.

The SSAPRE algorithm, on the other hand, is designed with SSA in mind. It fully exploits the properties of SSA variables, it explicitly reuses some data structures that must have been computed when building the SSA form, and it takes great care to emit its code already in that form (which means that the temporaries it introduces are already "versioned", with all the phi variables correctly placed).

The main concept used in this algorithm is the "Factored Redundancy Graph" (or FRG for short). Basically, for each given expression, this graph shows all its occurrences in the program, and redundant ones are linked to their "representative occurrence" (the first occurrence met in a preorder traversal of the dominator tree). The central observation is that the FRG is "factored" because each expression occurrence has exactly one representative occurrence, in the same way as the SSA form is a "factored" use-definition graph (each use is related to exactly one definition). And in fact building the FRG is much like building an SSA representation, with PHI nodes and all.
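The following is a minimal sketch of how the FRG of a + b in PartialRedundancy could be represented, assuming a simplified model where every node (real occurrence, PHI, or PHI argument) carries a redundancy class and a single link to its representative. The types and functions (Occ, representative, frg_is_factored, build_partial_redundancy_frg) are hypothetical and not taken from the Mono code. The occurrence in BB3 links to the PHI rather than to the occurrence in BB1, because BB1 does not dominate BB3. The PHI argument coming from BB2 is "bottom" (written ⊥ in the paper), which is exactly why PRE has to insert a computation there.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of nodes in the FRG of a single expression (hypothetical model). */
typedef enum { OCC_REAL, OCC_PHI, OCC_PHI_ARG } OccKind;

typedef struct Occ {
    OccKind kind;
    int bb;                /* BB of the occurrence (for a PHI argument,
                              the predecessor it flows in from) */
    int class_id;          /* redundancy class; 0 stands for "bottom",
                              i.e. no computation reaches this argument */
    const struct Occ *def; /* representative occurrence, or NULL if this
                              occurrence is a representative itself */
} Occ;

/* As with an SSA use-def link, one hop reaches the representative. */
const Occ *representative (const Occ *o) {
    return o->def != NULL ? o->def : o;
}

/* Checks the "factored" property: every linked occurrence points directly
   to a representative (which links nowhere) of the same class. */
int frg_is_factored (const Occ *occs, size_t n) {
    assert (occs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const Occ *d = occs[i].def;
        if (d != NULL && (d->def != NULL || d->class_id != occs[i].class_id))
            return 0;
    }
    return 1;
}

/* FRG of a + b in PartialRedundancy (CFG edges: BB1 -> BB3, BB2 -> BB3). */
void build_partial_redundancy_frg (Occ g[5]) {
    g[0] = (Occ){ OCC_REAL,    1, 1, NULL };  /* a + b in BB1 */
    g[1] = (Occ){ OCC_PHI,     3, 2, NULL };  /* PHI at the start of BB3 */
    g[2] = (Occ){ OCC_PHI_ARG, 1, 1, &g[0] }; /* PHI argument from BB1 */
    g[3] = (Occ){ OCC_PHI_ARG, 2, 0, NULL };  /* PHI argument from BB2 */
    g[4] = (Occ){ OCC_REAL,    3, 2, &g[1] }; /* a + b in BB3 */
}
```

Read as a use-def graph, this is already the shape of the temporary t: one definition in BB1, a phi at BB3 merging it with a (yet to be inserted) definition from BB2, and a use in BB3.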
By the way, I use "PHI" for "PHI expression occurrences", and "phi" for the usual phi definitions in SSA, because the paper uses an uppercase Greek Phi for PHI occurrences (while the lowercase one is standard in SSA).

The really interesting point is that the FRG for a given expression has exactly the same "shape" as the use-definition graph of the temporary variable that must be introduced to remove the redundancy, and this is why SSAPRE can easily emit its output code in correct SSA form.

One particular characteristic of the SSAPRE algorithm is that it is "sparse", in the sense that it operates on expressions individually, looking only at the specific nodes it needs. This is in contrast to the classical way of solving data flow analysis problems, which is to operate globally, using data structures that work "in parallel" on all the entities they operate on (in practice bit sets, with for instance one bit per variable, or one bit per expression occurrence, you get the idea). This is handy (it exploits the parallelism of bit operations), but in general setting up those data structures is time consuming, and if the number of things represented by the bits is not fixed in advance the code can get quite messy (bit sets must become growable). Moreover, applying special handling to individual expressions becomes very tricky.

SSAPRE, on the other hand, looks at the whole program (the method, in our case) only once, when it scans the code to collect (and classify) all expression occurrences. From then on, it operates on one expression at a time, looking only at that expression's data structures (which are much smaller than the whole program, so the operations are fast). This approach has another advantage: the data structures used for one expression can be recycled for the following ones, which lowers the memory usage of the compiler and (also important) avoids wasting time on repeated memory allocations.
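A minimal sketch of the recycling idea, assuming the occurrences of the current expression are kept in one growable array that is reset, not freed, between expressions. WorkArea, work_area_reset, work_area_add and work_area_free are hypothetical names, and the allocations counter exists only to make the effect visible; this is not the actual Mono code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical occurrence record; a real one would point into the IR. */
typedef struct { int bb; int class_id; } Occurrence;

/* Per-expression storage that is recycled: it grows to fit the largest
   expression seen so far and is not freed between expressions. */
typedef struct {
    Occurrence *occs;
    size_t count;
    size_t capacity;
    size_t allocations;   /* number of realloc calls, for illustration */
} WorkArea;

/* Called before processing the next expression: keep the buffer,
   forget its contents. */
void work_area_reset (WorkArea *w) {
    w->count = 0;
}

/* Appends an occurrence; returns 0 if memory could not be obtained. */
int work_area_add (WorkArea *w, int bb) {
    assert (w != NULL);
    if (w->count == w->capacity) {
        size_t cap = w->capacity ? w->capacity * 2 : 8;
        Occurrence *p = realloc (w->occs, cap * sizeof *p);
        if (p == NULL)
            return 0;
        w->occs = p;
        w->capacity = cap;
        w->allocations++;
    }
    w->occs[w->count++] = (Occurrence){ bb, 0 };
    return 1;
}

/* Called once, when the whole SSAPRE invocation is done. */
void work_area_free (WorkArea *w) {
    free (w->occs);
    *w = (WorkArea){ 0 };
}
```

For example, processing three expressions with 5, 3 and 5 occurrences through the same WorkArea allocates memory only once, for the first expression; the other two reuse the same buffer.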
This has a direct impact on the design of those data structures. The advantages are easier to see by following which data structures are used while SSAPRE is applied to a method. The steps performed are the following:

  * Initialization: scan the whole program, collect all the occurrences, and build a worklist of expressions to be processed (each worklist entry describes all the occurrences of the given expression). Here the data structures are the following:
      - One struct (the working area), containing the worklist and other "global" data. The worklist itself contains an entry for each expression, which in turn has an entry for each occurrence.
      - One "info" struct for each BB, containing interesting dominance and CFG related properties of the BB.

Then, for each entry in the worklist, these operations are performed:

  * PHI placement: find where the PHI nodes of the FRG must be placed.
  * Renaming: assign appropriate "redundancy classes" to all occurrences (this is like assigning variable versions when building an SSA form).
  * Analyze: compute various flags in PHI nodes (which are the only places where additional computations may be inserted). This is conceptually composed of two data flow analysis passes, which in practice only scan the PHI nodes in the FRG, not the whole code, so they are not that heavy.
  * Finalize: reshape the FRG so that it is exactly like the use-def graph of the temporary that will be introduced (it generally must be adjusted a bit according to the flags computed in the previous step). This is also made of two steps, but more for implementation reasons than for conceptual ones.
  * Code motion: actually update the code using the FRG.
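The Initialization step can be sketched as follows, assuming a toy IR made only of binary operations whose operands are SSA variables that remember which original variable they are a version of. All types and names (SsaVar, BinOp, ExprEntry, Worklist, collect_occurrences) are hypothetical, and fixed-size arrays are used only for brevity. The key assumption is that occurrences are grouped by operator and original variables, ignoring versions, so a1 + b1 and a2 + b1 are occurrences of the same expression a + b.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EXPRS 16
#define MAX_OCCS  16

/* Hypothetical IR: "dest = left op right", where each operand is an SSA
   variable that records which original variable it is a version of. */
typedef struct { int orig; int version; } SsaVar;
typedef struct { char op; SsaVar left, right; int bb; } BinOp;

/* One worklist entry: an expression and all of its occurrences. */
typedef struct {
    char op;
    int left_orig, right_orig;
    const BinOp *occs[MAX_OCCS];
    size_t n_occs;
} ExprEntry;

/* The "working area": here it only holds the worklist. */
typedef struct {
    ExprEntry entries[MAX_EXPRS];
    size_t n_entries;
} Worklist;

/* Versions are ignored: a1 + b1 and a2 + b1 are occurrences of the same
   expression a + b (they may end up in different redundancy classes). */
static ExprEntry *find_or_add (Worklist *wl, const BinOp *ins) {
    for (size_t i = 0; i < wl->n_entries; i++) {
        ExprEntry *e = &wl->entries[i];
        if (e->op == ins->op && e->left_orig == ins->left.orig &&
            e->right_orig == ins->right.orig)
            return e;
    }
    assert (wl->n_entries < MAX_EXPRS);
    ExprEntry *e = &wl->entries[wl->n_entries++];
    e->op = ins->op;
    e->left_orig = ins->left.orig;
    e->right_orig = ins->right.orig;
    e->n_occs = 0;
    return e;
}

/* Initialization: the only scan over the whole method's code. */
void collect_occurrences (Worklist *wl, const BinOp *code, size_t n) {
    wl->n_entries = 0;
    for (size_t i = 0; i < n; i++) {
        ExprEntry *e = find_or_add (wl, &code[i]);
        assert (e->n_occs < MAX_OCCS);
        e->occs[e->n_occs++] = &code[i];
    }
}
```

For FullRedundancy2 (numbering a, b, v1, v2 as original variables 0 to 3), the three versions of a + b in BB1, BB2 and BB3 land in one worklist entry, while v1 + v2 gets an entry of its own; each entry is then processed independently by the later steps.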
For these per-expression steps, what's needed is the following:

  - PHI occurrences (and some flags about them)
  - PHI argument occurrences (and some flags about them)
  - The renaming stack

In practice, one can observe that each BB can have at most one PHI (we work on one expression at a time), and also at most one PHI argument (which we consider as occurring at the end of the BB). Therefore, we do not have separate structures for these, but store them directly in the BB infos (which are kept for the whole SSAPRE invocation). The renaming stack is managed directly as a list of occurrences, with special handling for PHI nodes (which, being represented directly by their BB, are not "occurrences").

So far, the only two things missing (with respect to SSAPRE as described in the paper) are the elimination of unneeded PHIs and the handling of "composite" expressions. Otherwise, the implementation is complete.

Other interesting issues are:

  - SSAPRE assumes that:
      - each SSA variable is related to one "original" (not SSA) variable, and
      - no more than one version of each original variable is live at the same time in the CFG.
    It would be better to relax these assumptions.
  - SSAPRE operates on "syntactic" redundancies, not on "values". GVNPRE (or other ways of combining global value numbering with PRE) would be a nice alternative, see "http://www.cs.purdue.edu/homes/vandrutj/papers/thesis.pdf".
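The following sketch shows how the PHI and the PHI argument can live directly in the per-BB info, and how a renaming stack can mix real occurrences with PHIs represented by their BB. It is a deliberately simplified Rename, not the one in the paper or in Mono: it assumes that no operand of the expression is ever redefined, so whenever the stack is non-empty the next occurrence is redundant with its top (the real algorithm must also compare operand versions). BBInfo, StackNode, rename_expression and the fixed array sizes are hypothetical. The test builds the CFG of PartialRedundancy with an extra entry block BB0 (holding the "a >= 0" test) that dominates BB1, BB2 and BB3.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUCCS    2
#define MAX_CHILDREN 4
#define MAX_REAL     4
#define BOTTOM       0   /* class of a PHI argument that no value reaches */

/* Hypothetical real occurrence of the expression being processed. */
typedef struct { int class_id; } Occurrence;

/* Per-BB info: the (at most one) PHI and PHI argument of the current
   expression are stored here instead of in separate structures. */
typedef struct BBInfo {
    int succs[MAX_SUCCS], n_succs;
    int children[MAX_CHILDREN], n_children;  /* dominator tree children */
    Occurrence real[MAX_REAL];
    int n_real;
    int has_phi, phi_class;
    int has_phi_arg, phi_arg_class;          /* argument at the end of the BB */
} BBInfo;

/* Renaming stack node: either a real occurrence or a PHI, which is
   represented by its BB. */
typedef struct StackNode {
    const Occurrence *occ;
    const BBInfo *phi_bb;
    const struct StackNode *next;
} StackNode;

static int top_class (const StackNode *top) {
    if (top == NULL)
        return BOTTOM;
    return top->occ ? top->occ->class_id : top->phi_bb->phi_class;
}

/* Simplified renaming in dominator tree preorder (see the assumption about
   operands never being redefined in the text above). */
static void rename_bb (BBInfo *bbs, int b, const StackNode *top, int *next_class) {
    BBInfo *bb = &bbs[b];
    StackNode phi_node, occ_node;

    if (bb->has_phi) {
        bb->phi_class = (*next_class)++;
        phi_node = (StackNode){ NULL, bb, top };
        top = &phi_node;
    }
    for (int i = 0; i < bb->n_real; i++) {
        if (top == NULL) {
            /* first occurrence on this dominator path: a new class */
            bb->real[i].class_id = (*next_class)++;
            occ_node = (StackNode){ &bb->real[i], NULL, top };
            top = &occ_node;
        } else {
            bb->real[i].class_id = top_class (top);
        }
    }
    for (int s = 0; s < bb->n_succs; s++) {
        if (bbs[bb->succs[s]].has_phi) {
            bb->has_phi_arg = 1;
            bb->phi_arg_class = top_class (top);
        }
    }
    /* Nodes pushed here live in this frame, so returning pops them. */
    for (int c = 0; c < bb->n_children; c++)
        rename_bb (bbs, bb->children[c], top, next_class);
}

void rename_expression (BBInfo *bbs, int n_bbs, int entry) {
    assert (entry >= 0 && entry < n_bbs);
    int next_class = 1;
    rename_bb (bbs, entry, NULL, &next_class);
}
```

With the PHI already placed at BB3, the occurrence in BB1 gets class 1, BB1's PHI argument gets class 1, BB2's PHI argument is bottom, and the occurrence in BB3 joins the PHI's class 2.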