

		Arrays Bounds Check Elimination (ABC)
			 in the Mono Runtime

	       Massimiliano Mantione (mass@ximian.com)

Here "abc" stands for "array bounds check", or "array bound checks", or
some combination of the two.

* Usage

	Simply enable the "abcrem" optimization when invoking mono
	(for instance with "-O=abcrem").

	To see if bound checks are actually removed, use "mono -v" and
	grep for "ARRAY-ACCESS" in the output; there should be a
	message for each check that has been removed.

	To trace the algorithm execution, use "-v -v -v", and be
	prepared to be totally submerged by debugging messages...

* Effectiveness

	The abc removal code can now always remove bound checks from
	"clean" array scans in loops, and more generally wherever
	clear conditions already state that the index is "safe".

	To be clearer, and give an idea of what the algorithm can and
	cannot do without describing it in detail... keep in mind that
	only "redundant" checks can be removed. By "redundant", I
	mean "already explicitly checked" in the method code.

	Unfortunately, analyzing complex expressions is not so easy
	(see below for details), so the capabilities of this "abc
	remover" are limited.

	These are simple guidelines:

	- Only expressions of the following kinds are handled:
	  - constant
	  - variable [+/- constant]
	- Only comparisons between "handled" expressions are understood.
	- "switch" statements are not yet handled.

	This means that code like this will be handled well:

		for (int i = 0; i < a.Length; i++) {
			a [i] = .....
		}

	The "i" variable could be declared out of the "for", the "for"
	could be a "while", and maybe even implemented with "goto",
	the array could be scanned in reverse order, and everything
	would still work.

	I could have a temporary variable storing the array length,
	and check on it inside the loop, and the abc removal would
	still occur, like this:

		int i = 0;
		int l = a.Length;
		while ( i < l ) {
			a [i] ......
		}

	or this:

		int l = a.Length;
		for (int i = l - 1; i >= 0; i--) {
			a [i] = .....
		}

	The following two examples would work:

		for (int i = 0; i < (a.Length -1); i++) .....
		for (int i = 0; i < a.Length; i += 2) .....

	But just something like this

		int delta = -1;
		for (int i = 0; i < (a.Length + delta); i++) .....

	or like this

		int delta = +2;
		for (int i = 0; i < a.Length; i += delta) .....

	would not work: the check would stay there (unless
	a combination of cfold, consprop and copyprop is used, too,
	which would make the constant value of "delta" explicit).

	Just to show how tricky things are... this would work!

		int limit = a.Length - 1;
		for (int i = 0; i < limit; i++) {
			a [i] = .....
		}

	A detailed explanation of the reason why things are done like
	this is given below.

* The Algorithm

	This array bound check removal (abc removal) algorithm is
	based on symbolic execution concepts. I handle the removal
	like an "unreachable code elimination" (and in fact the
	optimization could be extended to remove also other
	unreachable sections of code, due to branches that "always go
	the same way").

	In symbolic execution, variables do not have actual (numeric)
	values, but instead symbolic expressions (like "a", or "x+y").
	Also, branch conditions are handled like symbolic conditions
	(like "i<k"), which state relations between variable values.

	The SSA representation inside mini is somewhat close to a
	symbolic representation of the execution of the compiled
	method.

	Particularly, the representation of variable values is exactly
	a symbolic one. It is enough to find all CEE_STIND_*
	instructions which store to a local variable, and their second
	argument is exactly the variable value.  Actually, "cfg->vars
	[<variable-index>]->def" should contain exactly those store
	instructions, and the "get_variable_value_from_ssa_store"
	function extracts the variable value from there.

	On the other hand, the conditions under which each basic block
	is executed are not made fully explicit.

	However, it is not difficult to make them so.

	Each BB that has more than one exit BB must, in practice, end
	either with a conditional branch instruction or with a switch
	instruction.

	In the first case, the BB has exactly two exit BBs, and their
	execution conditions are easy to get from the condition of the
	branch (see the "get_relation_from_branch_instruction"
	function, and especially the end of "analyze_block" in
	abcremoval.c).

	If there is a switch, the jump condition of every exit BB is
	the equality of the switch argument with the particular index
	associated with its case (but the current implementation does
	not handle switch statements yet).

	These individual conditions are in practice associated with
	each arc that connects BBs in the CFG (with the simple case
	that unconditional branches have a "TRUE" condition, because
	they always happen).
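
	As a rough illustration of how a conditional branch gives a
	condition to each of its two arcs, here is a minimal C sketch.
	The "Relation" type and both function names are hypothetical,
	made up for this example; they are not claimed to match what
	abcremoval.c does. A relation is encoded as the set of
	outcomes (less, equal, greater) that it allows, so the arc
	followed when the branch is not taken allows exactly the
	remaining outcomes.

```c
#include <assert.h>

/* Hypothetical encoding: a relation between two values is the set of
 * basic outcomes it allows, stored as bit flags. */
typedef enum {
	REL_NONE = 0,
	REL_EQ = 1,
	REL_LT = 2,
	REL_GT = 4,
	REL_LE = REL_LT | REL_EQ,
	REL_GE = REL_GT | REL_EQ,
	REL_NE = REL_LT | REL_GT,
	REL_ANY = REL_LT | REL_EQ | REL_GT
} Relation;

/* Condition on the arc followed when the branch is NOT taken:
 * every outcome that the branch condition excludes. */
static Relation
relation_on_fallthrough_arc (Relation branch_condition)
{
	assert ((branch_condition & ~REL_ANY) == 0);
	return (Relation) (REL_ANY & ~branch_condition);
}

/* An unconditional branch always happens, so its arc carries the
 * "TRUE" condition: no outcome is excluded. */
static Relation
relation_on_unconditional_arc (void)
{
	return REL_ANY;
}
```

	For a branch on "i<k", the taken arc carries REL_LT and the
	other arc carries REL_GE, that is "i>=k".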

	So, for each BB, its *proper* entry condition is the union of
	all the conditions associated with arcs that enter the BB. The
	"union" is like a logical "or", in the sense that any of
	the conditions could be true; they are not necessarily all
	true. This means that if I can enter a BB in two ways, and in
	one case I know that "x>0", and in the other that "x==0",
	actually in the BB I know that "x>=0", which is a weaker
	condition (the union of the two).

	Also, the *complete* entry condition for a BB is the
	"intersection" of all the entry conditions of its
	dominators. This is true because each dominator is the only
	way to reach the BB, so the entry condition of each dominator
	must be true if the control flow reached the BB. This
	translates to the logical "and" of all the "proper" conditions
	of the BBs met walking up in the dominator tree. So, if one
	says "x>0", and another "x==1", then I know that "x==1", which
	is a stronger condition (the intersection of the two).

	Note that, if the two conditions were "x>0" and "x==0", then
	the block would be unreachable (the intersection is empty),
	because some branch is impossible.
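
	The three cases above ("x>0" or "x==0", "x>0" and "x==1",
	"x>0" and "x==0") can be replayed with a small C sketch that
	keeps, for one variable, a closed range of possible values.
	The "Range" type and the function names are assumptions made
	for this example only. Note that the union is computed as the
	smallest range covering both inputs, which may be weaker than
	the exact "or" but is never wrong.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical summary of what is known about one integer variable:
 * the closed range [lower, upper]. */
typedef struct {
	int lower;
	int upper;
} Range;

/* An empty range means that no value satisfies the condition,
 * so the BB cannot be reached. */
static bool
range_is_empty (Range r)
{
	return r.lower > r.upper;
}

/* "Proper" entry condition: the BB can be entered along either arc,
 * so the result must cover both ranges (logical "or").
 * Impossible arcs are expected to be dropped before merging. */
static Range
range_union (Range a, Range b)
{
	Range r;

	assert (!range_is_empty (a) && !range_is_empty (b));
	r.lower = a.lower < b.lower ? a.lower : b.lower;
	r.upper = a.upper > b.upper ? a.upper : b.upper;
	return r;
}

/* "Complete" entry condition: the conditions of all the dominators
 * hold at the same time (logical "and"). */
static Range
range_intersection (Range a, Range b)
{
	Range r;

	r.lower = a.lower > b.lower ? a.lower : b.lower;
	r.upper = a.upper < b.upper ? a.upper : b.upper;
	return r;
}
```

	With "x>0" written as [1, INT_MAX] and "x==0" as [0, 0], the
	union is [0, INT_MAX] ("x>=0"), while the intersection is
	empty, which corresponds to the unreachable block.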

	Another observation is that, inside each BB, every variable is
	subject to the complete entry condition of that very same BB,
	and not to the one of the BB in which it is defined (the
	"complete entry condition" being the thing I defined before;
	sorry if the terms "proper" and "complete" are strange, I
	found nothing better).

	This happens because the branch conditions are related to the
	control flow.  I can define "i=a", and if I am in a BB where
	"a>0", then "i>0", but not otherwise.

	So, intuitively, if the available conditions say "i>=0", and i
	is used as an index in an array access, then the lower bound
	check can be omitted.  If the condition also says
	"(i>=0)&&(i<array.length)", the abc removal can occur.

	So, a complete solution to the problem of abc removal would be
	the following: for each array access, build a system of
	equations containing:

		[1] all the symbolic variable definitions

		[2] the complete entry condition of the BB in which
		the array access occurs

		[3] the two "goal functions" ("index >=0" and "index <
		array.length")

	If the system is valid for *each possible* variable value, then the goal
	functions are always true, and the abc can be removed.

	All this discussion is useful to give a precise specification
	to the problem we are trying to solve.

	The trouble is that, in the general case, the resulting system
	of equations is like a predicate in first order logic, which
	is semi-decidable, and its general solution is anyway too
	complex to be attempted in a JIT compiler (which should not
	contain a full fledged theorem prover).

	Therefore, we must cut some corners.

	There is also another big problem, which is caused by
	"recursive" symbolic definitions. These definitions can (and
	generally do) happen every time there is a loop. For instance,
	in the following piece of code:

		for ( int i = 0; i < array.Length; i++ ) {
			Console.WriteLine( "array [i] = " + array [i] );
		}

	one of the definitions of i is a PHI that can be either 0 or
	"i + 1".

	Now, we know that mathematically "i = i + 1" does not make
	sense, and in fact symbolic values are not "equations", they
	are "symbolic definitions".

	The actual symbolic value of i is a generic "n", where "n" is
	the number of iterations of the loop, but this is terrible to
	handle (and in more complex examples the symbolic value of i
	simply cannot be written, because i is calculated in an
	iterative way).

	However, the definition "i = i + 1" tells us something about
	i: it tells us that i "grows". So (from the PHI definition) we
	know that i is either 0, or "grows". This is enough to tell
	that "i>=0", which is what we want!  It is important to note
	that recursive definitions can only occur inside PHI
	definitions, because actually a variable cannot be defined
	*only* in terms of itself!

	At this point, I can explain which corners I want to cut to
	make the problem solvable. It will not remove all the abc that
	could theoretically be removed, but at least it will work.

	The easiest way to cut corners is to only handle expressions
	which are "reasonably simple", and ignore the rest.

	Keep in mind that ignoring an expression is not harmful in
	itself.  The algorithm will be simply "less powerful", because
	it will ignore conditions that could have led to the
	removal of an abc, but will not remove checks "by mistake" (so
	the resulting code will be in any case correct).

	The expressions we handle are the following (all of integer
	type):

		- constant
		- variable
		- variable + constant
		- constant + variable
		- variable - constant

	And, of course, PHI definitions.

	Any other expression causes the introduction of an "any" value
	in the evaluation, which makes all values that depend on it
	unknown as well.

	We will call the definitions we can handle "summarizable"
	definitions.
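
	To make the list of handled expressions more concrete, here
	is a minimal C sketch of how an expression could be
	summarized. The "Summary" type and the "summarize_*" helpers
	are hypothetical names invented for this example, and PHI
	definitions are left out. As a liberty that the list above
	does not promise, the sketch also folds constants together
	when it meets them; integer overflow is ignored.

```c
#include <assert.h>

typedef enum {
	SUMMARY_CONSTANT,  /* a known integer */
	SUMMARY_VARIABLE,  /* variable + delta (delta may be 0 or negative) */
	SUMMARY_ANY        /* anything else: nothing is known */
} SummaryKind;

typedef struct {
	SummaryKind kind;
	int variable;  /* variable index, meaningful for SUMMARY_VARIABLE */
	int value;     /* the constant, or the delta added to the variable */
} Summary;

static Summary
summary_constant (int value)
{
	Summary s = { SUMMARY_CONSTANT, -1, value };
	return s;
}

static Summary
summary_variable (int variable)
{
	assert (variable >= 0);
	Summary s = { SUMMARY_VARIABLE, variable, 0 };
	return s;
}

static Summary
summary_any (void)
{
	Summary s = { SUMMARY_ANY, -1, 0 };
	return s;
}

/* Summarize "left + right": at most one side may be a variable. */
static Summary
summarize_add (Summary left, Summary right)
{
	if (left.kind == SUMMARY_CONSTANT && right.kind == SUMMARY_CONSTANT)
		return summary_constant (left.value + right.value);
	if (left.kind == SUMMARY_VARIABLE && right.kind == SUMMARY_CONSTANT) {
		left.value += right.value;
		return left;
	}
	if (left.kind == SUMMARY_CONSTANT && right.kind == SUMMARY_VARIABLE) {
		right.value += left.value;
		return right;
	}
	return summary_any ();
}

/* Summarize "left - right": the right side must be a constant,
 * because "constant - variable" is not in the handled list. */
static Summary
summarize_sub (Summary left, Summary right)
{
	if (right.kind != SUMMARY_CONSTANT)
		return summary_any ();
	right.value = -right.value;
	return summarize_add (left, right);
}
```

	Once an operand is "any", the result is "any" too, which is
	how the unknown value spreads to everything that depends on
	it.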

	In a first attempt, we can consider only branch conditions
	that have the simplest possible form (the comparison of two
	summarizable expressions).

	We can also simplify the effect of variable definitions,
	keeping only what is relevant to know: their value range with
	respect to zero and with respect to the length of the array we
	are currently handling.

	One particular note on PHI functions: they work (obviously)
	like the logical "or" of their definitions, and therefore are
	equivalent to the "logical or" of the summarization of their
	definitions.

	About recursive definitions (which, believe me, are the worst
	thing in all this mess), we handle only "monotonic" ones. That
	is, we try to understand if the recursive definition (which,
	as we said above, must happen because of a loop) always
	"grows" or "gets smaller". In all other cases, we decide we
	cannot handle it.
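
	As an example of the "monotonic" reasoning, the following C
	sketch looks for a lower bound of a variable defined by a PHI.
	Each PHI argument is assumed to be already summarized as a
	constant, as the variable itself plus a delta (the recursive
	case), or as something unknown. The types and the function
	name are hypothetical, and the sketch ignores integer
	overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
	ARG_CONSTANT,         /* the argument is the constant "value" */
	ARG_SELF_PLUS_DELTA,  /* the argument is "this variable + value" */
	ARG_UNKNOWN           /* not summarizable */
} PhiArgKind;

typedef struct {
	PhiArgKind kind;
	int value;
} PhiArg;

/* Returns true, and stores the bound, if the PHI can never produce
 * a value below a fixed constant. */
static bool
phi_lower_bound (const PhiArg *args, int count, int *lower_bound)
{
	bool have_constant = false;
	int lowest = 0;

	assert (args != NULL && count > 0 && lower_bound != NULL);
	for (int i = 0; i < count; i++) {
		switch (args [i].kind) {
		case ARG_CONSTANT:
			if (!have_constant || args [i].value < lowest)
				lowest = args [i].value;
			have_constant = true;
			break;
		case ARG_SELF_PLUS_DELTA:
			/* A recursive definition that "gets smaller" gives
			 * no lower bound; one that "grows" (or stays put)
			 * cannot go below the starting values. */
			if (args [i].value < 0)
				return false;
			break;
		default:
			return false;
		}
	}
	/* A variable cannot be defined only in terms of itself. */
	if (!have_constant)
		return false;
	*lower_bound = lowest;
	return true;
}
```

	For the loop index "i = PHI (0, i + 1)" the result is a lower
	bound of 0, which is the "i>=0" we need. An upper bound for a
	definition that "gets smaller" could be found in the same
	way, with the comparisons reversed.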

	One critical thing, once we have defined all these data
	structures, is how the evaluation is actually performed.

	In a first attempt I coded a "brute force" approach, which for
	each BB tried to examine all possible conditions between all
	variables, filling a sort of "evaluation matrix". The problem
	was that the complexity of this evaluation was quadratic (or
	worse) in the number of variables, and that many variables
	were examined even if they were not involved in any array
	access.

	Following the ABCD paper:

		  http://citeseer.ist.psu.edu/bodik00abcd.html

	I rewrote the algorithm in a more "sparse" way.

	Now, the main data structure is a graph of relations between
	variables, and each attempt to remove a check performs a
	traversal of the graph, looking for a path from the index to
	the array length that satisfies the properties "index >= 0"
	and "index < length". If such a path is found, the check is
	removed. It is true that in theory *each* traversal has a
	complexity which is exponential in the number of variables,
	but in practice the graph is not very connected, so the
	traversal terminates quickly.
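
	To give an idea of what such a traversal looks like, here is
	a simplified C sketch. It is not the code in abcremoval.c:
	the "RelationGraph" type, the function names and the fixed
	sizes are all assumptions made for the example. Each edge
	states "from <= to + delta", and a depth-first search looks
	for a chain of edges that proves "index <= length - 1". Only
	the upper bound is shown; the lower bound could be handled in
	the same way with the inequalities reversed. Variables that
	are already on the current path are skipped, so a cycle can
	only make the proof fail, never succeed by mistake.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VARIABLES 8
#define MAX_EDGES 16

/* One known relation: "from <= to + delta". */
typedef struct {
	int from;
	int to;
	int delta;
} Edge;

typedef struct {
	Edge edges [MAX_EDGES];
	int edge_count;
	bool on_path [MAX_VARIABLES];  /* variables on the current path */
} RelationGraph;

static void
graph_add (RelationGraph *g, int from, int to, int delta)
{
	assert (g->edge_count < MAX_EDGES);
	assert (from >= 0 && from < MAX_VARIABLES);
	assert (to >= 0 && to < MAX_VARIABLES);
	Edge e = { from, to, delta };
	g->edges [g->edge_count++] = e;
}

/* True if some chain of edges proves "from <= target + bound". */
static bool
graph_proves (RelationGraph *g, int from, int target, int bound)
{
	bool proved = false;

	if (from == target)
		return bound >= 0;
	if (g->on_path [from])
		return false;  /* cycle: give up on this path */

	g->on_path [from] = true;
	for (int i = 0; i < g->edge_count && !proved; i++) {
		const Edge *e = &g->edges [i];
		/* from <= e->to + delta, so it is enough to prove
		 * e->to <= target + (bound - delta). */
		if (e->from == from)
			proved = graph_proves (g, e->to, target, bound - e->delta);
	}
	g->on_path [from] = false;
	return proved;
}
```

	With variables 0, 1 and 2 standing for "i", "limit" and the
	array length, the edges (0, 1, -1) and (1, 2, -1) describe
	"i < limit" and "limit = length - 1", and
	"graph_proves (&g, 0, 2, -1)" succeeds. Since the search may
	try every simple path, its worst case is exponential, as
	noted above.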


	Then, the algorithm to optimize one method looks like this:

		[1] Preparation:

		    [1a] Build the SSA representation.

		    [1b] Prepare the evaluation graph (empty)

		    [1c] Summarize each variable definition, and put
		         the resulting relations in the evaluation
		         graph

		[2] Analyze each BB, starting from the entry point and
		    following the dominator tree:

		    [2a] Summarize its entry condition, and put the
		         resulting relations in the evaluation graph
		         (this is the reason why the BBs are examined
		         following the dominator tree, so that
		         conditions are added to the graph in a
		         "cumulative" way)

		    [2b] Scan the BB instructions, and for each array
			 access perform step [3]

		    [2c] Process children BBs following the dominator
			 tree (step [2])

		    [2d] Remove from the evaluation area the
		         conditions added at step [2a] (so that, when
		         backtracking along the tree, the area is
		         properly cleared)

		[3] Attempt the removal:

		    [3a] Summarize the index expression, to see if we
		         can handle it; there are three cases: the
		         index is either a constant, or a variable
		         (with an optional delta), or cannot be
		         handled (is an "any")

		    [3b] If the index can be handled, traverse the
		         evaluation area searching for a path from the
		         index variable to the array length (if the
		         index is a constant, just examine the array
		         length to see if it has some relation with
		         this constant)

		    [3c] Use the results of step [3b] to decide if the
		         check is redundant
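
	The "cumulative" behaviour of steps [2a] to [2d] can be shown
	with a small C sketch of the walk along the dominator tree.
	Everything here is hypothetical ("BasicBlock",
	"EvaluationArea", "process_block" and the fixed sizes are
	made up for the example), and a condition is reduced to an
	opaque integer. The point is only the order of the
	operations: add the entry condition, look at the block,
	visit the children, then take the condition away again.

```c
#include <assert.h>

#define MAX_CHILDREN 4
#define MAX_DEPTH 32

typedef struct BasicBlock {
	int entry_condition;   /* opaque id of the summarized condition */
	int has_array_access;
	int child_count;
	struct BasicBlock *children [MAX_CHILDREN];  /* dominator tree */
	int conditions_seen;   /* filled by the walk, for illustration */
} BasicBlock;

/* Stack of the conditions that hold in the BB being examined. */
typedef struct {
	int conditions [MAX_DEPTH];
	int count;
} EvaluationArea;

static void
process_block (BasicBlock *bb, EvaluationArea *area)
{
	/* [2a] add the entry condition of this BB */
	assert (area->count < MAX_DEPTH);
	area->conditions [area->count++] = bb->entry_condition;

	/* [2b] an array access found here is examined with the
	 * conditions of all the dominators still in the area;
	 * the sketch just records how many there are */
	if (bb->has_array_access)
		bb->conditions_seen = area->count;

	/* [2c] process the children in the dominator tree */
	for (int i = 0; i < bb->child_count; i++)
		process_block (bb->children [i], area);

	/* [2d] remove what was added at [2a] */
	area->count--;
}
```

	An array access in a BB at depth three of the tree sees three
	conditions, a sibling branch at depth two sees only two, and
	the area is empty again when the walk ends.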