You've already forked linux-packaging-mono
							
							
		
			
				
	
	
		
			125 lines
		
	
	
		
			3.9 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			125 lines
		
	
	
		
			3.9 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| Just some thoughts for the JITer:
 | |
| 
 | |
| General issues:
 | |
| ===============
 | |
| 
 | |
| We are designing a JIT compiler, so we have to consider two things:
 | |
| 
 | |
| - the quality of the generated code
 | |
| - the time needed to generate that code
 | |
| 
 | |
| The current approach is to keep the JITer as simple as possible, and thus as
 | |
| fast as possible. The generated code quality will suffer from that.
 | |
| 
 | |
| Register Allocation:
 | |
| ====================
 | |
| 
 | |
| With lcc you can assign a fixed register to a tree before register
 | |
| allocation. For example this is needed by call, which return the value always
 | |
| in EAX on x86. The current implementation works without such system, due to
 | |
| special forest generation.
 | |
| 
 | |
| Different Register Sets:
 | |
| ========================
 | |
| 
 | |
| Most processors have more that one register set, at least one for floating
 | |
| point values, and one for integers. Should we support architectures with more
 | |
| that two sets? Does someone knows such an architecture?
 | |
| 
 | |
| 64bit Integer Values:
 | |
| =====================
 | |
| 
 | |
| I can imagine two different implementation. On possibility would be to treat
 | |
| long (64bit) values simply like any other value type. This implies that we
 | |
| call class methods for ALU operations like add or sub. Sure, this method will
 | |
| be be a bit inefficient.
 | |
| 
 | |
| The more performant solution is to allocate two 32bit registers for each 64bit
 | |
| value. We add a new non terminal to the monoburg grammar called "lreg". The
 | |
| register allocation routines takes care of this non terminal and allocates two
 | |
| 32 bit registers for them.
 | |
| 
 | |
| Forest generation:
 | |
| ==================
 | |
| 
 | |
| Consider the following code: 
 | |
| 
 | |
| OPCODE:		STACK		LOCALS
 | |
| LDLOC.0		(5)		[5,0]
 | |
| LDC.1		(5,1)		[5,0]
 | |
| STLOC.0		(5)		[1,0]
 | |
| STLOC.1		()		[1,5]
 | |
| 
 | |
| A simple forest generation generates: 
 | |
| 
 | |
| STLOC.0(LDC.1)
 | |
| STLOC.1(LDLOC.0)
 | |
| 
 | |
| Which is wrong, since it stores the wrong value (1 instead of 5). Instead we
 | |
| must generate something like:
 | |
| 
 | |
| STLOC.TMP(LDLOC.0)
 | |
| STLOC.0(LDC.1)
 | |
| STLOC.1(LDLOC.TMP)
 | |
| 
 | |
| Where STLOC.TMP saves the value into a new temporary variable. 
 | |
| 
 | |
| We also need a similar solution for basic block boundaries when the stack depth
 | |
| is not zero. We can simply save those values to new temporary values. Consider
 | |
| the following basic block with one instruction:
 | |
| 
 | |
| LDLOC.1 
 | |
| This should generate a tree like: 
 | |
| 
 | |
| STLOC.TMP(LDLOC.1) Please notice that an intelligent register allocator can
 | |
| still allocate registers for those new variables.
 | |
| 
 | |
| DAG handling:
 | |
| =============
 | |
| 
 | |
| Monoburg can't handle DAGs, instead we need real trees as input for
 | |
| the code generator. So we have two problems:
 | |
| 
 | |
| 1.) DUP instruction: This one is obvious - we need to store the value
 | |
| into a temporary variable to solve the problem.
 | |
| 
 | |
| 2.) function calls: Chapter 12.8, page 343 of "A retargetable C compiler"
 | |
| explains that: "because listing a call node will give it a hidden reference
 | |
| from the code list". I don't understand that (can someone explain that?), but
 | |
| there is another reason to save return values to temporaries: Consider the
 | |
| following code:
 | |
| 
 | |
| x = f(y) + g(z); // all functions return integers
 | |
| 
 | |
| We could generate such a tree for this expression: STLOC(ADD(CALL,CALL))
 | |
| 
 | |
| The problem is that both calls returns the value in the same register,
 | |
| so it is non trivial to generate code for that tree. We must copy one
 | |
| register into another one, which make register allocation more complex.
 | |
| The easier solution is store the result of function calls to
 | |
| temporaries. This leads to the following forest:
 | |
| 
 | |
| STLOC(CALL)
 | |
| STLOC(CALL)
 | |
| STLOC(ADD (LDLOC, LDLOC))
 | |
| 
 | |
| This is what lcc is doing, if I understood 12.8, page 342, 343?
 | |
| 
 | |
| Possible Optimisations:
 | |
| =======================
 | |
| 
 | |
| Miguel said ORP does some optimisation on IL level, for example moving array
 | |
| bounds checking out of loops:
 | |
| 
 | |
| for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; }
 | |
| 
 | |
| id transformed to:
 | |
| 
 | |
| if (in_range (a, 0, N)) { for (i = 0; i < N; i++) a[i] = X; }  
 | |
| else for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; }
 | |
| 
 | |
| The "else" is only to keep original semantics (exception handling).
 | |
| 
 | |
| We need loop detection logic in order to implement this (dominator tree).
 | |
| 
 | |
| AFAIK CACAO also implements this. |