Date: Sat, 18 Nov 2000 09:19:35 -0600 (CST)
From: Vikram Adve <vadve@cs.uiuc.edu>
To: Chris Lattner <lattner@cs.uiuc.edu>
Subject: a few thoughts
I've been mulling over the virtual machine problem and I had some
thoughts about some things for us to think about and discuss:
1. We need to be clear on our goals for the VM. Do we want to emphasize
portability and safety like the Java VM? Or shall we focus on the
architecture interface first (i.e., consider the code generation and
processor issues), since the architecture interface question is also
important for portable Java-type VMs?
This is important because the audiences for these two goals are very
different. Architects and many compiler people care much more about
the second question. The Java compiler and OS community care much more
about the first one.
Also, while the architecture interface question is important for
Java-type VMs, the design constraints are very different.
2. Design issues to consider (an initial list that we should continue
to modify). Note that I'm not trying to suggest actual solutions here,
but just various directions we can pursue:
a. A single-assignment VM, which we've both already been thinking about.
b. A strongly-typed VM. One question is do we need the types to be
explicitly declared or should they be inferred by the dynamic compiler?
c. How do we get more high-level information into the VM while keeping
to a low-level VM design?
o Explicit array references as operands? An alternative is
to have just an array type, and let the index computations be
separate 3-operand instructions.
o Explicit instructions to handle aliasing, e.g.:
-- an instruction to say "I speculate that these two values are not
aliased, but check at runtime", like speculative execution in
EPIC?
-- or an instruction to check whether two values are aliased and
execute different code depending on the answer, somewhat like
predicated code in EPIC
o (This one is a difficult but powerful idea.)
A "thread-id" field on every instruction that allows the static
compiler to generate a set of parallel threads, and then have
the runtime compiler and hardware do what they please with it.
This has very powerful uses, but thread-id on every instruction
is expensive in terms of instruction size and code size.
We would need to compactly encode it somehow.
Also, this will require some reading on at least two other
projects:
-- Multiscalar architecture from Wisconsin
-- Simultaneous multithreading architecture from Washington
o Or forget all this and stick to a traditional instruction set?
BTW, on an unrelated note, after the meeting yesterday, I did remember
that you had suggested doing instruction scheduling on SSA form instead
of a dependence DAG earlier in the semester. When we talked about
it yesterday, I didn't remember where the idea had come from but I
remembered later. Just giving credit where it's due...
Perhaps you can save the above as a file under RCS so you and I can
continue to expand on this.
--Vikram

Date: Sun, 19 Nov 2000 16:23:57 -0600 (CST)
From: Chris Lattner <sabre@nondot.org>
To: Vikram Adve <vadve@cs.uiuc.edu>
Subject: Re: a few thoughts
Okay... here are a few of my thoughts on this (it's good to know that we
think so alike!):
> 1. We need to be clear on our goals for the VM. Do we want to emphasize
> portability and safety like the Java VM? Or shall we focus on the
> architecture interface first (i.e., consider the code generation and
> processor issues), since the architecture interface question is also
> important for portable Java-type VMs?
I foresee the architecture looking kinda like this (which is completely
subject to change):
1. The VM code is NOT guaranteed safe in a Java sense. Doing so makes it
basically impossible to support C-like languages. Besides that,
certifying a register based language as safe at run time would be a
pretty expensive operation to have to do. Additionally, we would like
to be able to statically eliminate many bounds checks in Java
programs... for example.
2. Instead, we can do the following (eventually):
* Java bytecode is used as our "safe" representation (to avoid
reinventing something that we don't add much value to). When the
user chooses to execute Java bytecodes directly (ie, not
precompiled) the runtime compiler can do some very simple
transformations (JIT style) to convert it into valid input for our
VM. Performance is not wonderful, but it works right.
* The file is scheduled to be compiled (rigorously) at a later
time. This could be done by some background process or by a second
processor in the system during idle time or something...
* To keep things "safe" ie to enforce a sandbox on Java/foreign code,
we could sign the generated VM code with a host specific private
key. Then before the code is executed/loaded, we can check to see if
the trusted compiler generated the code. This would be much quicker
than having to validate consistency (especially if bounds checks have
been removed, for example)
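(A minimal sketch of that trust check in C++ follows. The keyed checksum
below is a toy stand-in chosen only to keep the example self-contained; a
real system would use an actual cryptographic signature with a
host-specific private key, and all names here are illustrative
assumptions, not a committed design.)

#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Toy stand-in for signing with a host-specific key: NOT secure,
// illustrative only.
uint64_t toySign(const Bytes &code, uint64_t hostKey) {
    uint64_t tag = hostKey;
    for (uint8_t b : code)
        tag = tag * 1099511628211ULL + b;  // FNV-style mixing
    return tag;
}

// The loader recomputes the tag and runs the code only if it matches,
// i.e., only if the trusted compiler produced it -- much cheaper than
// re-validating the code itself (e.g., re-checking removed bounds checks).
bool loadTrustedCode(const Bytes &code, uint64_t signature, uint64_t hostKey) {
    return toySign(code, hostKey) == signature;
}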
> This is important because the audiences for these two goals are very
> different. Architects and many compiler people care much more about
> the second question. The Java compiler and OS community care much more
> about the first one.
3. By focusing on a more low level virtual machine, we have much more room
for value add. The nice safe "sandbox" VM can be provided as a layer
on top of it. It also lets us focus on the more interesting compilers
related projects.
> 2. Design issues to consider (an initial list that we should continue
> to modify). Note that I'm not trying to suggest actual solutions here,
> but just various directions we can pursue:
Understood. :)
> a. A single-assignment VM, which we've both already been thinking
> about.
Yup, I think that this makes a lot of sense. I am still intrigued,
however, by the prospect of a minimally allocated VM representation... I
think that it could have definite advantages for certain applications
(think very small machines, like PDAs). I don't, however, think that our
initial implementations should focus on this. :)
Here are some other auxiliary goals that I think we should consider:
1. Primary goal: Support a high performance dynamic compilation
system. This means that we have an "ideal" division of labor between
the runtime and static compilers. Of course, the other goals of the
system somewhat reduce the importance of this point (f.e. portability
reduces performance, but hopefully not much)
2. Portability to different processors. Since we are most familiar with
x86 and solaris, I think that these two are excellent candidates when
we get that far...
3. Support for all languages & styles of programming (general purpose
VM). This is the point that disallows Java-style bytecodes, where all
array refs are checked for bounds, etc...
4. Support linking between different language families. For example, call
C functions directly from Java without using the nasty/slow/gross JNI
layer. This involves several subpoints:
A. Support for languages that require garbage collectors and integration
with languages that don't. As a base point, we could insist on
always using a conservative GC, but implement free as a noop, f.e.
> b. A strongly-typed VM. One question is do we need the types to be
> explicitly declared or should they be inferred by the dynamic
> compiler?
B. This is kind of similar to another idea that I have: make OOP
constructs (virtual function tables, class hierarchies, etc) explicit
in the VM representation. I believe that the number of additional
constructs would be fairly low, but would give us lots of important
information... something else that would/could be important is to
have exceptions as first class types so that they would be handled in
a uniform way for the entire VM... so that C functions can call Java
functions for example...
> c. How do we get more high-level information into the VM while keeping
> to a low-level VM design?
> o Explicit array references as operands? An alternative is
> to have just an array type, and let the index computations be
> separate 3-operand instructions.
C. In the model I was thinking of (subject to change of course), we
would just have an array type (distinct from the pointer
types). This would allow us to have arbitrarily complex index
expressions, while still distinguishing "load" from "Array load",
for example. Perhaps also, switch jump tables would be first class
types as well? This would allow better reasoning about the program.
5. Support dynamic loading of code from various sources. Already
mentioned above was the example of loading java bytecodes, but we want
to support dynamic loading of VM code as well. This makes the job of
the runtime compiler much more interesting: it can do interprocedural
optimizations that the static compiler can't do, because it doesn't
have all of the required information (for example, inlining from
shared libraries, etc...)
6. Define a set of generally useful annotations to add to the VM
representation. For example, a function can be analysed to see if it
has any side effects when run... also, the MOD/REF sets could be
calculated, etc... we would have to determine what is reasonable. This
would generally be used to make IP optimizations cheaper for the
runtime compiler...
> o Explicit instructions to handle aliasing, e.g.:
> -- an instruction to say "I speculate that these two values are not
> aliased, but check at runtime", like speculative execution in
> EPIC?
> -- or an instruction to check whether two values are aliased and
> execute different code depending on the answer, somewhat like
> predicated code in EPIC
These are also very good points... if this can be determined at compile
time. I think that an EPIC style of representation (not the instruction
packing, just the information presented) could be a very interesting model
to use... more later...
> o (This one is a difficult but powerful idea.)
> A "thread-id" field on every instruction that allows the static
> compiler to generate a set of parallel threads, and then have
> the runtime compiler and hardware do what they please with it.
> This has very powerful uses, but thread-id on every instruction
> is expensive in terms of instruction size and code size.
> We would need to compactly encode it somehow.
Yes yes yes! :) I think it would be *VERY* useful to include this kind
of information (which EPIC architectures *implicitly* encode). The trend
that we are seeing supports this greatly:
1. Commodity processors are getting massive SIMD support:
* Intel/AMD MMX/MMX2
* AMD's 3Dnow!
* Intel's SSE/SSE2
* Sun's VIS
2. SMP is becoming much more common, especially in the server space.
3. Multiple processors on a die are right around the corner.
If nothing else, not designing this in would severely limit our future
expansion of the project...
> Also, this will require some reading on at least two other
> projects:
> -- Multiscalar architecture from Wisconsin
> -- Simultaneous multithreading architecture from Washington
>
> o Or forget all this and stick to a traditional instruction set?
Heh... :) Well, from a pure research point of view, it is almost more
attractive to go with the most extreme/different ISA possible. On one axis
you get safety and conservatism, and on the other you get degree of
influence that the results have. Of course the problem with pure research
is that oftentimes there is no concrete product of the research... :)
> BTW, on an unrelated note, after the meeting yesterday, I did remember
> that you had suggested doing instruction scheduling on SSA form instead
> of a dependence DAG earlier in the semester. When we talked about
> it yesterday, I didn't remember where the idea had come from but I
> remembered later. Just giving credit where its due...
:) Thanks.
> Perhaps you can save the above as a file under RCS so you and I can
> continue to expand on this.
I think it makes sense to do so when we get our ideas more formalized and
bounce it back and forth a couple of times... then I'll do a more formal
writeup of our goals and ideas. Obviously our first implementation will
not want to do all of the stuff that I pointed out above... but we will
want to design the project so that we do not artificially limit ourselves
at some time in the future...
Anyways, let me know what you think about these ideas... and if they sound
reasonable...
-Chris

From: Chris Lattner [mailto:sabre@nondot.org]
Sent: Wednesday, December 06, 2000 6:41 PM
To: Vikram S. Adve
Subject: Additional idea with respect to encoding
Here's another idea with respect to keeping the common case instruction
size down (less than 32 bits ideally):
Instead of encoding an instruction to operate on two register numbers,
have it operate on two negative offsets based on the current register
number. Therefore, instead of using:
r57 = add r55, r56 (r57 is the implicit dest register, of course)
We could use:
r57 = add -2, -1
My guess is that most SSA references are to recent values (especially if
they correspond to expressions like (x+y*z+p*q/ ...)), so the negative
numbers would tend to stay small, even at the end of the procedure (where
the implicit register destination number could be quite large). Of course
the negative sign is redundant, so you would be storing small integers
almost all of the time, and 5-6 bits worth of register number would be
plenty for most cases...
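(A minimal sketch of the proposed encoding in C++, assuming byte-wide
deltas purely for simplicity -- the email suggests 5-6 bits would usually
suffice -- with all names illustrative:)

#include <cassert>
#include <cstdint>

// "r57 = add r55, r56" would encode as the deltas {2, 1} relative to the
// implicit destination r57, instead of the absolute numbers {55, 56}.
struct CompactAdd {
    uint8_t lhsBack;  // dest register number minus lhs register number
    uint8_t rhsBack;  // dest register number minus rhs register number
};

CompactAdd encodeAdd(uint32_t dest, uint32_t lhs, uint32_t rhs) {
    assert(lhs < dest && rhs < dest);  // SSA: operands are defined earlier
    assert(dest - lhs <= 0xFF && dest - rhs <= 0xFF);  // small-delta assumption
    return {static_cast<uint8_t>(dest - lhs),
            static_cast<uint8_t>(dest - rhs)};
}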
What do you think?
-Chris

SUMMARY
-------
We met to discuss the LLVM instruction format and bytecode representation:
ISSUES RESOLVED
---------------
1. We decided that we shall use a flat namespace to represent our
variables in SSA form, as opposed to having a two dimensional namespace
of the original variable and the SSA instance subscript.
ARGUMENT AGAINST:
* A two dimensional namespace would be valuable when doing alias
analysis because the extra information can help limit the scope of
analysis.
ARGUMENT FOR:
* Including this information would require all users of the LLVM
bytecode to parse and handle it. This would slow down the
common case and inflate the instruction representation with another
infinite variable space.
REASONING:
* It was decided that because original variable sources could be
reconstructed from SSA form in linear time, it would be an
unjustified expense for the common case to include the extra
information for one optimization. Alias analysis itself is typically
greater than linear in asymptotic complexity, so this extra analysis
would not affect the runtime of the optimization in a significant
way. Additionally, this would be an unlikely optimization to do at
runtime.
IDEAS TO CONSIDER
-----------------
1. Including dominator information in the LLVM bytecode
representation. This is one example of an analysis result that may be
packaged with the bytecodes themselves. As a conceptual implementation
idea, we could include an immediate dominator number for each basic block
in the LLVM bytecode program. Basic blocks could be numbered according
to the order of occurrence in the bytecode representation. (A rough
sketch of this idea follows the list below.)
2. Including loop header and body information. This would facilitate
detection of intervals and natural loops.
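(A rough sketch of how idea 1 might look to a consumer of the bytecode;
the structure and names are assumptions for illustration only:)

#include <cstdint>
#include <vector>

struct DomInfo {
    // idom[i] is the index of block i's immediate dominator; blocks are
    // numbered by order of occurrence, with the entry block at index 0.
    std::vector<uint32_t> idom;
};

// Answers "does block a dominate block b?" by walking b's idom chain upward.
bool dominates(const DomInfo &d, uint32_t a, uint32_t b) {
    while (b != a && b != 0)
        b = d.idom[b];
    return b == a;
}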
UNRESOLVED ISSUES
-----------------
1. Will oSUIF provide enough of an infrastructure to support the research
that we will be doing? We know that it has less than stellar
performance, but hope that this will be of little importance for our
static compiler. This could affect us if we decided to do some IP
research. Also we do not yet understand the level of exception support
currently implemented.
2. Should we consider the requirements of a direct hardware implementation
of the LLVM when we design it? If so, several design issues should
have their priorities shifted. The other option is to focus on a
software layer interpreting the LLVM in all cases.
3. Should we use some form of packetized format to improve forward
compatibility? For example, we could design the system to encode a
packet type and length field before analysis information, to allow a
runtime to skip information that it didn't understand in a bytecode
stream. The obvious benefit would be for compatibility; the drawback
is that it would tend to splinter the 'standard' LLVM definition.
4. Should we use fixed length instructions or variable length
instructions? Fetching variable length instructions is expensive (for
either hardware or software based LLVM runtimes), but we have several
'infinite' spaces that instructions operate in (SSA register numbers,
type spaces, or packet length [if packets were implemented]). Several
options were mentioned including:
A. Using 16 or 32 bit numbers, which would be 'big enough'
B. A scheme similar to how UTF-8 works, to encode infinite numbers
while keeping small numbers small (see the sketch after this list).
C. Use something similar to Huffman encoding, so that the most common
numbers are the smallest.
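(A minimal sketch of option B in C++, essentially the LEB128 scheme: each
byte carries seven payload bits, and a set high bit means more bytes
follow, so small numbers take one byte while arbitrarily large ones stay
representable. The exact format is an assumption, not a decision.)

#include <cstdint>
#include <vector>

void encodeVarUInt(uint64_t value, std::vector<uint8_t> &out) {
    do {
        uint8_t byte = value & 0x7F;  // low seven payload bits
        value >>= 7;
        if (value != 0)
            byte |= 0x80;             // high bit: more bytes follow
        out.push_back(byte);
    } while (value != 0);
}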
-Chris

Date: Wed, 31 Jan 2001 12:04:33 -0600
From: Vikram S. Adve <vadve@cs.uiuc.edu>
To: Chris Lattner <lattner@cs.uiuc.edu>
Subject: another thought
I have a budding idea about making LLVM a little more ambitious: a
customizable runtime system that can be used to implement language-specific
virtual machines for many different languages. E.g., a C vm, a C++ vm, a
Java vm, a Lisp vm, ..
The idea would be that LLVM would provide a standard set of runtime features
(some low-level like standard assembly instructions with code generation and
static and runtime optimization; some higher-level like type-safety and
perhaps a garbage collection library). Each language vm would select the
runtime features needed for that language, extending or customizing them as
needed. Most of the machine-dependent code-generation and optimization
features as well as low-level machine-independent optimizations (like PRE)
could be provided by LLVM and should be sufficient for any language,
simplifying the language compiler. (This would also help interoperability
between languages.) Also, some or most of the higher-level
machine-independent features like type-safety and access safety should be
reusable by different languages, with minor extensions. The language
compiler could then focus on language-specific analyses and optimizations.
The risk is that this sounds like a universal IR -- something that the
compiler community has tried and failed to develop for decades, and is
universally skeptical about. No matter what we say, we won't be able to
convince anyone that we have a universal IR that will work. We need to
think about whether LLVM is different or if it has something novel that might
convince people. E.g., the idea of providing a package of separable
features that different languages select from. Also, using SSA with or
without type-safety as the intermediate representation.
One interesting starting point would be to discuss how a JVM would be
implemented on top of LLVM a bit more. That might give us clues on how to
structure LLVM to support one or more language VMs.
--Vikram

Date: Tue, 6 Feb 2001 20:27:37 -0600 (CST)
From: Chris Lattner <sabre@nondot.org>
To: Vikram S. Adve <vadve@cs.uiuc.edu>
Subject: Type notation debate...
This is the way that I am currently planning on implementing types:
Primitive Types:
type ::= void|bool|sbyte|ubyte|short|ushort|int|uint|long|ulong
Method:
typelist ::= typelisth | /*empty*/
typelisth ::= type | typelisth ',' type
type ::= type (typelist)
Arrays (without and with size):
type ::= '[' type ']' | '[' INT ',' type ']'
Pointer:
type ::= type '*'
Structure:
type ::= '{' typelist '}'
Packed:
type ::= '<' INT ',' type '>'
Simple examples:
[[ %4, int ]] - array of (array of 4 (int))
[ { int, int } ] - Array of structure
[ < %4, int > ] - Array of 128 bit SIMD packets
int (int, [[ %4, int ]]) - Method taking an int and a 2d array, returning int
Okay before you comment, please look at:
http://www.research.att.com/~bs/devXinterview.html
Search for "In another interview, you defined the C declarator syntax as
an experiment that failed. However, this syntactic construct has been
around for 27 years and perhaps more; why do you consider it problematic
(except for its cumbersome syntax)?" and read that response for me. :)
Now with this syntax, his example would be represented as:
[ %10, bool (int, int) * ] *
vs
bool (*(*)[10])(int, int)
in C.
Basically, my argument for this type construction system is that it is
VERY simple to use and understand (although it IS different than C, it is
very simple and straightforward, which C is NOT). In fact, I would assert
that most programmers TODAY do not understand pointers to member
functions, and have to look up an example when they have to write them.
In my opinion, it is critically important to have clear and concise type
specifications, because types are going to be all over the programs.
Let me know your thoughts on this. :)
-Chris

Date: Thu, 8 Feb 2001 08:42:04 -0600
From: Vikram S. Adve <vadve@cs.uiuc.edu>
To: Chris Lattner <sabre@nondot.org>
Subject: RE: Type notation debate...
Chris,
> Okay before you comment, please look at:
>
> http://www.research.att.com/~bs/devXinterview.html
I read this argument. Even before that, I was already in agreement with you
and him that the C declarator syntax is difficult and confusing.
But in fact, if you read the entire answer carefully, he came to the same
conclusion I do: that you have to go with familiar syntax over logical
syntax because familiarity is such a strong force:
"However, familiarity is a strong force. To compare, in English, we
live
more or less happily with the absurd rules for "to be" (am, are, is, been,
was, were, ...) and all attempts to simplify are treated with contempt or
(preferably) humor. It be a curious world and it always beed."
> Basically, my argument for this type construction system is that it is
> VERY simple to use and understand (although it IS different than C, it is
> very simple and straightforward, which C is NOT). In fact, I would assert
> that most programmers TODAY do not understand pointers to member
> functions, and have to look up an example when they have to write them.
Again, I don't disagree with this at all. But to some extent this
particular problem is inherently difficult. Your syntax for the above
example may be easier for you to read because this is the way you have been
thinking about it. Honestly, I don't find it much easier than the C syntax.
In either case, I would have to look up an example to write pointers to
member functions.
But pointers to member functions are nowhere near as common as arrays. And
the old array syntax:
type [ int, int, ...]
is just much more familiar and clear to people than anything new you
introduce, no matter how logical it is. Introducing a new syntax that may
make function pointers easier but makes arrays much more difficult seems
very risky to me.
> In my opinion, it is critically important to have clear and concise type
> specifications, because types are going to be all over the programs.
I absolutely agree. But the question is, what is more clear and concise?
The syntax programmers are used to out of years of experience or a new
syntax that they have never seen that has a more logical structure. I think
the answer is the former. Sometimes, you have to give up a better idea
because you can't overcome sociological barriers to it. Qwerty keyboards
and Windows are two classic examples of bad technology that are difficult to
root out.
P.S. Also, while I agree that most of your syntax is more logical, there is
one part that isn't:
Arrays (without and with size):
type ::= '[' type ']' | '[' INT ',' type ']'.
The arrays with size lists the dimensions and the type in a single list.
That is just too confusing:
[10, 40, int]
This seems to be a 3-D array where the third dimension is something strange.
It is too confusing to have a list of 3 things, some of which are dimensions
and one is a type. Either of the following would be better:
array [10, 40] of int
or
int [10, 40]
--Vikram

Date: Thu, 8 Feb 2001 14:31:05 -0600 (CST)
From: Chris Lattner <sabre@nondot.org>
To: Vikram S. Adve <vadve@cs.uiuc.edu>
Subject: RE: Type notation debate...
> Arrays (without and with size):
> type ::= '[' type ']' | '[' INT ',' type ']'.
>
> The arrays with size lists the dimensions and the type in a single list.
> That is just too confusing:
> [10, 40, int]
> This seems to be a 3-D array where the third dimension is something strange.
> It is too confusing to have a list of 3 things, some of which are dimensions
> and one is a type.
The above grammar indicates that there is only one integer parameter, ie
the upper bound. The lower bound is always implied to be zero, for
several reasons:
* As a low level VM, we want to expose addressing computations
explicitly. Since the lower bound must always be known in a high level
language statically, the language front end can do the translation
automatically.
* This fits more closely with what Java needs, ie what we need in the
short term. Java arrays are always zero based.
If a two element list is too confusing, I would recommend an alternate
syntax of:
type ::= '[' type ']' | '[' INT 'x' type ']'.
For example:
[12 x int]
[12x int]
[ 12 x [ 4x int ]]
Which is syntactically nicer, and more explicit.
> Either of the following would be better:
> array [10, 40] of int
I considered this approach for arrays in general (ie array of int/ array
of 12 int), but found that it made declarations WAY too long. Remember
that because of the nature of LLVM, you get a lot of types strewn all over
the program, and using the 'typedef' like facility is not a wonderful
option, because then types aren't explicit anymore.
I find this email interesting, because you contradict the previous email
you sent, where you recommended that we stick to C syntax....
-Chris

> But in fact, if you read the entire answer carefully, he came to the same
> conclusion I do: that you have to go with familiar syntax over logical
> syntax because familiarity is such a strong force:
> "However, familiarity is a strong force. To compare, in English, we
live
> more or less happily with the absurd rules for "to be" (am, are, is, been,
> was, were, ...) and all attempts to simplify are treated with contempt or
> (preferably) humor. It be a curious world and it always beed."
Although you have to remember that his situation was considerably
different than ours. He was in a position where he was designing a high
level language that had to be COMPATIBLE with C. Our language is such
that a new person would have to learn the new, different, syntax
anyways. Making them learn about the type system does not seem like much
of a stretch from learning the opcodes and how SSA form works, and how
everything ties together...
> > Basically, my argument for this type construction system is that it is
> > VERY simple to use and understand (although it IS different than C, it is
> > very simple and straightforward, which C is NOT). In fact, I would assert
> > that most programmers TODAY do not understand pointers to member
> > functions, and have to look up an example when they have to write them.
> Again, I don't disagree with this at all. But to some extent this
> particular problem is inherently difficult. Your syntax for the above
> example may be easier for you to read because this is the way you have been
> thinking about it. Honestly, I don't find it much easier than the C syntax.
> In either case, I would have to look up an example to write pointers to
> member functions.
I would argue that because the lexical structure of the language is
self-consistent, any person who spent a significant amount of time programming
in LLVM directly would understand how to do it without looking it up in a
manual. The reason this does not work for C is because you rarely have to
declare these pointers, and the syntax is inconsistent with the method
declaration and calling syntax.
> But pointers to member functions are nowhere near as common as arrays.
Very true. If you're implementing an object oriented language, however,
remember that you have to do all the pointer to member function stuff
yourself.... so every time you invoke a virtual method one is involved
(instead of having C++ hide it for you behind "syntactic sugar").
> And the old array syntax:
> type [ int, int, ...]
> is just much more familiar and clear to people than anything new you
> introduce, no matter how logical it is.
Erm... excuse me but how is this the "old array syntax"? If you are
arguing for consistency with C, you should be asking for 'type int []',
which is significantly different than the above (besides, the above
introduces a new operator and duplicates information
needlessly). Basically what I am suggesting is exactly the above without
the fluff. So instead of:
type [ int, int, ...]
you use:
type [ int ]
> Introducing a new syntax that may
> make function pointers easier but makes arrays much more difficult seems
> very risky to me.
This is not about function pointers. This is about consistency in the
type system, and consistency with the rest of the language. The point
above does not make arrays any more difficult to use, and makes the
structure of types much more obvious than the "c way".
> > In my opinion, it is critically important to have clear and concise type
> > specifications, because types are going to be all over the programs.
>
> I absolutely agree. But the question is, what is more clear and concise?
> The syntax programmers are used to out of years of experience or a new
> syntax that they have never seen that has a more logical structure. I think
> the answer is the former. Sometimes, you have to give up a better idea
> because you can't overcome sociological barriers to it. Qwerty keyboards
> and Windows are two classic examples of bad technology that are difficult to
> root out.
Very true, but you seem to be advocating a completely different type
system than C has, one that does not offer the advantages of clear
structure that the system I recommended does... so you seem to not have a
problem with changing this, just with what I change it to. :)
-Chris

Ok, here are my comments and suggestions about the LLVM instruction set.
We should discuss some now, but can discuss many of them later, when we
revisit synchronization, type inference, and other issues.
(We have discussed some of the comments already.)
o We should consider eliminating the type annotation in cases where it is
essentially obvious from the instruction type, e.g., in br, it is obvious
that the first arg. should be a bool and the other args should be labels:
br bool <cond>, label <iftrue>, label <iffalse>
I think your point was that making all types explicit improves clarity
and readability. I agree to some extent, but it also comes at the cost
of verbosity. And when the types are obvious from people's experience
(e.g., in the br instruction), it doesn't seem to help as much.
o On reflection, I really like your idea of having the two different switch
types (even though they encode implementation techniques rather than
semantics). It should simplify building the CFG and my guess is it could
enable some significant optimizations, though we should think about which.
o In the lookup-indirect form of the switch, is there a reason not to make
the val-type uint? Most HLL switch statements (including Java and C++)
require that anyway. And it would also make the val-type uniform
in the two forms of the switch.
I did see the switch-on-bool examples and, while cute, we can just use
the branch instructions in that particular case.
o I agree with your comment that we don't need 'neg'.
o There's a trade-off with the cast instruction:
+ it avoids having to define all the upcasts and downcasts that are
valid for the operands of each instruction (you probably have thought
of other benefits also)
- it could make the bytecode significantly larger because there could
be a lot of cast operations
o Making the second arg. to 'shl' a ubyte seems good enough to me.
255 positions seems adequate for several generations of machines
and is more compact than uint.
o I still have some major concerns about including malloc and free in the
language (either as builtin functions or instructions). LLVM must be
able to represent code from many different languages. Languages such as
C, C++, Java, and Fortran 90 would not be able to use our malloc anyway
because each of them will want to provide a library implementation of it.
This gets even worse when code from different languages is linked
into a single executable (which is fairly common in large apps).
Having a single malloc would just not suffice, and instead would simply
complicate the picture further because it adds an extra variant in
addition to the one each language provides.
Instead, providing a default library version of malloc and free
(and perhaps a malloc_gc with garbage collection instead of free)
would make a good implementation available to anyone who wants it.
I don't recall all your arguments in favor so let's discuss this again,
and soon.
o 'alloca' on the other hand sounds like a good idea, and the
implementation seems fairly language-independent so it doesn't have the
problems with malloc listed above.
o About indirect call:
Your option #2 sounded good to me. I'm not sure I understand your
concern about an explicit 'icall' instruction?
o A pair of important synchronization instr'ns to think about:
load-linked
store-conditional
o Other classes of instructions that are valuable for pipeline performance:
conditional-move
predicated instructions
o I believe tail calls are relatively easy to identify; do you know why
.NET has a tailcall instruction?
o I agree that we need a static data space. Otherwise, emulating global
data gets unnecessarily complex.
o About explicit parallelism:
We once talked about adding a symbolic thread-id field to each
instruction. (It could be optional so single-threaded codes are
not penalized.) This could map well to multi-threaded architectures
while providing easy ILP for single-threaded ones. But it is probably
too radical an idea to include in a base version of LLVM. Instead, it
could be a great topic for a separate study.
What is the semantics of the IA64 stop bit?
o And finally, another thought about the syntax for arrays :-)
Although this syntax:
array <dimension-list> of <type>
is verbose, it will be used only in the human-readable assembly code so
size should not matter. I think we should consider it because I find it
to be the clearest syntax. It could even make arrays of function
pointers somewhat readable.

From: Chris Lattner <sabre@nondot.org>
To: "Vikram S. Adve" <vadve@cs.uiuc.edu>
Subject: Re: LLVM Feedback
I've included your feedback in the /home/vadve/lattner/llvm/docs directory
so that it will live in CVS eventually with the rest of LLVM. I've
significantly updated the documentation to reflect the changes you
suggested, as specified below:
> We should consider eliminating the type annotation in cases where it is
> essentially obvious from the instruction type:
> br bool <cond>, label <iftrue>, label <iffalse>
> I think your point was that making all types explicit improves clarity
> and readability. I agree to some extent, but it also comes at the
> cost of verbosity. And when the types are obvious from people's
> experience (e.g., in the br instruction), it doesn't seem to help as
> much.
Very true. We should discuss this more, but my reasoning is more of a
consistency argument. There are VERY few instructions that can have all
of the types eliminated, and doing so when available unnecessarily makes
the language more difficult to handle. Especially when you see 'int
%this' and 'bool %that' all over the place, I think it would be
disorienting to see:
br %predicate, %iftrue, %iffalse
for branches. Even just typing that once gives me the creeps. ;) Like I
said, we should probably discuss this further in person...
> On reflection, I really like your idea of having the two different
> switch types (even though they encode implementation techniques rather
> than semantics). It should simplify building the CFG and my guess is it
> could enable some significant optimizations, though we should think
> about which.
Great. I added a note to the switch section commenting on how the VM
should just use the instruction type as a hint, and that the
implementation may choose alternate representations (such as predicated
branches).
> In the lookup-indirect form of the switch, is there a reason not to
> make the val-type uint?
No. This was something I was debating for a while, and didn't really feel
strongly about either way. It is common to switch on other types in HLL's
(for example signed int's are particularly common), but in this case, all
that will be added is an additional 'cast' instruction. I removed that
from the spec.
> I agree with your comment that we don't need 'neg'
Removed.
> There's a trade-off with the cast instruction:
> + it avoids having to define all the upcasts and downcasts that are
> valid for the operands of each instruction (you probably have
> thought of other benefits also)
> - it could make the bytecode significantly larger because there could
> be a lot of cast operations
+ You NEED casts to represent things like:
void foo(float);
...
int x;
...
foo(x);
in a language like C. Even in a Java like language, you need upcasts
and some way to implement dynamic downcasts.
+ Not all forms of instructions take every type (for example you can't
shift by a floating point number of bits), thus SOME programs will need
implicit casts.
To be efficient and to avoid your '-' point above, we just have to be
careful to specify that the instructions shall operate on all common
types; therefore, casting should be relatively uncommon. For example, all
of the arithmetic operations work on almost all data types.
> Making the second arg. to 'shl' a ubyte seems good enough to me.
> 255 positions seems adequate for several generations of machines
Okay, that comment is removed.
> and is more compact than uint.
No, it isn't. Remember that the bytecode encoding saves value slots into
the bytecode instructions themselves, not constant values. This is
another case where we may introduce more cast instructions (but we will
also reduce the number of opcode variants that must be supported by a
virtual machine). Because most shifts are by constant values, I don't
think that we'll have to cast many shifts. :)
> I still have some major concerns about including malloc and free in the
> language (either as builtin functions or instructions).
Agreed. How about this proposal:
malloc/free are either built-in functions or actual opcodes. They provide
all of the type safety that the document would indicate, blah blah
blah. :)
Now, because of all of the excellent points that you raised, an
implementation may want to override the default malloc/free behavior of
the program. To do this, they simply implement a "malloc" and
"free" function. The virtual machine will then be defined to use the user
defined malloc/free function (which return/take void*'s, not type'd
pointers like the builtin function would) if one is available, otherwise
fall back on a system malloc/free.
Does this sound like a good compromise? It would give us all of the
typesafety/elegance in the language while still allowing the user to do
all the cool stuff they want to...
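(A minimal sketch of that dispatch in C++; the hook-resolution mechanism
and names are illustrative assumptions, not the proposal's actual
interface:)

#include <cstddef>
#include <cstdlib>

using MallocFn = void *(*)(std::size_t);
using FreeFn = void (*)(void *);

struct AllocatorHooks {
    MallocFn mallocFn = std::malloc;  // system fallback
    FreeFn freeFn = std::free;
};

// Prefer a user-supplied "malloc"/"free" pair if the loaded program defines
// one (both must be present); otherwise fall back on the system allocator.
AllocatorHooks resolveAllocator(MallocFn userMalloc, FreeFn userFree) {
    AllocatorHooks hooks;
    if (userMalloc && userFree) {
        hooks.mallocFn = userMalloc;
        hooks.freeFn = userFree;
    }
    return hooks;
}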
> 'alloca' on the other hand sounds like a good idea, and the
> implementation seems fairly language-independent so it doesn't have the
> problems with malloc listed above.
Okay, once we get the above stuff figured out, I'll put it all in the
spec.
> About indirect call:
> Your option #2 sounded good to me. I'm not sure I understand your
> concern about an explicit 'icall' instruction?
I worry too much. :) The other alternative has been removed. 'icall' is
now up in the instruction list next to 'call'.
> I believe tail calls are relatively easy to identify; do you know why
> .NET has a tailcall instruction?
Although I am just guessing, I believe it probably has to do with the fact
that they want languages like Haskell and Lisp to be efficiently runnable
on their VM. Of course this means that the VM MUST implement tail calls
'correctly', or else life will suck. :) I would put this into a future
feature bin, because it could be pretty handy...
> A pair of important synchronization instr'ns to think about:
> load-linked
> store-conditional
What is 'load-linked'? I think that (at least for now) I should add these
to the 'possible extensions' section, because they are not immediately
needed...
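(Since the question is left open here: load-linked reads a location, and a
later store-conditional writes it only if nothing else touched it in
between, enabling optimistic read-modify-write retry loops. A minimal
sketch of that pattern, using C++11 compare_exchange as a stand-in
primitive because LL/SC itself is not expressible in portable C++:)

#include <atomic>

int atomicIncrement(std::atomic<int> &counter) {
    int observed = counter.load();  // analogous to load-linked
    // compare_exchange_weak stores observed+1 only if counter still holds
    // observed -- analogous to store-conditional; on failure it refreshes
    // observed and we retry.
    while (!counter.compare_exchange_weak(observed, observed + 1)) {
    }
    return observed + 1;
}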
> Other classes of instructions that are valuable for pipeline
> performance:
> conditional-move
> predicated instructions
Conditional move is effectively a special case of a predicated
instruction... and I think that all predicated instructions can possibly
be implemented later in LLVM. It would significantly change things, and
it doesn't seem to be very necessary right now. It would seem to
complicate flow control analysis a LOT in the virtual machine. I would
tend to prefer that a predicated architecture like IA64 convert from a
"basic block" representation to a predicated rep as part of its dynamic
compilation phase. Also, if a basic block contains ONLY a move, then
that can be trivially translated into a conditional move...
> I agree that we need a static data space. Otherwise, emulating global
> data gets unnecessarily complex.
Definitely. Also a later item though. :)
> We once talked about adding a symbolic thread-id field to each
> ..
> Instead, it could be a great topic for a separate study.
Agreed. :)
> What is the semantics of the IA64 stop bit?
Basically, the IA64 writes instructions like this:
mov ...
add ...
sub ...
op xxx
op xxx
;;
mov ...
add ...
sub ...
op xxx
op xxx
;;
Where the ;; delimits a group of instructions with no dependencies between
them, which can all be executed concurrently (to the limits of the
available functional units). The ;; gets translated into a bit set in one
of the opcodes.
The advantage of this representation is that you don't have to do some
kind of 'thread id scheduling' pass by having to specify ahead of time how
many threads to use, and the representation doesn't have a per-instruction
overhead...
> And finally, another thought about the syntax for arrays :-)
> Although this syntax:
> array <dimension-list> of <type>
> is verbose, it will be used only in the human-readable assembly code so
> size should not matter. I think we should consider it because I find it
> to be the clearest syntax. It could even make arrays of function
> pointers somewhat readable.
My only comment will be to give you an example of why this is a bad
idea. :)
Here is an example of using the switch statement (with my recommended
syntax):
switch uint %val, label %otherwise,
[%3 x {uint, label}] [ { uint %57, label %l1 },
{ uint %20, label %l2 },
{ uint %14, label %l3 } ]
Here it is with the syntax you are proposing:
switch uint %val, label %otherwise,
array %3 of {uint, label}
array of {uint, label}
{ uint %57, label %l1 },
{ uint %20, label %l2 },
{ uint %14, label %l3 }
Which is ambiguous and very verbose. It would be possible to specify
constants with [] brackets as in my syntax, which would look like this:
switch uint %val, label %otherwise,
array %3 of {uint, label} [ { uint %57, label %l1 },
{ uint %20, label %l2 },
{ uint %14, label %l3 } ]
But then the syntax is inconsistent between type definition and constant
definition (why do []'s enclose the constants but not the types??).
Anyways, I'm sure that there is much debate still to be had over
this... :)
-Chris
http://www.nondot.org/~sabre/os/
http://www.nondot.org/MagicStats/
http://korbit.sourceforge.net/

Date: Tue, 13 Feb 2001 13:29:52 -0600 (CST)
From: Chris Lattner <sabre@nondot.org>
To: Vikram S. Adve <vadve@cs.uiuc.edu>
Subject: LLVM Concerns...
I've updated the documentation to include load, store, and allocation
instructions (please take a look and let me know if I'm on the right
track):
file:/home/vadve/lattner/llvm/docs/LangRef.html#memoryops
I have a couple of concerns I would like to bring up:
1. Reference types
Right now, I've spec'd out the language to have a pointer type, which
works fine for lots of stuff... except that Java really has
references: constrained pointers that cannot be manipulated: added and
subtracted, moved, etc... Do we want to have a type like this? It
could be very nice for analysis (pointer always points to the start of
an object, etc...) and more closely matches Java semantics. The
pointer type would be kept for C++ like semantics. Through analysis,
C++ pointers could be promoted to references in the LLVM
representation.
2. Our "implicit" memory references in assembly language:
After thinking about it, this model has two problems:
A. If you do pointer analysis and realize that two stores are
independent and can share the same memory source object, there is
no way to represent this in either the bytecode or assembly.
B. When parsing assembly/bytecode, we effectively have to do a full
SSA generation/PHI node insertion pass to build the dependencies
when we don't want the "pinned" representation. This is not
cool.
I'm tempted to make memory references explicit in both the assembly and
bytecode to get around this... what do you think?
-Chris

Date: Tue, 13 Feb 2001 18:25:42 -0600
From: Vikram S. Adve <vadve@cs.uiuc.edu>
To: Chris Lattner <sabre@nondot.org>
Subject: RE: LLVM Concerns...
> 1. Reference types
> Right now, I've spec'd out the language to have a pointer type, which
> works fine for lots of stuff... except that Java really has
> references: constrained pointers that cannot be manipulated: added and
> subtracted, moved, etc... Do we want to have a type like this? It
> could be very nice for analysis (pointer always points to the start of
> an object, etc...) and more closely matches Java semantics. The
> pointer type would be kept for C++ like semantics. Through analysis,
> C++ pointers could be promoted to references in the LLVM
> representation.
You're right, having references would be useful. Even for C++ the *static*
compiler could generate references instead of pointers with fairly
straightforward analysis. Let's include a reference type for now. But I'm
also really concerned that LLVM is becoming big and complex and (perhaps)
too high-level. After we get some initial performance results, we may have
a clearer idea of what our goals should be and we should revisit this
question then.
> 2. Our "implicit" memory references in assembly language:
> After thinking about it, this model has two problems:
> A. If you do pointer analysis and realize that two stores are
> independent and can share the same memory source object,
not sure what you meant by "share the same memory source object"
> there is
> no way to represent this in either the bytecode or assembly.
> B. When parsing assembly/bytecode, we effectively have to do a full
> SSA generation/PHI node insertion pass to build the dependencies
> when we don't want the "pinned" representation. This is not
> cool.
I understand the concern. But again, let's focus on the performance first
and then look at the language design issues. E.g., it would be good to know
how big the bytecode files are before expanding them further. I am pretty
keen to explore the implications of LLVM for mobile devices. Both bytecode
size and power consumption are important to consider there.
--Vikram

By Chris:
LLVM has been designed with two primary goals in mind. First, we strive to
enable the best possible division of labor between static and dynamic
compilers, and second, we need a flexible and powerful interface
between these two complementary stages of compilation. We feel that
providing a solution to these two goals will yield an excellent solution
to the performance problem faced by modern architectures and programming
languages.
A key insight into current compiler and runtime systems is that a
compiler may fall anywhere in a "continuum of compilation" to do its
job. On one side, scripting languages statically compile nothing and
dynamically compile (or equivalently, interpret) everything. At the far
other end, traditional static compilers process everything statically and
nothing dynamically. These approaches have typically been seen as a
tradeoff between performance and portability. On a deeper level, however,
there are two reasons that optimal system performance may be obtained by a
system somewhere in between these two extremes: Dynamic application
behavior and social constraints.
From a technical perspective, pure static compilation cannot ever give
optimal performance in all cases, because applications have varying dynamic
behavior that the static compiler cannot take into consideration. Even
compilers that support profile guided optimization generate poor code in
the real world, because using such optimization tunes the application
to one particular usage pattern, whereas real programs (as opposed to
benchmarks) often have several different usage patterns.
On a social level, static compilation is a very shortsighted solution to
the performance problem. Instruction set architectures (ISAs) continuously
evolve, and each implementation of an ISA (a processor) must choose a set
of tradeoffs that make sense in the market context that it is designed for.
With every new processor introduced, the vendor faces two fundamental
problems: First, there is a lag time between when a processor is introduced
and when compilers generate quality code for the architecture. Second,
even when compilers catch up to the new architecture there is often a large
body of legacy code that was compiled for previous generations and will
not or cannot be upgraded. Thus a large percentage of code running on a
processor may be compiled quite sub-optimally for the current
characteristics of the dynamic execution environment.
For these reasons, LLVM has been designed from the beginning as a long-term
solution to these problems. Its design allows the large body of
platform-independent static program optimizations currently in compilers
to be reused unchanged. It also provides important static
type information to enable powerful dynamic and link time optimizations
to be performed quickly and efficiently. This combination enables an
increase in effective system performance for real world environments.

Meeting notes: Implementation idea: Exception Handling in C++/Java
The 5/18/01 meeting discussed ideas for implementing exceptions in LLVM.
We decided that the best solution requires a set of library calls provided by
the VM, as well as an extension to the LLVM function invocation syntax.
The LLVM function invocation instruction previously looked like this (ignoring
types):
call func(arg1, arg2, arg3)
The extension discussed today adds an optional "with" clause that
associates a label with the call site. The new syntax looks like this:
call func(arg1, arg2, arg3) with funcCleanup
This funcCleanup label always stays tightly associated with the call site (being
encoded directly into the call opcode itself), and should be used whenever
there is cleanup work that needs to be done for the current function if
an exception is thrown by func (or if we are in a try block).
To support this, the VM/Runtime provide the following simple library
functions (all syntax in this document is very abstract):
typedef struct { something } %frame;
The VM must export a "frame type", that is, an opaque structure used to
implement different types of stack walking that may be used by various
language runtime libraries. We imagine that it would be typical to
represent a frame with a PC and frame pointer pair, although that is not
required.
%frame getStackCurrentFrame();
Get a frame object for the current function. Note that if the current
function was inlined into its caller, the "current" frame will belong to
the "caller".
bool isFirstFrame(%frame f);
Returns true if the specified frame is the top level (first activated) frame
for this thread. For the main thread, this corresponds to the main()
function, for a spawned thread, it corresponds to the thread function.
%frame getNextFrame(%frame f);
Return the previous frame on the stack. This function is undefined if f
satisfies the predicate isFirstFrame(f).
Label *getFrameLabel(%frame f);
If a label was associated with f (as discussed below), this function returns
it. Otherwise, it returns a null pointer.
doNonLocalBranch(Label *L);
At this point, it is not clear whether this should be a function or
intrinsic. It should probably be an intrinsic in LLVM, but we'll deal with
this issue later.
Here is a motivating example that illustrates how these facilities could be
used to implement the C++ exception model:
void TestFunction(...) {
A a; B b;
foo(); // Any function call may throw
bar();
C c;
try {
D d;
baz();
} catch (int) {
...int Stuff...
// execution continues after the try block: the exception is consumed
} catch (double) {
...double stuff...
throw; // Exception is propagated
}
}
This function would compile to approximately the following code (heavy
pseudo code follows):
Func:
%a = alloca A
A::A(%a) // These ctors & dtors could throw, but we ignore this
%b = alloca B // minor detail for this example
B::B(%b)
call foo() with fooCleanup // An exception in foo is propagated to fooCleanup
call bar() with barCleanup // An exception in bar is propagated to barCleanup
%c = alloca C
C::C(%c)
%d = alloca D
D::D(%d)
call baz() with bazCleanup // An exception in baz is propagated to bazCleanup
d->~D();
EndTry: // This label corresponds to the end of the try block
c->~C() // These could also throw, these are also ignored
b->~B()
a->~A()
return
Note that this is a very straightforward and literal translation: exactly
what we want for zero cost (when unused) exception handling. Especially on
platforms with many registers (ie, the IA64) setjmp/longjmp style exception
handling is *very* impractical. Also, the "with" clauses describe the
control flow paths explicitly so that analysis is not adversely affected.
The foo/barCleanup labels are implemented as:
TryCleanup: // Executed if an exception escapes the try block
c->~C()
barCleanup: // Executed if an exception escapes from bar()
// fall through
fooCleanup: // Executed if an exception escapes from foo()
b->~B()
a->~A()
Exception *E = getThreadLocalException()
call throw(E) // Implemented by the C++ runtime, described below
Which does the work one would expect. getThreadLocalException is a function
implemented by the C++ support library. It returns the current exception
object for the current thread. Note that we do not attempt to recycle the
shutdown code from before, because performance of the mainline code is
critically important. Also, obviously fooCleanup and barCleanup may be
merged and one of them eliminated. This just shows how the code generator
would most likely emit code.
The bazCleanup label is more interesting. Because the exception may be caught
by the try block, we must dispatch to its handler... but it does not exist
on the call stack (it does not have a VM Call->Label mapping installed), so
we must dispatch statically with a goto. bazCleanup thus appears as:
bazCleanup:
d->~D(); // destruct D as it goes out of scope when entering catch clauses
goto TryHandler
In general, TryHandler is not the same as bazCleanup, because multiple
function calls could be made from the try block. In this case, trivial
optimization could merge the two basic blocks. TryHandler is the code
that actually determines the type of exception, based on the Exception object
itself. For this discussion, assume that the exception object contains *at
least*:
1. A pointer to the RTTI info for the contained object
2. A pointer to the dtor for the contained object
3. The contained object itself
Note that it is necessary to maintain #1 & #2 in the exception object itself
because objects without virtual function tables may be thrown (as in this
example). Assuming this, TryHandler would look something like this:
  TryHandler:
    Exception *E = getThreadLocalException();
    switch (E->RTTIType) {
    case IntRTTIInfo:
      ...int Stuff...       // The action to perform from the catch block
      break;
    case DoubleRTTIInfo:
      ...double Stuff...    // The action to perform from the catch block
      goto TryCleanup       // This catch block rethrows the exception
      break;                // Redundant, eliminated by the optimizer
    default:
      goto TryCleanup       // Exception not caught, rethrow
    }

    // Exception was consumed
    if (E->dtor)
      E->dtor(E->object)    // Invoke the dtor on the object if it exists
    goto EndTry             // Continue mainline code...
And that is all there is to it.
The throw(E) function would then be implemented like this (which may be
inlined into the caller through standard optimization):
  function throw(Exception *E) {
    // Get the start of the stack trace...
    %frame %f = call getStackCurrentFrame()

    // Get the label information that corresponds to it
    label *%L = call getFrameLabel(%f)
    while (%L == 0 && !isFirstFrame(%f)) {
      // Loop until a cleanup handler is found
      %f = call getNextFrame(%f)
      %L = call getFrameLabel(%f)
    }

    if (%L != 0) {
      call setThreadLocalException(E)   // Allow handlers access to this...
      call doNonLocalBranch(%L)
    }
    // No handler found!
    call BlowUp()   // Ends up calling the terminate() method in use
  }
That's a brief rundown of how C++ exception handling could be implemented in
LLVM. Java would be very similar, except that it only uses destructors to
unlock synchronized blocks, not to destroy data. Also, it uses two stack
walks: a nondestructive walk that builds a stack trace, then a destructive
walk that unwinds the stack as shown here.
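A toy model of those two walks, using a made-up frame structure (the real
runtime would of course walk actual machine frames):

  #include <cstdio>
  #include <vector>

  struct Frame { const char *name; bool hasHandler; };

  void javaStyleThrow(std::vector<Frame> &stack) {
    // Walk 1: nondestructive; builds the stack trace stored in the exception.
    for (auto it = stack.rbegin(); it != stack.rend(); ++it)
      std::printf("  at %s\n", it->name);

    // Walk 2: destructive; pops frames (running monitor-exit cleanups)
    // until a frame with a handler is found.
    while (!stack.empty() && !stack.back().hasHandler)
      stack.pop_back();
  }

  int main() {
    std::vector<Frame> stack{{"main", true}, {"foo", false}, {"baz", false}};
    javaStyleThrow(stack);   // prints baz, foo, main; unwinds back to main
    return 0;
  }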
It would be trivial to get exception interoperability between C++ and Java.
@@ -1,45 +0,0 @@
Date: Sat, 19 May 2001 19:09:13 -0500 (CDT)
From: Chris Lattner <sabre@nondot.org>
To: Vikram S. Adve <vadve@cs.uiuc.edu>
Subject: RE: Meeting writeup
> I read it through and it looks great!
Thanks!
> The finally clause in Java may need more thought. The code for this clause
> is like a subroutine because it needs to be entered from many points (end of
> try block and beginning of each catch block), and then needs to *return to
> the place from where the code was entered*. That's why JVM has the
> jsr/jsr_w instruction.
Hrm... I guess that is an implementation decision. It can either be
modelled as a subroutine (as Java bytecodes do), which is really
gross... or it can be modelled as code duplication (emitted once inline,
then once in the exception path). Because this could, at worst,
slightly less than double the amount of code in a function (it is
bounded), I don't think it is a big deal. One of the really nice things
about the LLVM representation is that it still allows for runtime code
generation for exception paths (exception paths are not compiled until
needed). Obviously a static compiler couldn't do this, though. :)

In this case, only one copy of the code would be compiled... until the
other one is needed on demand. Also, this strategy fits with the "zero
cost" exception model... the standard case is not burdened with extra
branches or "call"s.
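Here is a rough C++ rendering of the duplication strategy; acquire, work,
and release are stand-ins for the surrounding code, the try body, and the
finally body respectively:

  void acquire();
  void work();
  void release();   // the "finally" body

  void withFinally() {
    acquire();
    try {
      work();
      release();    // copy 1: finally body inline on the normal path
    } catch (...) {
      release();    // copy 2: finally body on the exception path
      throw;        // then propagate, exactly as an uncaught exception would
    }
  }

The normal path runs straight through copy 1 with no indirect branch or
subroutine call, which is the whole point.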
> I suppose you could save the return address in a particular register
> (specific to this finally block), jump to the finally block, and then at the
> end of the finally block, jump back indirectly through this register. It
> will complicate building the CFG but I suppose that can be handled. It is
> also unsafe in terms of checking where control returns (which is I suppose
> why the JVM doesn't use this).
I think that a code duplication method would be cleaner, and would avoid
the caveats that you mention. Also, it does not slow down the normal case
with an indirect branch...
Like everything, we can probably defer a final decision until later. :)
-Chris
@@ -1,63 +0,0 @@
Date: Fri, 1 Jun 2001 16:38:17 -0500 (CDT)
From: Chris Lattner <sabre@nondot.org>
To: Vikram S. Adve <vadve@cs.uiuc.edu>
Subject: Interesting: GCC passes
Take a look at this document (which describes the order of optimizations
that GCC performs):
http://gcc.gnu.org/onlinedocs/gcc_17.html
The rundown is that after RTL generation, the following happens:
 1. [t] jump optimization (jumps to jumps, etc.)
 2. [t] delete unreachable code
 3. compute live ranges for CSE
 4. [t] jump threading (jumps to jumps with identical or inverse conditions)
 5. [t] CSE
 6. *** conversion to SSA
 7. [t] SSA-based DCE
 8. *** conversion to LLVM
 9. UnSSA
10. GCSE
11. LICM
12. strength reduction
13. loop unrolling
14. [t] CSE
15. [t] DCE
16. instruction combination, register movement, scheduling... etc.
I've marked optimizations with a [t] to indicate things that I believe to
be relatively trivial to implement in LLVM itself (see the DCE sketch
below). The time-consuming things to reimplement would be SSA-based PRE,
strength reduction & loop unrolling... these would be the major things we
would miss out on if we did LLVM creation from tree code [inlining and
other high-level optimizations are done on the tree representation].
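To show why the [t] items are cheap once SSA def-use information exists,
here is a sketch of worklist DCE over a toy instruction type (none of this
is real GCC or LLVM code):

  #include <algorithm>
  #include <vector>

  struct Instr {
    bool hasSideEffects;              // stores, calls, returns are DCE roots
    std::vector<Instr *> operands;    // SSA def-use edges
    bool live = false;
  };

  void dce(std::vector<Instr *> &body) {
    std::vector<Instr *> worklist;
    for (Instr *I : body)             // seed: anything with side effects
      if (I->hasSideEffects) { I->live = true; worklist.push_back(I); }

    while (!worklist.empty()) {       // mark everything the roots depend on
      Instr *I = worklist.back(); worklist.pop_back();
      for (Instr *Op : I->operands)
        if (!Op->live) { Op->live = true; worklist.push_back(Op); }
    }

    body.erase(std::remove_if(body.begin(), body.end(),
                              [](Instr *I) { return !I->live; }),
               body.end());           // whatever is unmarked is dead
  }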
Given the lack of "strong" optimizations that would take a long time to
reimplement, I am leaning a bit more towards creating LLVM from the tree
code. Especially given that SGI has GPL'd their compiler, including many
SSA-based optimizations that could be adapted (besides the fact that their
code looks MUCH nicer than GCC's :)
Even if we choose to do LLVM code emission from RTL, we will almost
certainly want to move LLVM emission from step 8 down until at least CSE
has been rerun... which causes me to wonder if the SSA generation code
will still work (due to global variable dependencies and stuff). I assume
that it can be made to work, but might be a little more involved than we
would like.
I'm continuing to look at the Tree -> RTL code. It is pretty gross
because they do some of the translation a statement at a time, and some
of it a function at a time... I'm not quite clear on why and how the
distinction is drawn, but it does not appear that there is a wonderful
place to attach extra info.
Anyways, I'm proceeding with the RTL -> LLVM conversion phase for now. We
can talk about this more on Monday.
Wouldn't it be nice if there were an obvious decision to be made? :)
-Chris
@@ -1,71 +0,0 @@
Date: Fri, 1 Jun 2001 17:08:44 -0500 (CDT)
From: Chris Lattner <sabre@nondot.org>
To: Vikram S. Adve <vadve@cs.uiuc.edu>
Subject: RE: Interesting: GCC passes
> That is very interesting. I agree that some of these could be done on LLVM
> at link-time, but it is the extra time required that concerns me. Link-time
> optimization is severely time-constrained.
If we were to reimplement any of these optimizations, I assume that we
could do them a translation unit at a time, just as GCC does now. This
would lead to a pipeline like this:

  Static optimizations, translation unit at a time:
    .c --GCC--> .llvm --llvmopt--> .llvm

  Link-time optimizations:
    .llvm --llvm-ld--> .llvm --llvm-link-opt--> .llvm

Of course, many optimizations could be shared between llvmopt and
llvm-link-opt, but they wouldn't need to be shared... Thus compile time
could be faster, because we are using a "smarter" IR (SSA-based).
> BTW, about SGI, "borrowing" SSA-based optimizations from one compiler and
> putting it into another is not necessarily easier than re-doing it.
> Optimization code is usually heavily tied in to the specific IR they use.
Understood. The only reason that I brought this up is because SGI's IR is
more similar to LLVM than it is different in many respects (SSA-based,
relatively low-level, etc.), and could be easily adapted. Also, their
optimizations are written in C++ and are actually somewhat
structured... of course it would be no walk in the park, but it would be
much less time-consuming to adapt, say, SSA-PRE than to rewrite it.
> But your larger point is valid that adding SSA based optimizations is
> feasible and should be fun. (Again, link time cost is the issue.)
Assuming link-time cost wasn't an issue, the question is:
Does using GCC's backend buy us anything?
> It also occurs to me that GCC is probably doing quite a bit of back-end
> optimization (step 16 in your list). Do you have a breakdown of that?
Not really. The irritating part of GCC is that it mixes it all up and
doesn't have a clean separation of concerns. A lot of the "back-end
optimization" happens right along with other data optimizations (i.e., CSE
of machine-specific things).
As far as REAL back-end optimizations go, it looks something like this:

 1. Instruction combination: try to make CISCy instructions, if available
 2. Register movement: try to get registers in the right places for the
    architecture to avoid register-to-register moves. For example, try to
    get the first argument of a function to naturally land in %o0 for SPARC.
 3. Instruction scheduling: 'nuff said :)
 4. Register class preferencing: ??
 5. Local register allocation
 6. Global register allocation
 7. Spilling
 8. Local register allocation (again)
 9. Jump optimization
10. Delay slot scheduling
11. Branch shortening for CISC machines
12. Instruction selection & peephole optimization
13. Debug info output
But none of this would be usable for LLVM anyways, unless we were using
GCC as a static compiler.
-Chris
@@ -1,30 +0,0 @@
Date: Wed, 20 Jun 2001 12:32:22 -0500
From: Vikram Adve <vadve@cs.uiuc.edu>
To: Chris Lattner <lattner@cs.uiuc.edu>
Subject: .NET vs. our VM
One significant difference between .NET CLR and our VM is that the CLR
includes full information about classes and inheritance. In fact, I just
sat through the paper on adding templates to .NET CLR, and the speaker
indicated that the goal seems to be to do simple static compilation (very
little lowering or optimization). Also, the templates implementation in CLR
"relies on dynamic class loading and JIT compilation".
This is an important difference, because I think there are some significant
advantages to having a much lower-level VM layer and doing significant
static analysis and optimization.
I also talked to the lead guy for KAI's C++ compiler (Arch Robison) and he
said that SGI and other commercial compilers have included options to export
their *IR* next to the object code (i.e., .il files) and use them for
link-time code generation. In fact, he said that the .o file was nearly
empty and was entirely generated from the .il at link-time. But he agreed
that this limited the link-time interprocedural optimization to modules
compiled by the same compiler, whereas our approach allows us to link and
optimize modules from multiple different compilers. (Also, of course, they
don't do anything for runtime optimization).
All issues to bring up in Related Work.
--Vikram
@@ -1,31 +0,0 @@
Date: Fri, 6 Jul 2001 16:56:56 -0500
From: Vikram S. Adve <vadve@cs.uiuc.edu>
To: Chris Lattner <lattner@cs.uiuc.edu>
Subject: lowering the IR
BTW, I do think that we should consider lowering the IR as you said. I
didn't get time to raise it today, but it comes up with the SPARC
move-conditional instruction. I don't think we want to put that in the core
VM -- it is a little too specialized. But without a corresponding
conditional move instruction in the VM, it is pretty difficult to maintain a
close mapping between VM and machine code. Other architectures may have
other such instructions.
What I was going to suggest was that for a particular processor, we define
additional VM instructions that match some of the unusual opcodes on the
processor but have VM semantics otherwise, i.e., all operands are in SSA
form and typed. This means that we can re-generate core VM code from the
more specialized code any time we want (so that portability is not lost).
Typically, a static compiler like gcc would generate just the core VM, which
is relatively portable. Anyone (an offline tool, the linker, etc., or even
the static compiler itself if it chooses) can transform that into more
specialized target-specific VM code for a particular architecture. If the
linker does it, it can do it after all machine-independent optimizations.
This would be the most convenient, but not necessary.
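To make the equivalence concrete, here is the idea in C++ terms (purely
illustrative; no actual VM opcodes are being proposed here):

  // Core VM form: portable branch + merge (a phi node, in SSA terms).
  int selectCore(bool cond, int a, int b) {
    if (cond)
      return a;
    return b;
  }

  // Specialized VM form: a single typed, SSA conditional-move operation
  // that a SPARC backend could emit directly as its move-conditional.
  int selectSpecialized(bool cond, int a, int b) {
    return cond ? a : b;
  }

Because both forms compute the same SSA value, a tool can rewrite either
one into the other, which is what keeps the specialized code regenerable
and portability intact.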
The main benefit of lowering will be that we will be able to retain a close
mapping between VM and machine code.
--Vikram