======================================
Kaleidoscope: Adding Debug Information
======================================

.. contents::
   :local:

Chapter 9 Introduction
======================

Welcome to Chapter 9 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. In chapters 1 through 8, we've built a
decent little programming language with functions and variables.
What happens if something goes wrong, though? How do you debug your
program?

Source-level debugging uses formatted data that helps a debugger
translate from the binary and the state of the machine back to the
source that the programmer wrote. In LLVM we generally use a format
called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding
that represents types, source locations, and variable locations.

The short summary of this chapter is that we'll go through the
various things you have to add to a programming language to
support debug info, and how you translate that into DWARF.

Caveat: For now we can't debug via the JIT, so we'll need to compile
our program down to something small and standalone. As part of this
we'll make a few modifications to the running of the language and
how programs are compiled. This means that we'll have a source file
with a simple program written in Kaleidoscope rather than the
interactive JIT. To reduce the number of changes necessary, this does
impose a limitation: we can only have one "top level" command at a time.

Here's the sample program we'll be compiling:

.. code-block:: python

   def fib(x)
     if x < 3 then
       1
     else
       fib(x-1)+fib(x-2);

   fib(10)

Why is this a hard problem?
===========================

Debug information is a hard problem for a few different reasons, mostly
centered around optimized code. First, optimization makes keeping source
locations more difficult. In LLVM IR we keep the original source location
for each IR-level instruction on the instruction itself. Optimization passes
should keep the source locations for newly created instructions, but merged
instructions only get to keep a single location, which can cause jumping
around when stepping through optimized programs. Secondly, optimization
can move variables in ways that leave them optimized out, shared in memory
with other variables, or difficult to track. For the purposes of this
tutorial we're going to avoid optimization (as you'll see with one of the
next sets of patches).

Ahead-of-Time Compilation Mode
==============================

To highlight only the aspects of adding debug information to a source
language without needing to worry about the complexities of JIT debugging,
we're going to make a few changes to Kaleidoscope to support compiling
the IR emitted by the front end into a simple standalone program that
you can execute, debug, and see results.

First we make the anonymous function that contains our top-level
statement be our ``main``:

.. code-block:: udiff

  -    auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
  +    auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>());

This is just the simple change of giving it a name.

Then we're going to remove the command-line prompt code wherever it exists:

.. code-block:: udiff

  @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {
   /// top ::= definition | external | expression | ';'
   static void MainLoop() {
     while (1) {
  -    fprintf(stderr, "ready> ");
       switch (CurTok) {
       case tok_eof:
         return;
  @@ -1184,7 +1183,6 @@ int main() {
     BinopPrecedence['*'] = 40; // highest.

     // Prime the first token.
  -  fprintf(stderr, "ready> ");
     getNextToken();

Lastly we're going to disable all of the optimization passes and the JIT so
that the only thing that happens after we're done parsing and generating
code is that the LLVM IR goes to standard error:

.. code-block:: udiff

  @@ -1108,17 +1108,8 @@ static void HandleExtern() {
   static void HandleTopLevelExpression() {
     // Evaluate a top-level expression into an anonymous function.
     if (auto FnAST = ParseTopLevelExpr()) {
  -    if (auto *FnIR = FnAST->codegen()) {
  -      // We're just doing this to make sure it executes.
  -      TheExecutionEngine->finalizeObject();
  -      // JIT the function, returning a function pointer.
  -      void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR);
  -
  -      // Cast it to the right type (takes no arguments, returns a double) so we
  -      // can call it as a native function.
  -      double (*FP)() = (double (*)())(intptr_t)FPtr;
  -      // Ignore the return value for this.
  -      (void)FP;
  +    if (!FnAST->codegen()) {
  +      fprintf(stderr, "Error generating code for top level expr\n");
       }
     } else {
       // Skip token for error recovery.
  @@ -1439,11 +1459,11 @@ int main() {
     // target lays out data structures.
     TheModule->setDataLayout(TheExecutionEngine->getDataLayout());
     OurFPM.add(new DataLayoutPass());
  +#if 0
     OurFPM.add(createBasicAliasAnalysisPass());
     // Promote allocas to registers.
     OurFPM.add(createPromoteMemoryToRegisterPass());
  @@ -1218,7 +1210,7 @@ int main() {
     OurFPM.add(createGVNPass());
     // Simplify the control flow graph (deleting unreachable blocks, etc).
     OurFPM.add(createCFGSimplificationPass());
  -
  +  #endif
     OurFPM.doInitialization();

     // Set the global so the code gen can use this.

This relatively small set of changes gets us to the point that we can compile
our piece of Kaleidoscope language down to an executable program via this
command line:

.. code-block:: bash

  Kaleidoscope-Ch9 < fib.ks |& clang -x ir -

which gives an ``a.out``/``a.exe`` in the current working directory. The
``|&`` operator (supported by csh and by bash 4 and later) pipes standard
error as well as standard output into ``clang``, which is needed because the
IR is written to standard error.

Compile Unit
============

The top-level container for a section of code in DWARF is a compile unit.
This contains the type and function data for an individual translation unit
(read: one file of source code). So the first thing we need to do is
construct one for our ``fib.ks`` file.

DWARF Emission Setup
====================

Similar to the ``IRBuilder`` class we have a
`DIBuilder <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class
that helps in constructing debug metadata for an LLVM IR file. It
corresponds 1:1 to the debug metadata in the same way that ``IRBuilder``
corresponds to LLVM IR, but with nicer names.
Using it does require that you be more familiar with DWARF terminology than
you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you
read through the general documentation on the
`Metadata Format <http://llvm.org/docs/SourceLevelDebugging.html>`_ it
should be a little clearer. We'll be using this class to construct all
of our IR-level descriptions. Its constructor takes a module, so we
need to construct it shortly after we construct our module. We've left it
as a global static variable to make it a bit easier to use.

Next we're going to create a small container to cache some of our frequent
data. The first will be our compile unit, but we'll also write a bit of
code for our one type, since we won't have to worry about multiple typed
expressions:

.. code-block:: c++

  static DIBuilder *DBuilder;

  struct DebugInfo {
    DICompileUnit *TheCU;
    DIType *DblTy;

    DIType *getDoubleTy();
  } KSDbgInfo;

  DIType *DebugInfo::getDoubleTy() {
    if (DblTy)
      return DblTy;

    DblTy = DBuilder->createBasicType("double", 64, dwarf::DW_ATE_float);
    return DblTy;
  }

And then later on in ``main`` when we're constructing our module:

.. code-block:: c++

  DBuilder = new DIBuilder(*TheModule);

  KSDbgInfo.TheCU = DBuilder->createCompileUnit(
      dwarf::DW_LANG_C, DBuilder->createFile("fib.ks", "."),
      "Kaleidoscope Compiler", 0, "", 0);

There are a couple of things to note here. First, while we're producing a
compile unit for a language called Kaleidoscope, we used the language
constant for C. This is because a debugger wouldn't necessarily understand
the calling conventions or default ABI for a language it doesn't recognize,
and we follow the C ABI in our LLVM code generation, so it's the closest
thing to accurate. This ensures we can actually call functions from the
debugger and have them execute. Secondly, you'll see the "fib.ks" in the
call to ``createCompileUnit``. This is a hard-coded default value, since
we're using shell redirection to feed our source into the Kaleidoscope
compiler. In a usual front end you'd have an input file name and it would
go there.
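
As a sketch of what using "an input file name" could look like, the hypothetical helper below splits a path given on the command line into the filename and directory parts that ``createFile`` takes (as in ``createFile("fib.ks", ".")`` above), falling back to the hard-coded ``fib.ks`` in ``.`` when no path is supplied. It uses only the C++17 standard library; the name ``splitSourcePath`` and the idea of taking the path from ``argv`` are assumptions, not part of the tutorial's code.

.. code-block:: c++

  #include <cassert>
  #include <filesystem>
  #include <string>
  #include <utility>

  // Hypothetical helper: turn an optional input path into the
  // (filename, directory) pair passed to DIBuilder::createFile.
  // Falls back to the tutorial's hard-coded "fib.ks" in "." when no path is given.
  std::pair<std::string, std::string> splitSourcePath(const char *Arg) {
    if (!Arg || !*Arg)
      return {"fib.ks", "."};
    std::filesystem::path P(Arg);
    std::string Dir = P.parent_path().string();
    // A bare filename has an empty parent path; use the current directory.
    if (Dir.empty())
      Dir = ".";
    return {P.filename().string(), Dir};
  }

A front end that reads its source from a file instead of standard input might then call ``splitSourcePath(argc > 1 ? argv[1] : nullptr)`` and pass the two halves to ``DBuilder->createFile``.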

One last thing as part of emitting debug information via ``DIBuilder`` is that
we need to "finalize" the debug information. The reasons are part of the
underlying API for ``DIBuilder``, but make sure you do this near the end of
``main``:

.. code-block:: c++

  DBuilder->finalize();

before you dump out the module.

Functions
=========

Now that we have our ``Compile Unit`` and our source locations, we can add
function definitions to the debug info. So in ``FunctionAST::codegen()`` we
add a few lines of code to describe a context for our subprogram, in this
case the "File", and the actual definition of the function itself.

So the context:

.. code-block:: c++

  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
                                      KSDbgInfo.TheCU->getDirectory());

giving us a ``DIFile`` and asking the ``Compile Unit`` we created above for the
directory and filename where we are currently. Then, for now, we use some
source locations of 0 (since our AST doesn't currently have source location
information) and construct our function definition:

.. code-block:: c++

  DIScope *FContext = Unit;
  unsigned LineNo = 0;
  unsigned ScopeLine = 0;
  DISubprogram *SP = DBuilder->createFunction(
      FContext, P.getName(), StringRef(), Unit, LineNo,
      CreateFunctionType(TheFunction->arg_size(), Unit),
      false /* internal linkage */, true /* definition */, ScopeLine,
      DINode::FlagPrototyped, false);
  TheFunction->setSubprogram(SP);

and we now have a ``DISubprogram`` that contains a reference to all of our
metadata for the function.
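
The ``CreateFunctionType`` helper used above isn't shown in this chapter. In LLVM's debug metadata, a subroutine type is described by an array of types whose first element is the return type and whose remaining elements are the parameter types. Because every Kaleidoscope value is a ``double``, that array is simply ``NumArgs + 1`` copies of the double type. The standalone model below only illustrates that layout: strings stand in for ``DIType`` pointers, and ``subroutineTypeElements`` is a hypothetical name, not an LLVM API.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // Standalone model of the element list a Kaleidoscope subroutine type needs.
  // Element 0 is the return type; elements 1..NumArgs are the parameter types.
  // Strings stand in for DIType pointers (an illustration, not LLVM's API).
  std::vector<std::string> subroutineTypeElements(unsigned NumArgs) {
    std::vector<std::string> Elts;
    Elts.reserve(NumArgs + 1);
    Elts.push_back("double"); // return type
    for (unsigned I = 0; I != NumArgs; ++I)
      Elts.push_back("double"); // one entry per argument
    return Elts;
  }

A real implementation would presumably fill this array with the ``DIType`` returned by ``KSDbgInfo.getDoubleTy()`` and hand it to ``DBuilder->createSubroutineType`` through ``DBuilder->getOrCreateTypeArray``.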

Source Locations
================

The most important thing for debug information is accurate source location:
this makes it possible to map your source code back. We have a problem, though:
Kaleidoscope really doesn't have any source location information in the lexer
or parser, so we'll need to add it.

.. code-block:: c++

  struct SourceLocation {
    int Line;
    int Col;
  };
  static SourceLocation CurLoc;
  static SourceLocation LexLoc = {1, 0};

  static int advance() {
    int LastChar = getchar();

    if (LastChar == '\n' || LastChar == '\r') {
      LexLoc.Line++;
      LexLoc.Col = 0;
    } else
      LexLoc.Col++;
    return LastChar;
  }

In this set of code we've added some functionality to keep track of the
line and column of the "source file". As we lex every token we set our current
"lexical location" to the corresponding line and column for the beginning
of the token. We do this by replacing all of the previous calls to
``getchar()`` with our new ``advance()``, which keeps track of the information.
Then we add a source location to all of our AST classes:

.. code-block:: c++

  class ExprAST {
    SourceLocation Loc;

  public:
    ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
    virtual ~ExprAST() {}
    virtual Value* codegen() = 0;
    int getLine() const { return Loc.Line; }
    int getCol() const { return Loc.Col; }
    virtual raw_ostream &dump(raw_ostream &out, int ind) {
      return out << ':' << getLine() << ':' << getCol() << '\n';
    }
  };

which we pass down when we create a new expression:

.. code-block:: c++

  LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),
                                         std::move(RHS));

giving us locations for each of our expressions and variables.
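
To see how ``advance()`` and ``CurLoc`` work together, here is a small standalone sketch that reads from a ``std::istream`` instead of ``getchar()`` so it can run on an in-memory string. It assumes, as the full listing's lexer does, that ``CurLoc`` is copied from ``LexLoc`` once whitespace has been skipped and the token's first character has been read. ``LocLexer`` and ``nextWord`` are hypothetical names, and tokens are simplified to whitespace-separated words.

.. code-block:: c++

  #include <cassert>
  #include <cctype>
  #include <cstdio>
  #include <istream>
  #include <sstream>
  #include <string>

  struct SourceLocation {
    int Line;
    int Col;
  };

  // Hypothetical standalone lexer: same location bookkeeping as advance(),
  // but reading from a std::istream so it can be tested on a string.
  class LocLexer {
  public:
    explicit LocLexer(std::istream &In) : In(In) {}

    // Location of the first character of the most recent token.
    SourceLocation CurLoc{0, 0};

    // Reads the next whitespace-separated word and records where it starts.
    // Returns false at end of input.
    bool nextWord(std::string &Word) {
      while (LastChar != EOF && std::isspace(LastChar))
        LastChar = advance();
      if (LastChar == EOF)
        return false;
      CurLoc = LexLoc; // LastChar is the token's first character.
      Word.clear();
      while (LastChar != EOF && !std::isspace(LastChar)) {
        Word += static_cast<char>(LastChar);
        LastChar = advance();
      }
      return true;
    }

  private:
    // Mirrors the tutorial's advance(): read one char, update line/column.
    int advance() {
      int C = In.get();
      if (C == '\n' || C == '\r') {
        LexLoc.Line++;
        LexLoc.Col = 0;
      } else {
        LexLoc.Col++;
      }
      return C;
    }

    std::istream &In;
    SourceLocation LexLoc{1, 0};
    int LastChar = ' ';
  };

For the input ``"def fib(x)\n  x+1"``, the words start at 1:1, 1:5, and 2:3, so both lines and columns come out 1-based. Like the tutorial's ``advance()``, this sketch treats ``\r`` as a line break too, so a Windows-style ``\r\n`` ending advances ``Line`` twice.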

To make sure that every instruction gets proper source location information,
we have to tell ``Builder`` whenever we're at a new source location.
We use a small helper function for this:

.. code-block:: c++

  void DebugInfo::emitLocation(ExprAST *AST) {
    // A null AST clears the current location (used for the function prologue).
    if (!AST)
      return Builder.SetCurrentDebugLocation(DebugLoc());
    DIScope *Scope;
    if (LexicalBlocks.empty())
      Scope = TheCU;
    else
      Scope = LexicalBlocks.back();
    Builder.SetCurrentDebugLocation(
        DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
  }

This tells the main ``IRBuilder`` not only where we are, but also what scope
we're in. The scope can either be at the compile-unit level or be the nearest
enclosing lexical block, like the current function. To represent this we
create a stack of scopes:

.. code-block:: c++

  std::vector<DIScope *> LexicalBlocks;

and push the scope (function) to the top of the stack when we start
generating the code for each function:

.. code-block:: c++

  KSDbgInfo.LexicalBlocks.push_back(SP);

Also, we must not forget to pop the scope back off of the scope stack at the
end of the code generation for the function:

.. code-block:: c++

  // Pop off the lexical block for the function since we added it
  // unconditionally.
  KSDbgInfo.LexicalBlocks.pop_back();

Then we make sure to emit the location every time we start to generate code
for a new AST object:

.. code-block:: c++

  KSDbgInfo.emitLocation(this);
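
The bookkeeping in ``emitLocation`` and ``LexicalBlocks`` can be modeled without LLVM. In the sketch below, scopes are plain strings rather than ``DIScope`` pointers and the builder's current debug location is a ``std::optional`` rather than a ``DebugLoc``; ``ScopeTracker``, ``DebugLocModel`` and their members are hypothetical names. It shows that locations fall back to the compile unit when no function scope has been pushed, and that passing no AST clears the location, which is what the prologue handling later in this chapter relies on.

.. code-block:: c++

  #include <cassert>
  #include <optional>
  #include <string>
  #include <vector>

  struct SourceLocation {
    int Line;
    int Col;
  };

  // Stand-in for the DebugLoc attached to newly built instructions.
  struct DebugLocModel {
    int Line;
    int Col;
    std::string Scope; // name of the enclosing scope
  };

  // Hypothetical model of DebugInfo's scope stack and emitLocation().
  struct ScopeTracker {
    std::string CompileUnit = "fib.ks";
    std::vector<std::string> LexicalBlocks;
    std::optional<DebugLocModel> Current; // models the builder's location

    void pushFunction(const std::string &Name) { LexicalBlocks.push_back(Name); }

    void popFunction() {
      assert(!LexicalBlocks.empty() && "unbalanced scope stack");
      LexicalBlocks.pop_back();
    }

    // nullptr clears the location, like emitLocation(nullptr) in the tutorial.
    void emitLocation(const SourceLocation *AST) {
      if (!AST) {
        Current.reset();
        return;
      }
      const std::string &Scope =
          LexicalBlocks.empty() ? CompileUnit : LexicalBlocks.back();
      Current = DebugLocModel{AST->Line, AST->Col, Scope};
    }
  };

Keeping pushes and pops balanced matters: if the pop were skipped, code generated after the function would still be attributed to that function's scope.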

Variables
=========

Now that we have functions, we need to be able to print out the variables
we have in scope. Let's get our function arguments set up so we can get
decent backtraces and see how our functions are being called. It isn't
a lot of code, and we generally handle it when we're creating the
argument allocas in ``FunctionAST::codegen``.

.. code-block:: c++

  // Record the function arguments in the NamedValues map.
  NamedValues.clear();
  unsigned ArgIdx = 0;
  for (auto &Arg : TheFunction->args()) {
    // Create an alloca for this variable.
    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName());

    // Create a debug descriptor for the variable.
    DILocalVariable *D = DBuilder->createParameterVariable(
        SP, Arg.getName(), ++ArgIdx, Unit, LineNo, KSDbgInfo.getDoubleTy(),
        true);

    DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),
                            DebugLoc::get(LineNo, 0, SP),
                            Builder.GetInsertBlock());

    // Store the initial value into the alloca.
    Builder.CreateStore(&Arg, Alloca);

    // Add arguments to variable symbol table.
    NamedValues[Arg.getName()] = Alloca;
  }

Here we're first creating the variable, giving it the scope (``SP``),
the name, source location, type, and, since it's an argument, the argument
index. Next, we create an ``llvm.dbg.declare`` call to indicate at the IR
level that we've got a variable in an alloca (and it gives a starting
location for the variable), and set a source location for the
beginning of the scope on the declare.
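
Parameter variables are numbered starting at 1: in LLVM's ``DILocalVariable``, an argument number of 0 marks an ordinary local variable rather than a parameter, which is why the loop uses the pre-increment ``++ArgIdx``. The standalone sketch below, using the hypothetical names ``ParamDesc`` and ``describeParams``, mirrors that numbering for a list of argument names.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // Hypothetical stand-in for what createParameterVariable records per argument.
  struct ParamDesc {
    std::string Name;
    unsigned ArgNo; // 1-based; 0 would mean "not a parameter"
  };

  std::vector<ParamDesc> describeParams(const std::vector<std::string> &Args) {
    std::vector<ParamDesc> Out;
    unsigned ArgIdx = 0;
    for (const auto &Name : Args)
      Out.push_back({Name, ++ArgIdx}); // pre-increment: first argument gets 1
    return Out;
  }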

One interesting thing to note at this point is that various debuggers have
assumptions based on how code and debug information were generated for them
in the past. In this case we need to do a little bit of a hack to avoid
generating line information for the function prologue so that the debugger
knows to skip over those instructions when setting a breakpoint. So in
``FunctionAST::codegen`` we add some more lines:

.. code-block:: c++

  // Unset the location for the prologue emission (leading instructions with no
  // location in a function are considered part of the prologue and the debugger
  // will run past them when breaking on a function)
  KSDbgInfo.emitLocation(nullptr);

and then emit a new location when we actually start generating code for the
body of the function:

.. code-block:: c++

  KSDbgInfo.emitLocation(Body.get());
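
The prologue handling relies on the convention stated in the comment above: leading instructions without a location are treated as prologue and skipped when breaking on a function. The hypothetical function below is a deliberately simplified model of that convention (real debuggers consult the DWARF line table and apply their own heuristics, so this is not any particular debugger's algorithm): it picks the first instruction that carries a line number as the breakpoint position.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <optional>
  #include <vector>

  // Each entry is one instruction's line number, or std::nullopt if the
  // instruction was emitted with no debug location (e.g. after
  // emitLocation(nullptr)). Returns the index of the first instruction with a
  // location, where a simplified debugger would place a function breakpoint,
  // or std::nullopt if no instruction has a location.
  std::optional<std::size_t>
  firstBreakpointIndex(const std::vector<std::optional<int>> &Lines) {
    for (std::size_t I = 0; I != Lines.size(); ++I)
      if (Lines[I])
        return I;
    return std::nullopt;
  }

In the tutorial's ``FunctionAST::codegen``, the location-free instructions are the argument allocas and stores emitted between ``emitLocation(nullptr)`` and ``emitLocation(Body.get())``.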

With this we have enough debug information to set breakpoints in functions,
print out argument variables, and call functions. Not too bad for just a
few simple lines of code!

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
debug information. To build this example, use:

.. code-block:: bash

  # Compile
  clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
  # Run
  ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter9/toy.cpp
   :language: c++

`Next: Conclusion and other useful LLVM tidbits <LangImpl10.html>`_