Imported Upstream version 5.18.0.167

Former-commit-id: 289509151e0fee68a1b591a20c9f109c3c789d3a
Xamarin Public Jenkins (auto-signing)
2018-10-20 08:25:10 +00:00
parent e19d552987
commit b084638f15
28489 changed files with 184 additions and 3866856 deletions


@ -1,434 +0,0 @@
=================
TableGen BackEnds
=================
.. contents::
:local:
Introduction
============
TableGen backends are at the core of TableGen's functionality. The source files
provide the semantics to a generated (in memory) structure, but it's up to the
backend to print this out in a way that is meaningful to the user (normally a
C program including a file or a textual list of warnings, options and error
messages).
TableGen is used by both LLVM and Clang with very different goals. LLVM uses it
as a way to automate the generation of massive amounts of information regarding
instructions, schedules, cores and architecture features. Some backends generate
output that is consumed by more than one source file, so they need to be created
in a way that makes it easy to use preprocessor tricks. Some backends can also print
C code structures, so that they can be directly included as-is.
Clang, on the other hand, uses it mainly for diagnostic messages (errors,
warnings, tips) and attributes, so more on the textual end of the scale.
LLVM BackEnds
=============
.. warning::
This document is raw. Each section below needs three sub-sections: description
of its purpose with a list of users, output generated from generic input, and
finally why it needed a new backend (in case there's something similar).
Overall, each backend will take the same TableGen file type and transform it into
similar output for different targets/uses. There is an implicit contract between
the TableGen files, the back-ends and their users.
For instance, a global contract is that each back-end produces macro-guarded
sections. Based on whether the file is included by a header or a source file,
or even in which context of each file the include is being used, you have
to define a macro just before including it, to get the right output:
.. code-block:: c++
#define GET_REGINFO_TARGET_DESC
#include "ARMGenRegisterInfo.inc"
And just part of the generated file would be included. This is useful if
you need the same information in multiple formats (instantiation, initialization,
getter/setter functions, etc) from the same source TableGen file without having
to re-compile the TableGen file multiple times.
Sometimes, multiple macros might be defined before the same include file to
output multiple blocks:
.. code-block:: c++
#define GET_REGISTER_MATCHER
#define GET_SUBTARGET_FEATURE_NAME
#define GET_MATCHER_IMPLEMENTATION
#include "ARMGenAsmMatcher.inc"
The macros will be undef'd automatically as they're used, in the include file.
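As a rough sketch of this contract, the single file below mimics what happens
when a user includes a generated file. The guarded text between the marker
comments stands in for a ``.inc`` file; the ``GET_DEMO_*`` macros, ``DemoReg``
and ``demoRegName`` are hypothetical names, not real TableGen output.

.. code-block:: c++

   #include <cassert>

   // The user picks sections by defining macros before the include.
   #define GET_DEMO_ENUM
   #define GET_DEMO_NAMES

   // --- Stand-in for the text of a generated "DemoGenRegisterInfo.inc". ---
   #ifdef GET_DEMO_ENUM
   #undef GET_DEMO_ENUM // the generated text clears its own guard
   enum DemoReg { R0, R1, R2, NUM_DEMO_REGS };
   #endif

   #ifdef GET_DEMO_NAMES
   #undef GET_DEMO_NAMES
   static const char *const DemoRegNames[] = {"R0", "R1", "R2"};
   #endif

   #ifdef GET_DEMO_MASKS // never defined here, so this section is skipped
   #undef GET_DEMO_MASKS
   static const unsigned DemoRegMasks[] = {1u, 2u, 4u};
   #endif
   // --- End of the stand-in. ---

   // Uses only the two sections that were selected above.
   const char *demoRegName(unsigned R) {
     assert(R < NUM_DEMO_REGS);
     return DemoRegNames[R];
   }

Because ``GET_DEMO_MASKS`` is never defined, the third section contributes
nothing to the translation unit, and both selected guards are undefined again
once their sections have been emitted.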
On all LLVM back-ends, the ``llvm-tblgen`` binary will be executed on the root
TableGen file ``<Target>.td``, which should include all others. This guarantees
that all information needed is accessible, and that no duplication is needed
in the TableGen files.
CodeEmitter
-----------
**Purpose**: CodeEmitterGen uses the descriptions of instructions and their fields to
construct an automated code emitter: a function that, given a MachineInstr,
returns the (currently, 32-bit unsigned) value of the instruction.
**Output**: C++ code, implementing the target's CodeEmitter
class by overriding the virtual functions as ``<Target>CodeEmitter::function()``.
**Usage**: Used to include directly at the end of ``<Target>MCCodeEmitter.cpp``.
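To illustrate the kind of function this backend generates, here is a
hand-written sketch that packs instruction fields into a 32-bit word. The
``DemoInst`` type, the field widths and the bit layout are assumptions made for
the example; real targets take them from their instruction descriptions.

.. code-block:: c++

   #include <cassert>
   #include <cstdint>

   // Hypothetical instruction with three operands; not an LLVM type.
   struct DemoInst {
     uint32_t Opcode; // 8 bits
     uint32_t Rd;     // 4 bits
     uint32_t Rn;     // 4 bits
     uint32_t Imm;    // 16 bits
   };

   // Assumed layout: [31:24] opcode, [23:20] Rd, [19:16] Rn, [15:0] Imm.
   uint32_t encodeDemoInst(const DemoInst &MI) {
     assert(MI.Opcode < 256 && MI.Rd < 16 && MI.Rn < 16 && MI.Imm < 65536);
     return (MI.Opcode << 24) | (MI.Rd << 20) | (MI.Rn << 16) | MI.Imm;
   }

With this layout, an instruction with opcode ``0x01``, ``Rd = 2``, ``Rn = 3``
and immediate ``0x00FF`` encodes to ``0x012300FF``.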
RegisterInfo
------------
**Purpose**: This tablegen backend is responsible for emitting a description of a target
register file for a code generator. It uses instances of the Register,
RegisterAliases, and RegisterClass classes to gather this information.
**Output**: C++ code with enums and structures representing the register mappings,
properties, masks, etc.
**Usage**: Both on ``<Target>BaseRegisterInfo`` and ``<Target>MCTargetDesc`` (headers
and source files), with macros selecting which part is included in each, to
separate declaration from initialization.
InstrInfo
---------
**Purpose**: This tablegen backend is responsible for emitting a description of the target
instruction set for the code generator. (what are the differences from CodeEmitter?)
**Output**: C++ code with enums and structures representing the instruction mappings,
properties, masks, etc.
**Usage**: Both on ``<Target>BaseInstrInfo`` and ``<Target>MCTargetDesc`` (headers
and source files), with macros selecting which part is included in each, to
separate declaration from initialization.
AsmWriter
---------
**Purpose**: Emits an assembly printer for the current target.
**Output**: Implementation of ``<Target>InstPrinter::printInstruction()``, among
other things.
**Usage**: Included directly into ``InstPrinter/<Target>InstPrinter.cpp``.
AsmMatcher
----------
**Purpose**: Emits a target specifier matcher for
converting parsed assembly operands into the MCInst structures. It also
emits a matcher for custom operand parsing. Extensive documentation is
written in the ``AsmMatcherEmitter.cpp`` file.
**Output**: Assembler parsers' matcher functions, declarations, etc.
**Usage**: Used in back-ends' ``AsmParser/<Target>AsmParser.cpp`` for
building the AsmParser class.
Disassembler
------------
**Purpose**: Contains disassembler table emitters for various
architectures. Extensive documentation is written in the
``DisassemblerEmitter.cpp`` file.
**Output**: Decoding tables, static decoding functions, etc.
**Usage**: Directly included in ``Disassembler/<Target>Disassembler.cpp``
to cater for all default decodings, after all hand-made ones.
PseudoLowering
--------------
**Purpose**: Generate pseudo instruction lowering.
**Output**: Implements ``<Target>AsmPrinter::emitPseudoExpansionLowering()``.
**Usage**: Included directly into ``<Target>AsmPrinter.cpp``.
CallingConv
-----------
**Purpose**: Responsible for emitting descriptions of the calling
conventions supported by this target.
**Output**: Implement static functions to deal with calling conventions
chained by matching styles, returning false on no match.
**Usage**: Used in ISelLowering and FastISel as function pointers to
implementation returned by a CC selection function.
DAGISel
-------
**Purpose**: Generate a DAG instruction selector.
**Output**: Creates huge functions for automating DAG selection.
**Usage**: Included in ``<Target>ISelDAGToDAG.cpp`` inside the target's
implementation of ``SelectionDAGISel``.
DFAPacketizer
-------------
**Purpose**: This class parses the Schedule.td file and produces an API that
can be used to reason about whether an instruction can be added to a packet
on a VLIW architecture. The class internally generates a deterministic finite
automaton (DFA) that models all possible mappings of machine instructions
to functional units as instructions are added to a packet.
**Output**: Scheduling tables for VLIW back-ends (Hexagon, AMD).
**Usage**: Included directly on ``<Target>InstrInfo.cpp``.
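The packet question can be sketched without the generated tables. The code
below is a toy built on assumptions, not the generated API: it computes
successor states on demand instead of emitting a precomputed automaton, and
``State``, ``addToPacket`` and the unit masks are hypothetical. Each state
records every set of busy functional units that is still reachable, which is
the information a DFA state would encode.

.. code-block:: c++

   #include <cassert>
   #include <cstdint>
   #include <set>

   // One packet state: every combination of busy units still reachable.
   using State = std::set<uint32_t>;

   // Tries to add an instruction that may issue on any single unit in UnitMask.
   // Returns the successor state; an empty result means it does not fit.
   State addToPacket(const State &Cur, uint32_t UnitMask) {
     assert(UnitMask != 0 && "an instruction needs at least one candidate unit");
     State Next;
     for (uint32_t Busy : Cur)
       for (uint32_t Unit = 1; Unit != 0; Unit <<= 1)
         if ((UnitMask & Unit) && !(Busy & Unit))
           Next.insert(Busy | Unit);
     return Next;
   }

With two units ``A = 1`` and ``B = 2``, adding an instruction that can issue on
either unit to the empty-packet state ``{0}`` gives ``{1, 2}``. Adding one that
needs ``A`` then gives ``{3}``, and a further ``A``-only instruction yields the
empty set, so it does not fit in the packet.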
FastISel
--------
**Purpose**: This tablegen backend emits code for use by the "fast"
instruction selection algorithm. See the comments at the top of
lib/CodeGen/SelectionDAG/FastISel.cpp for background. This file
scans through the target's tablegen instruction-info files
and extracts instructions with obvious-looking patterns, and it emits
code to look up these instructions by type and operator.
**Output**: Generates ``Predicate`` and ``FastEmit`` methods.
**Usage**: Implements private methods of the targets' implementation
of ``FastISel`` class.
Subtarget
---------
**Purpose**: Generate subtarget enumerations.
**Output**: Enums, globals, local tables for sub-target information.
**Usage**: Populates ``<Target>Subtarget`` and
``MCTargetDesc/<Target>MCTargetDesc`` files (both headers and source).
Intrinsic
---------
**Purpose**: Generate (target) intrinsic information.
OptParserDefs
-------------
**Purpose**: Print enum values for a class.
CTags
-----
**Purpose**: This tablegen backend emits an index of definitions in ctags(1)
format. A helper script, utils/TableGen/tdtags, provides an easier-to-use
interface; run 'tdtags -H' for documentation.
X86EVEX2VEX
-----------
**Purpose**: This X86 specific tablegen backend emits tables that map EVEX
encoded instructions to their VEX encoded identical instruction.
Clang BackEnds
==============
ClangAttrClasses
----------------
**Purpose**: Creates Attrs.inc, which contains semantic attribute class
declarations for any attribute in ``Attr.td`` that has not set ``ASTNode = 0``.
This file is included as part of ``Attr.h``.
ClangAttrParserStringSwitches
-----------------------------
**Purpose**: Creates AttrParserStringSwitches.inc, which contains
StringSwitch::Case statements for parser-related string switches. Each switch
is given its own macro (such as ``CLANG_ATTR_ARG_CONTEXT_LIST``, or
``CLANG_ATTR_IDENTIFIER_ARG_LIST``), which is expected to be defined before
including AttrParserStringSwitches.inc, and undefined after.
ClangAttrImpl
-------------
**Purpose**: Creates AttrImpl.inc, which contains semantic attribute class
definitions for any attribute in ``Attr.td`` that has not set ``ASTNode = 0``.
This file is included as part of ``AttrImpl.cpp``.
ClangAttrList
-------------
**Purpose**: Creates AttrList.inc, which is used when a list of semantic
attribute identifiers is required. For instance, ``AttrKinds.h`` includes this
file to generate the list of ``attr::Kind`` enumeration values. This list is
separated out into multiple categories: attributes, inheritable attributes, and
inheritable parameter attributes. This categorization happens automatically
based on information in ``Attr.td`` and is used to implement the ``classof``
functionality required for ``dyn_cast`` and similar APIs.
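A minimal sketch of how such a categorized list can drive ``classof``, assuming
the kinds are ordered so that each category forms a contiguous range. The
``Kind`` values, the ``K_First*`` markers and ``dynCastDemo`` are hypothetical
stand-ins, not the contents of ``AttrList.inc`` or LLVM's ``dyn_cast``.

.. code-block:: c++

   #include <cassert>

   // Hypothetical kinds, ordered so that each category is a contiguous range.
   enum Kind {
     K_Plain,    // plain attribute
     K_Aligned,  // inheritable attribute
     K_NoEscape, // inheritable parameter attribute
     K_FirstInheritable = K_Aligned,
     K_FirstInheritableParam = K_NoEscape,
     K_Last = K_NoEscape
   };

   struct Attr {
     Kind K;
     explicit Attr(Kind K) : K(K) {}
   };

   struct InheritableAttr : Attr {
     explicit InheritableAttr(Kind K) : Attr(K) {
       assert(classof(this) && "kind outside the inheritable range");
     }
     // A range check on the kind is all that classof needs.
     static bool classof(const Attr *A) {
       return A->K >= K_FirstInheritable && A->K <= K_Last;
     }
   };

   // Simplified stand-in for dyn_cast: null when the kind is out of range.
   template <typename To> const To *dynCastDemo(const Attr *A) {
     return To::classof(A) ? static_cast<const To *>(A) : nullptr;
   }

A class for inheritable parameter attributes would follow the same pattern,
checking from ``K_FirstInheritableParam`` instead.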
ClangAttrPCHRead
----------------
**Purpose**: Creates AttrPCHRead.inc, which is used to deserialize attributes
in the ``ASTReader::ReadAttributes`` function.
ClangAttrPCHWrite
-----------------
**Purpose**: Creates AttrPCHWrite.inc, which is used to serialize attributes in
the ``ASTWriter::WriteAttributes`` function.
ClangAttrSpellings
------------------
**Purpose**: Creates AttrSpellings.inc, which is used to implement the
``__has_attribute`` feature test macro.
ClangAttrSpellingListIndex
--------------------------
**Purpose**: Creates AttrSpellingListIndex.inc, which is used to map parsed
attribute spellings (including which syntax or scope was used) to an attribute
spelling list index. These spelling list index values are internal
implementation details exposed via
``AttributeList::getAttributeSpellingListIndex``.
ClangAttrVisitor
----------------
**Purpose**: Creates AttrVisitor.inc, which is used when implementing
recursive AST visitors.
ClangAttrTemplateInstantiate
----------------------------
**Purpose**: Creates AttrTemplateInstantiate.inc, which implements the
``instantiateTemplateAttribute`` function, used when instantiating a template
that requires an attribute to be cloned.
ClangAttrParsedAttrList
-----------------------
**Purpose**: Creates AttrParsedAttrList.inc, which is used to generate the
``AttributeList::Kind`` parsed attribute enumeration.
ClangAttrParsedAttrImpl
-----------------------
**Purpose**: Creates AttrParsedAttrImpl.inc, which is used by
``AttributeList.cpp`` to implement several functions on the ``AttributeList``
class. This functionality is implemented via the ``AttrInfoMap ParsedAttrInfo``
array, which contains one element per parsed attribute object.
ClangAttrParsedAttrKinds
------------------------
**Purpose**: Creates AttrParsedAttrKinds.inc, which is used to implement the
``AttributeList::getKind`` function, mapping a string (and syntax) to a parsed
attribute ``AttributeList::Kind`` enumeration.
ClangAttrDump
-------------
**Purpose**: Creates AttrDump.inc, which dumps information about an attribute.
It is used to implement ``ASTDumper::dumpAttr``.
ClangDiagsDefs
--------------
Generate Clang diagnostics definitions.
ClangDiagGroups
---------------
Generate Clang diagnostic groups.
ClangDiagsIndexName
-------------------
Generate Clang diagnostic name index.
ClangCommentNodes
-----------------
Generate Clang AST comment nodes.
ClangDeclNodes
--------------
Generate Clang AST declaration nodes.
ClangStmtNodes
--------------
Generate Clang AST statement nodes.
ClangSACheckers
---------------
Generate Clang Static Analyzer checkers.
ClangCommentHTMLTags
--------------------
Generate efficient matchers for HTML tag names that are used in documentation comments.
ClangCommentHTMLTagsProperties
------------------------------
Generate efficient matchers for HTML tag properties.
ClangCommentHTMLNamedCharacterReferences
----------------------------------------
Generate function to translate named character references to UTF-8 sequences.
ClangCommentCommandInfo
-----------------------
Generate command properties for commands that are used in documentation comments.
ClangCommentCommandList
-----------------------
Generate list of commands that are used in documentation comments.
ArmNeon
-------
Generate arm_neon.h for clang.
ArmNeonSema
-----------
Generate ARM NEON sema support for clang.
ArmNeonTest
-----------
Generate ARM NEON tests for clang.
AttrDocs
--------
**Purpose**: Creates ``AttributeReference.rst`` from ``AttrDocs.td``, and is
used for documenting user-facing attributes.
How to write a back-end
=======================
TODO.
Until we get a step-by-step HowTo for writing TableGen backends, you can at
least grab the boilerplate (build system, new files, etc.) from Clang's
r173931.
TODO: How they work, how to write one. This section should not contain details
about any particular backend, except maybe ``-print-enums`` as an example. This
should highlight the APIs in ``TableGen/Record.h``.


@ -1,31 +0,0 @@
=====================
TableGen Deficiencies
=====================
.. contents::
:local:
Introduction
============
Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times. The common theme is that, while TableGen allows
you to build Domain-Specific-Languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.
At the same time, TableGen allows you to create virtually any meaning of
the basic concepts via custom-made back-ends, which can pervert the original
design and make it very hard for newcomers to understand it.
There are some in favour of extending the semantics even more, but making sure
back-ends adhere to strict rules. Others suggest we should move to more
powerful DSLs designed with specific purposes, or even re-use existing
DSLs.
Known Problems
==============
TODO: Add here frequently asked questions about why TableGen doesn't do
what you want, how it might, and how we could extend/restrict it to
be more user friendly.



@ -1,390 +0,0 @@
===========================
TableGen Language Reference
===========================
.. contents::
:local:
.. warning::
This document is extremely rough. If you find something lacking, please
fix it, file a documentation bug, or ask about it on llvm-dev.
Introduction
============
This document is meant to be a normative spec about the TableGen language
in and of itself (i.e. how to understand a given construct in terms of how
it affects the final set of records represented by the TableGen file). If
you are unsure if this document is really what you are looking for, please
read the :doc:`introduction to TableGen <index>` first.
Notation
========
The lexical and syntax notation used here is intended to imitate
`Python's`_. In particular, for lexical definitions, the productions
operate at the character level and there is no implied whitespace between
elements. The syntax definitions operate at the token level, so there is
implied whitespace between tokens.
.. _`Python's`: http://docs.python.org/py3k/reference/introduction.html#notation
Lexical Analysis
================
TableGen supports BCPL (``// ...``) and nestable C-style (``/* ... */``)
comments.
The following is a listing of the basic punctuation tokens::
- + [ ] { } ( ) < > : ; . = ? #
Numeric literals take one of the following forms:
.. TableGen actually will lex some pretty strange sequences and interpret
them as numbers. What is shown here is an attempt to approximate what it
"should" accept.
.. productionlist::
TokInteger: `DecimalInteger` | `HexInteger` | `BinInteger`
DecimalInteger: ["+" | "-"] ("0"..."9")+
HexInteger: "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
BinInteger: "0b" ("0" | "1")+
One aspect to note is that the :token:`DecimalInteger` token *includes* the
``+`` or ``-``, as opposed to having ``+`` and ``-`` be unary operators as
most languages do.
Also note that :token:`BinInteger` creates a value of type ``bits<n>``
(where ``n`` is the number of bits). This will implicitly convert to
integers when needed.
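As a small illustration of that rule, the sketch below turns a binary literal
into a width and a value, taking the width from the number of digits written.
``BitsValue`` and ``parseBinInteger`` are hypothetical names, and the 64-bit
limit is a simplification of this example rather than a property of TableGen.

.. code-block:: c++

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <string>

   // A bits<n> value: n is Width, the payload is Value.
   struct BitsValue {
     unsigned Width;
     uint64_t Value;
   };

   // Parses text of the form "0b<digits>"; Width is the number of digits.
   BitsValue parseBinInteger(const std::string &Tok) {
     assert(Tok.size() > 2 && Tok.size() <= 66 && Tok.compare(0, 2, "0b") == 0);
     BitsValue R{0, 0};
     for (std::size_t I = 2; I < Tok.size(); ++I) {
       assert(Tok[I] == '0' || Tok[I] == '1');
       R.Value = (R.Value << 1) | static_cast<uint64_t>(Tok[I] - '0');
       ++R.Width;
     }
     return R;
   }

Under this reading, ``0b0101`` has a width of 4 and a value of 5, so leading
zeros change the type but not the integer it converts to.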
TableGen has identifier-like tokens:
.. productionlist::
ualpha: "a"..."z" | "A"..."Z" | "_"
TokIdentifier: ("0"..."9")* `ualpha` (`ualpha` | "0"..."9")*
TokVarName: "$" `ualpha` (`ualpha` | "0"..."9")*
Note that unlike most languages, TableGen allows :token:`TokIdentifier` to
begin with a number. In case of ambiguity, a token will be interpreted as a
numeric literal rather than an identifier.
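The precedence rule can be sketched as a classifier that tries the numeric
forms before the identifier form. This follows the grammar as written in this
document, not the actual lexer, which may accept other sequences; ``TokKind``
and ``classify`` are hypothetical names.

.. code-block:: c++

   #include <cassert>
   #include <cctype>
   #include <cstddef>
   #include <string>

   enum class TokKind { Integer, Identifier, Unknown };

   static bool isDec(char C) { return std::isdigit(static_cast<unsigned char>(C)) != 0; }
   static bool isHex(char C) { return std::isxdigit(static_cast<unsigned char>(C)) != 0; }
   static bool isBin(char C) { return C == '0' || C == '1'; }
   static bool isUAlpha(char C) {
     return std::isalpha(static_cast<unsigned char>(C)) != 0 || C == '_';
   }
   static bool isIdentChar(char C) { return isUAlpha(C) || isDec(C); }

   // True if S[From..] is non-empty and every character satisfies Pred.
   static bool restIs(const std::string &S, std::size_t From, bool (*Pred)(char)) {
     if (From >= S.size())
       return false;
     for (std::size_t I = From; I < S.size(); ++I)
       if (!Pred(S[I]))
         return false;
     return true;
   }

   TokKind classify(const std::string &S) {
     assert(!S.empty());
     // Numeric forms are tried first; this is how the ambiguity is resolved.
     std::size_t Sign = (S[0] == '+' || S[0] == '-') ? 1 : 0;
     if (restIs(S, Sign, isDec))
       return TokKind::Integer;
     if (S.compare(0, 2, "0x") == 0 && restIs(S, 2, isHex))
       return TokKind::Integer;
     if (S.compare(0, 2, "0b") == 0 && restIs(S, 2, isBin))
       return TokKind::Integer;
     // Identifier: optional leading digits, then a letter or '_', then the rest.
     std::size_t I = 0;
     while (I < S.size() && isDec(S[I]))
       ++I;
     if (I < S.size() && isUAlpha(S[I]) && restIs(S, I, isIdentChar))
       return TokKind::Identifier;
     return TokKind::Unknown;
   }

Here ``4ever`` is classified as an identifier, while ``0x1F`` matches both the
identifier and the hexadecimal productions and is therefore treated as a
number.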
TableGen also has two string-like literals:
.. productionlist::
TokString: '"' <non-'"' characters and C-like escapes> '"'
TokCodeFragment: "[{" <shortest text not containing "}]"> "}]"
:token:`TokCodeFragment` is essentially a multiline string literal
delimited by ``[{`` and ``}]``.
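The "shortest text" wording means the fragment ends at the first ``}]`` after
the opening delimiter. A minimal sketch of that rule, with ``lexCodeFragment``
as a hypothetical helper rather than the lexer's real entry point:

.. code-block:: c++

   #include <cassert>
   #include <cstddef>
   #include <string>

   // Returns the text between a leading "[{" and the first "}]" that follows.
   std::string lexCodeFragment(const std::string &S) {
     assert(S.compare(0, 2, "[{") == 0);
     std::size_t End = S.find("}]", 2);
     assert(End != std::string::npos && "unterminated code fragment");
     return S.substr(2, End - 2);
   }

A lone ``]`` or ``}`` inside the fragment, as in ``X[0]``, does not terminate
it; only the two-character sequence ``}]`` does.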
.. note::
The current implementation accepts the following C-like escapes::
\\ \' \" \t \n
TableGen also has the following keywords::
bit bits class code dag
def foreach defm field in
int let list multiclass string
TableGen also has "bang operators" which have a
wide variety of meanings:
.. productionlist::
BangOperator: one of
:!eq !if !head !tail !con
:!add !shl !sra !srl !and
:!or !empty !subst !foreach !strconcat
:!cast !listconcat
Syntax
======
TableGen has an ``include`` mechanism. It does not play a role in the
syntax per se, since it is lexically replaced with the contents of the
included file.
.. productionlist::
IncludeDirective: "include" `TokString`
TableGen's top-level production consists of "objects".
.. productionlist::
TableGenFile: `Object`*
Object: `Class` | `Def` | `Defm` | `Let` | `MultiClass` | `Foreach`
``class``\es
------------
.. productionlist::
Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`
A ``class`` declaration creates a record which other records can inherit
from. A class can be parametrized by a list of "template arguments", whose
values can be used in the class body.
A given class can only be defined once. A ``class`` declaration is
considered to define the class if any of the following is true:
.. break ObjectBody into its constituents so that they are present here?
#. The :token:`TemplateArgList` is present.
#. The :token:`Body` in the :token:`ObjectBody` is present and is not empty.
#. The :token:`BaseClassList` in the :token:`ObjectBody` is present.
You can declare an empty class by giving an empty :token:`TemplateArgList`
and an empty :token:`ObjectBody`. This can serve as a restricted form of
forward declaration: note that records deriving from the forward-declared
class will inherit no fields from it since the record expansion is done
when the record is parsed.
.. productionlist::
TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"
Declarations
------------
.. Omitting mention of arcane "field" prefix to discourage its use.
The declaration syntax is pretty much what you would expect as a C++
programmer.
.. productionlist::
Declaration: `Type` `TokIdentifier` ["=" `Value`]
It assigns the value to the identifier.
Types
-----
.. productionlist::
Type: "string" | "code" | "bit" | "int" | "dag"
:| "bits" "<" `TokInteger` ">"
:| "list" "<" `Type` ">"
:| `ClassID`
ClassID: `TokIdentifier`
Both ``string`` and ``code`` correspond to the string type; the difference
is purely to indicate programmer intention.
The :token:`ClassID` must identify a class that has been previously
declared or defined.
Values
------
.. productionlist::
Value: `SimpleValue` `ValueSuffix`*
ValueSuffix: "{" `RangeList` "}"
:| "[" `RangeList` "]"
:| "." `TokIdentifier`
RangeList: `RangePiece` ("," `RangePiece`)*
RangePiece: `TokInteger`
:| `TokInteger` "-" `TokInteger`
:| `TokInteger` `TokInteger`
The peculiar last form of :token:`RangePiece` is due to the fact that the
"``-``" is included in the :token:`TokInteger`, hence ``1-5`` gets lexed as
two consecutive :token:`TokInteger`'s, with values ``1`` and ``-5``,
instead of "1", "-", and "5".
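The effect can be reproduced with a toy lexer that attaches a leading sign to
the digits that follow it. This is a sketch under that single assumption;
``lexIntegers`` and ``rangePiece`` are hypothetical helpers, and the sign flip
in ``rangePiece`` is one plausible way for a parser to recover the intended
upper bound.

.. code-block:: c++

   #include <cassert>
   #include <cctype>
   #include <cstddef>
   #include <string>
   #include <utility>
   #include <vector>

   // Splits text into signed decimal integers; a '+' or '-' directly before
   // digits becomes part of the number, and anything else is skipped.
   std::vector<long> lexIntegers(const std::string &S) {
     std::vector<long> Toks;
     std::size_t I = 0;
     while (I < S.size()) {
       std::size_t Start = I;
       if (S[I] == '+' || S[I] == '-')
         ++I;
       std::size_t Digits = I;
       while (I < S.size() && std::isdigit(static_cast<unsigned char>(S[I])))
         ++I;
       if (I == Digits) { // no digits here: skip one character and retry
         I = Start + 1;
         continue;
       }
       Toks.push_back(std::stol(S.substr(Start, I - Start)));
     }
     return Toks;
   }

   // Interprets one RangePiece as an inclusive (first, last) pair.
   std::pair<long, long> rangePiece(const std::string &S) {
     std::vector<long> T = lexIntegers(S);
     assert(!T.empty() && T.size() <= 2);
     if (T.size() == 1)
       return {T[0], T[0]};
     // "1-5" arrives as 1 and -5, so the sign of the second token is dropped.
     return {T[0], T[1] < 0 ? -T[1] : T[1]};
   }

For the input ``1-5`` the lexer produces the two values ``1`` and ``-5``, and
``rangePiece`` maps them back to the range from 1 to 5.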
The :token:`RangeList` can be thought of as specifying "list slice" in some
contexts.
:token:`SimpleValue` has a number of forms:
.. productionlist::
SimpleValue: `TokIdentifier`
The value will be the variable referenced by the identifier. It can be one
of:
.. The code for this is exceptionally abstruse. These examples are a
best-effort attempt.
* name of a ``def``, such as the use of ``Bar`` in::
def Bar : SomeClass {
int X = 5;
}
def Foo {
SomeClass Baz = Bar;
}
* value local to a ``def``, such as the use of ``Bar`` in::
def Foo {
int Bar = 5;
int Baz = Bar;
}
* a template arg of a ``class``, such as the use of ``Bar`` in::
class Foo<int Bar> {
int Baz = Bar;
}
* value local to a ``multiclass``, such as the use of ``Bar`` in::
multiclass Foo {
int Bar = 5;
int Baz = Bar;
}
* a template arg to a ``multiclass``, such as the use of ``Bar`` in::
multiclass Foo<int Bar> {
int Baz = Bar;
}
.. productionlist::
SimpleValue: `TokInteger`
This represents the numeric value of the integer.
.. productionlist::
SimpleValue: `TokString`+
Multiple adjacent string literals are concatenated like in C/C++. The value
is the concatenation of the strings.
.. productionlist::
SimpleValue: `TokCodeFragment`
The value is the string value of the code fragment.
.. productionlist::
SimpleValue: "?"
``?`` represents an "unset" initializer.
.. productionlist::
SimpleValue: "{" `ValueList` "}"
ValueList: [`ValueListNE`]
ValueListNE: `Value` ("," `Value`)*
This represents a sequence of bits, as would be used to initialize a
``bits<n>`` field (where ``n`` is the number of bits).
.. productionlist::
SimpleValue: `ClassID` "<" `ValueListNE` ">"
This generates a new anonymous record definition (as would be created by an
unnamed ``def`` inheriting from the given class with the given template
arguments) and the value is the value of that record definition.
.. productionlist::
SimpleValue: "[" `ValueList` "]" ["<" `Type` ">"]
A list initializer. The optional :token:`Type` can be used to indicate a
specific element type, otherwise the element type will be deduced from the
given values.
.. The initial `DagArg` of the dag must start with an identifier or
!cast, but this is more of an implementation detail and so for now just
leave it out.
.. productionlist::
SimpleValue: "(" `DagArg` `DagArgList` ")"
DagArgList: `DagArg` ("," `DagArg`)*
DagArg: `Value` [":" `TokVarName`] | `TokVarName`
The initial :token:`DagArg` is called the "operator" of the dag.
.. productionlist::
SimpleValue: `BangOperator` ["<" `Type` ">"] "(" `ValueListNE` ")"
Bodies
------
.. productionlist::
ObjectBody: `BaseClassList` `Body`
BaseClassList: [":" `BaseClassListNE`]
BaseClassListNE: `SubClassRef` ("," `SubClassRef`)*
SubClassRef: (`ClassID` | `MultiClassID`) ["<" `ValueList` ">"]
DefmID: `TokIdentifier`
The version with the :token:`MultiClassID` is only valid in the
:token:`BaseClassList` of a ``defm``.
The :token:`MultiClassID` should be the name of a ``multiclass``.
.. put this somewhere else
It is after parsing the base class list that the "let stack" is applied.
.. productionlist::
Body: ";" | "{" BodyList "}"
BodyList: BodyItem*
BodyItem: `Declaration` ";"
:| "let" `TokIdentifier` [`RangeList`] "=" `Value` ";"
The ``let`` form allows overriding the value of an inherited field.
``def``
-------
.. TODO::
There can be pastes in the names here, like ``#NAME#``. Look into that
and document it (it boils down to ParseIDValue with IDParseMode ==
ParseNameMode). ParseObjectName calls into the general ParseValue, with
the only difference from "arbitrary expression parsing" being IDParseMode
== Mode.
.. productionlist::
Def: "def" `TokIdentifier` `ObjectBody`
Defines a record whose name is given by the :token:`TokIdentifier`. The
fields of the record are inherited from the base classes and defined in the
body.
Special handling occurs if this ``def`` appears inside a ``multiclass`` or
a ``foreach``.
``defm``
--------
.. productionlist::
Defm: "defm" `TokIdentifier` ":" `BaseClassListNE` ";"
Note that in the :token:`BaseClassList`, all of the ``multiclass``'s must
precede any ``class``'s that appear.
``foreach``
-----------
.. productionlist::
Foreach: "foreach" `Declaration` "in" "{" `Object`* "}"
:| "foreach" `Declaration` "in" `Object`
The value assigned to the variable in the declaration is iterated over and
the object or object list is reevaluated with the variable set at each
iterated value.
Top-Level ``let``
-----------------
.. productionlist::
Let: "let" `LetList` "in" "{" `Object`* "}"
:| "let" `LetList` "in" `Object`
LetList: `LetItem` ("," `LetItem`)*
LetItem: `TokIdentifier` [`RangeList`] "=" `Value`
This is effectively equivalent to ``let`` inside the body of a record
except that it applies to multiple records at a time. The bindings are
applied at the end of parsing the base classes of a record.
``multiclass``
--------------
.. productionlist::
MultiClass: "multiclass" `TokIdentifier` [`TemplateArgList`]
: [":" `BaseMultiClassList`] "{" `MultiClassObject`+ "}"
BaseMultiClassList: `MultiClassID` ("," `MultiClassID`)*
MultiClassID: `TokIdentifier`
MultiClassObject: `Def` | `Defm` | `Let` | `Foreach`


@ -1,308 +0,0 @@
========
TableGen
========
.. contents::
:local:
.. toctree::
:hidden:
BackEnds
LangRef
LangIntro
Deficiencies
Introduction
============
TableGen's purpose is to help a human develop and maintain records of
domain-specific information. Because there may be a large number of these
records, it is specifically designed to allow writing flexible descriptions and
for common features of these records to be factored out. This reduces the
amount of duplication in the description, reduces the chance of error, and makes
it easier to structure domain specific information.
The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.
The current major users of TableGen are :doc:`../CodeGenerator`
and the
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.
Note that if you work on TableGen much and use emacs or vim, you can find
an emacs "TableGen mode" and a vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.
.. _intro:
The TableGen program
====================
TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which is
available in your build directory under ``bin``. It is not installed in the system
(or where your sysroot is set to), since it has no use beyond LLVM's build process.
Running TableGen
----------------
TableGen runs just like any other LLVM tool. The first (optional) argument
specifies the file to read. If a filename is not specified, ``llvm-tblgen``
reads from standard input.
To be useful, one of the `backends`_ must be used. These backends are
selectable on the command line (type '``llvm-tblgen -help``' for a list). For
example, to get a list of all of the definitions that subclass a particular type
(which can be useful for building up an enum list of these records), use the
``-print-enums`` option:
.. code-block:: bash
$ llvm-tblgen X86.td -print-enums -class=Register
AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
XMM6, XMM7, XMM8, XMM9,
$ llvm-tblgen X86.td -print-enums -class=Instruction
ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...
The default backend prints out all of the records.
If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way.
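As a rough idea of what a backend such as ``-print-enums`` does, the sketch
below filters a list of records by superclass and prints their names. It uses a
toy record model; ``ToyRecord`` and ``printEnums`` are hypothetical and are not
the classes declared in ``TableGen/Record.h``.

.. code-block:: c++

   #include <algorithm>
   #include <cassert>
   #include <string>
   #include <vector>

   // Toy stand-in for a parsed record: a name plus its superclass names.
   struct ToyRecord {
     std::string Name;
     std::vector<std::string> SuperClasses;
   };

   // Collects the names of all records deriving from ClassName, each
   // followed by ", " in the style of the listing above.
   std::string printEnums(const std::vector<ToyRecord> &Records,
                          const std::string &ClassName) {
     assert(!ClassName.empty());
     std::string Out;
     for (const ToyRecord &R : Records)
       if (std::find(R.SuperClasses.begin(), R.SuperClasses.end(), ClassName) !=
           R.SuperClasses.end())
         Out += R.Name + ", ";
     return Out;
   }

A real backend receives the parsed records from TableGen and writes to an
output stream, but the shape is the same: select the records of interest, then
format them for the consumer.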
Example
-------
With no other arguments, ``llvm-tblgen`` parses the specified file and prints out all
of the classes, then all of the definitions. This is a good way to see what the
various definitions expand to fully. Running this on the ``X86.td`` file prints
this (at the time of this writing):
.. code-block:: text
...
def ADD32rr { // Instruction X86Inst I
string Namespace = "X86";
dag OutOperandList = (outs GR32:$dst);
dag InOperandList = (ins GR32:$src1, GR32:$src2);
string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
list<Register> Uses = [];
list<Register> Defs = [EFLAGS];
list<Predicate> Predicates = [];
int CodeSize = 3;
int AddedComplexity = 0;
bit isReturn = 0;
bit isBranch = 0;
bit isIndirectBranch = 0;
bit isBarrier = 0;
bit isCall = 0;
bit canFoldAsLoad = 0;
bit mayLoad = 0;
bit mayStore = 0;
bit isImplicitDef = 0;
bit isConvertibleToThreeAddress = 1;
bit isCommutable = 1;
bit isTerminator = 0;
bit isReMaterializable = 0;
bit isPredicable = 0;
bit hasDelaySlot = 0;
bit usesCustomInserter = 0;
bit hasCtrlDep = 0;
bit isNotDuplicable = 0;
bit hasSideEffects = 0;
InstrItinClass Itinerary = NoItinerary;
string Constraints = "";
string DisableEncoding = "";
bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
Format Form = MRMDestReg;
bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
ImmType ImmT = NoImm;
bits<3> ImmTypeBits = { 0, 0, 0 };
bit hasOpSizePrefix = 0;
bit hasAdSizePrefix = 0;
bits<4> Prefix = { 0, 0, 0, 0 };
bit hasREX_WPrefix = 0;
FPFormat FPForm = ?;
bits<3> FPFormBits = { 0, 0, 0 };
}
...
This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture. ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition. The body of the record contains all of the data that
TableGen assembled for the record, indicating that the instruction is part of
the "X86" namespace, the pattern indicating how the instruction is selected by
the code generator, that it is a two-address instruction, has a particular
encoding, etc. The contents and semantics of the information in the record are
specific to the needs of the X86 backend, and are only shown as an example.
As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place. Because we are using
TableGen, all of the information was derived from the following definition:
.. code-block:: text
let Defs = [EFLAGS],
isCommutable = 1, // X = ADD Y,Z --> X = ADD Z,Y
isConvertibleToThreeAddress = 1 in // Can transform into LEA.
def ADD32rr : I<0x01, MRMDestReg, (outs GR32:$dst),
(ins GR32:$src1, GR32:$src2),
"add{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;
This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share. A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.
Each ``def`` record has a special entry called ``NAME``. This is the name of the
record (``ADD32rr`` above). In the general case, ``def`` names can be formed
from various kinds of string-processing expressions, and ``NAME`` resolves to the
final value obtained after resolving all of those expressions. The user may
refer to ``NAME`` anywhere the ultimate name of the ``def`` is needed.
``NAME`` should not be defined anywhere else in user code, to avoid conflicts.
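
As a minimal sketch of how ``NAME`` is typically used, the fragment below refers
to it inside a multiclass, where it stands for the name prefix supplied by the
instantiating ``defm``. The ``Inst`` and ``Alias`` classes and all record names
are hypothetical stand-ins, not part of any LLVM target, and the exact
resolution rules for ``NAME`` have varied between TableGen versions, so check
the behaviour against the ``llvm-tblgen`` you are using.

.. code-block:: text

  // Hypothetical helper classes, for illustration only.
  class Inst<string asm> {
    string AsmString = asm;
  }
  class Alias<string asm, Inst target> {
    string AsmString = asm;
    Inst Target = target;
  }

  multiclass AddForms<string mnemonic> {
    def rr : Inst<mnemonic # " reg, reg">;

    // NAME is the prefix given to defm (here "ADD32"), so the cast
    // looks up the sibling record whose final name is "ADD32rr".
    def : Alias<mnemonic # ".r", !cast<Inst>(NAME # "rr")>;
  }

  defm ADD32 : AddForms<"add">;
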
Syntax
======
TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification. In addition, TableGen's syntax introduces some
automation concepts such as ``multiclass``, ``foreach`` and ``let``.
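
To give a first impression of these automation constructs, the sketch below
uses ``foreach`` to stamp out several similar records and ``let`` to override a
field for the records in its scope. The ``Reg`` class and the record names are
invented for this example.

.. code-block:: text

  // Hypothetical register class, for illustration only.
  class Reg<int n> {
    int Num = n;
    bit IsReserved = 0;
  }

  // Defines four records: R0, R1, R2 and R3.
  foreach i = [0, 1, 2, 3] in
    def R#i : Reg<i>;

  // Overrides the default value of IsReserved for every def in scope.
  let IsReserved = 1 in
    def SP : Reg<31>;
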
Basic concepts
--------------
TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.
**TableGen records** have a unique name, a list of values, and a list of
superclasses. The list of values is the main data that TableGen builds for each
record; these values hold the domain-specific information for the
application. The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.
**TableGen definitions** are the concrete form of 'records'. These generally do
not have any undefined values, and are marked with the '``def``' keyword.
.. code-block:: text
def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
"Enable ARMv8 FP">;
In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record initialised
with some values. Class names such as ``SubtargetFeature`` are introduced with
the ``class`` keyword, either in the same file or in an included one. Most target
TableGen files include the generic ones in ``include/llvm/Target``.
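
To make the relationship between a class and a ``def`` concrete, here is a
self-contained sketch. ``MyFeature`` is a hypothetical, trimmed-down stand-in
written for this example (it is not the real ``SubtargetFeature`` class, which
lives in the generic target files), and the field names are assumptions.

.. code-block:: text

  // Hypothetical stand-in for a feature class; the template arguments
  // are copied into named fields of every record derived from it.
  class MyFeature<string name, string attr, string value, string desc> {
    string Name = name;
    string Attribute = attr;
    string Value = value;
    string Desc = desc;
  }

  // A concrete record: every field has a value once the def is processed.
  def MyFeatureFP : MyFeature<"fp", "HasFP", "true", "Enable FP">;

Running ``llvm-tblgen`` on a file containing this fragment, with no backend
selected, should print the ``MyFeatureFP`` record with its four fields filled
in.
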
**TableGen classes** are abstract records that are used to build and describe
other records. These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating-point instructions in the X86 backend). TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".
.. code-block:: text
class ProcNoItin<string Name, list<SubtargetFeature> Features>
: Processor<Name, NoItineraries, Features>;
Here, the class ``ProcNoItin`` takes a parameter ``Name`` of type ``string`` and
a list of target features. It specializes the class ``Processor`` by passing
both arguments down and hard-coding ``NoItineraries``.
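
The same pattern can be sketched in a self-contained form. All names below
(``Feat``, ``Itin``, ``NoItin``, ``Proc``, ``ProcNoItinSketch``) are
hypothetical stand-ins for the real target classes, chosen only so that the
fragment does not depend on any included file.

.. code-block:: text

  // Hypothetical stand-ins for the generic target classes.
  class Feat<string n> {
    string Name = n;
  }
  class Itin;
  def NoItin : Itin;

  class Proc<string n, Itin itin, list<Feat> feats> {
    string Name = n;
    Itin ProcItin = itin;
    list<Feat> Features = feats;
  }

  // Forwards two arguments to Proc and hard-codes the itinerary.
  class ProcNoItinSketch<string n, list<Feat> feats>
    : Proc<n, NoItin, feats>;

  def FeatA : Feat<"a">;
  def GenericCPU : ProcNoItinSketch<"generic", [FeatA]>;
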
**TableGen multiclasses** are groups of abstract records that are instantiated
all at once. Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.
.. code-block:: text
multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
dag address, ValueType sty> {
def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
(!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
Base, Offset, Extend)>;
def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
(!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
Base, Offset, Extend)>;
}
defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
!foreach(decls.pattern, address,
!subst(SHIFT, imm_eq0, decls.pattern)),
i8>;
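
Because the excerpt above depends on classes defined elsewhere, a smaller
self-contained sketch may help. The ``Inst`` class and the multiclass names are
hypothetical. It shows both a ``defm`` that expands to several definitions and
a multiclass inheriting from another one.

.. code-block:: text

  // Hypothetical instruction class, for illustration only.
  class Inst<string asm> {
    string AsmString = asm;
  }

  multiclass RegForms<string mnemonic> {
    def rr : Inst<mnemonic # " reg, reg">;
    def rm : Inst<mnemonic # " reg, mem">;
  }

  // Inherits rr and rm from RegForms and adds ri.
  multiclass AllForms<string mnemonic> : RegForms<mnemonic> {
    def ri : Inst<mnemonic # " reg, imm">;
  }

  // Expands to three records: SUBrr, SUBrm and SUBri.
  defm SUB : AllForms<"sub">;
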
See the :doc:`TableGen Language Introduction <LangIntro>` for more general
information on the usage of the language, and the
:doc:`TableGen Language Reference <LangRef>` for a more in-depth description
of the formal language specification.
.. _backend:
.. _backends:
TableGen backends
=================
TableGen files have no real meaning without a back-end. By default,
``llvm-tblgen`` prints the information in a textual format, but
that's only useful for debugging the TableGen files themselves. The real power
of TableGen is that it parses the source files into an internal
representation from which a back-end can generate anything you want.
TableGen is currently used to create huge include files with tables that you
can either include directly (if the output is in the language you're coding in)
or use in pre-processing, via macros surrounding the include of the file.
Direct output can be used if the back-end already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like Instruction names), so your back-end should print
a meta-information list that can be shaped into different compile-time formats.
See the `TableGen BackEnds <BackEnds.html>`_ for more information.
TableGen Deficiencies
=====================
Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times. The common theme is that, while TableGen allows
you to build domain-specific languages (DSLs), the final languages that you
create lack the power of other DSLs, which in turn considerably increases the
size and complexity of TableGen files.
At the same time, TableGen allows you to give virtually any meaning to
the basic concepts via custom-made back-ends, which can pervert the original
design and make the resulting TableGen files very hard for newcomers to
understand.
Some are in favour of extending the semantics even more, while making sure
back-ends adhere to strict rules. Others suggest we should move to fewer,
more powerful DSLs designed for specific purposes, or even re-use existing
DSLs.
Either way, this is a discussion that will likely span across several years,
if not decades. You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
document.