Imported Upstream version 5.18.0.167

Former-commit-id: 289509151e0fee68a1b591a20c9f109c3c789d3a
This commit is contained in:
Xamarin Public Jenkins (auto-signing)
2018-10-20 08:25:10 +00:00
parent e19d552987
commit b084638f15
28489 changed files with 184 additions and 3866856 deletions

=============================
Advanced Build Configurations
=============================

.. contents::
   :local:

Introduction
============

`CMake <http://www.cmake.org/>`_ is a cross-platform build-generator tool.
CMake does not build the project; it generates the files needed by your build
tool (GNU make, Visual Studio, etc.) for building LLVM.

If **you are a new contributor**, please start with the :doc:`GettingStarted`
or :doc:`CMake` pages. This page is intended for users doing more complex
builds.

Many of the examples below are written assuming specific CMake Generators.
Unless otherwise explicitly called out, these commands should work with any
CMake generator.
Bootstrap Builds
================

The Clang CMake build system supports bootstrap (aka multi-stage) builds. At a
high level, a multi-stage build is a chain of builds that pass data from one
stage into the next. The most common and simplest version of this is a
traditional bootstrap build.

In a simple two-stage bootstrap build, we build clang using the system
compiler, then use that just-built clang to build clang again. In CMake, this
simplest form of a bootstrap build can be configured with a single option,
CLANG_ENABLE_BOOTSTRAP.

.. code-block:: console

  $ cmake -G Ninja -DCLANG_ENABLE_BOOTSTRAP=On <path to source>
  $ ninja stage2
This command itself isn't terribly useful because it assumes default
configurations for each stage. The next series of examples utilize CMake cache
scripts to provide more complex options.

The clang build system refers to builds as stages. A stage1 build is a
standard build using the compiler installed on the host, and a stage2 build is
built using the stage1 compiler. This nomenclature holds up to more stages
too. In general, a stage\ *n* build is built using the output from
stage\ *n-1*.
Apple Clang Builds (A More Complex Bootstrap)
=============================================

Apple's Clang builds are a slightly more complicated example of the simple
bootstrapping scenario. Apple Clang is built using a two-stage build.

The stage1 compiler is a host-only compiler with some options set. The stage1
compiler is a balance of optimization vs. build time, because it is a
throwaway. The stage2 compiler is the fully optimized compiler intended to
ship to users.

Setting up these compilers requires a lot of options. To simplify the
configuration, the Apple Clang build settings are contained in CMake Cache
files. You can build an Apple Clang compiler using the following commands:

.. code-block:: console

  $ cmake -G Ninja -C <path to clang>/cmake/caches/Apple-stage1.cmake <path to source>
  $ ninja stage2-distribution
This CMake invocation configures the stage1 host compiler, and sets
CLANG_BOOTSTRAP_CMAKE_ARGS to pass the Apple-stage2.cmake cache script to the
stage2 configuration step.

When you build the stage2-distribution target, it builds the minimal stage1
compiler and required tools, then configures and builds the stage2 compiler
based on the settings in Apple-stage2.cmake.

This pattern of using cache scripts to set complex settings, and specifically
to make later-stage builds include cache scripts, is common in our more
advanced build configurations.
Multi-stage PGO
===============

Profile-Guided Optimization (PGO) is a great way to optimize the code clang
generates. Our multi-stage PGO builds are a workflow for generating PGO
profiles that can be used to optimize clang.

At a high level, PGO works like this: you build an instrumented compiler, then
you run the instrumented compiler against sample source files. While the
instrumented compiler runs, it outputs a number of files containing
performance counters (``.profraw`` files). After generating all the profraw
files, you use llvm-profdata to merge them into a single profdata file that
you can feed into the LLVM_PROFDATA_FILE option.
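The merge step just described can be sketched as a single llvm-profdata
invocation; this is an illustrative sketch, and the profile paths are
placeholders rather than anything produced by this document:

```shell
# Merge all raw profiles emitted by the instrumented compiler into one
# profdata file the build system can consume. Paths are placeholders.
llvm-profdata merge -output=clang.profdata path/to/profiles/*.profraw
```

The PGO.cmake cache script described below runs this for you as part of the
workflow.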
Our PGO.cmake cache script automates that whole process. You can use it by
running:

.. code-block:: console

  $ cmake -G Ninja -C <path_to_clang>/cmake/caches/PGO.cmake <source dir>
  $ ninja stage2-instrumented-generate-profdata
If you let that run for a few hours or so, it will place a profdata file in
your build directory. This takes a long time because it builds clang twice,
and you *must* have compiler-rt in your build tree.

This process uses any source files under the perf-training directory as
training data, as long as the source files are marked up with LIT-style RUN
lines.

After it finishes, you can use ``find . -name clang.profdata`` to find it, but
it should be at a path something like:

.. code-block:: console

  <build dir>/tools/clang/stage2-instrumented-bins/utils/perf-training/clang.profdata

You can feed that file into the LLVM_PROFDATA_FILE option when you build your
optimized compiler.
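For instance, the optimized build could be configured along these lines; this
is a sketch only, and the profdata path is a placeholder for wherever your
file ended up:

```shell
# Configure an optimized build that consumes the merged profile.
# <build dir> and <path to source> are placeholders, as elsewhere on
# this page.
cmake -G Ninja -DLLVM_PROFDATA_FILE=<build dir>/clang.profdata <path to source>
ninja clang
```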
The PGO CMake cache has a slightly different stage naming scheme than other
multi-stage builds. It generates three stages: stage1, stage2-instrumented,
and stage2. Both of the stage2 builds are built using the stage1 compiler.

The PGO CMake cache generates the following additional targets:

**stage2-instrumented**
  Builds a stage1 x86 compiler, runtime, and required tools (llvm-config,
  llvm-profdata), then uses that compiler to build an instrumented stage2
  compiler.

**stage2-instrumented-generate-profdata**
  Depends on "stage2-instrumented" and will use the instrumented compiler to
  generate profdata based on the training files in <clang>/utils/perf-training.

**stage2**
  Depends on "stage2-instrumented-generate-profdata" and will use the stage1
  compiler with the stage2 profdata to build a PGO-optimized compiler.

**stage2-check-llvm**
  Depends on stage2 and runs check-llvm using the stage2 compiler.

**stage2-check-clang**
  Depends on stage2 and runs check-clang using the stage2 compiler.

**stage2-check-all**
  Depends on stage2 and runs check-all using the stage2 compiler.

**stage2-test-suite**
  Depends on stage2 and runs the test-suite using the stage2 compiler
  (requires an in-tree test-suite).
3-Stage Non-Determinism
=======================

In the ancient lore of compilers, non-determinism is like the multi-headed
hydra: whenever one of its heads pops up, terror and chaos ensue.

Historically, one of the tests used to verify that a compiler was
deterministic was a three-stage build. The idea of a three-stage build is that
you take your sources and build a compiler (stage1), then use that compiler to
rebuild the sources (stage2), then use that compiler to rebuild the sources a
third time (stage3) with an identical configuration to the stage2 build. At
the end of this, you have a stage2 and stage3 compiler that should be
bit-for-bit identical.

You can perform one of these 3-stage builds with LLVM & clang using the
following commands:

.. code-block:: console

  $ cmake -G Ninja -C <path_to_clang>/cmake/caches/3-stage.cmake <source dir>
  $ ninja stage3

After the build, you can compare the stage2 & stage3 compilers. We have a bot
set up `here <http://lab.llvm.org:8011/builders/clang-3stage-ubuntu>`_ that
runs this build-and-compare configuration.
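The comparison itself needs nothing more than ``cmp``. A minimal sketch, using
toy stand-in files since the real binaries live under your stage2/stage3 build
trees:

```shell
# Stand-ins for the stage2 and stage3 clang binaries; in a real build
# you would point cmp at the binaries in the two build trees instead.
printf 'deterministic' > /tmp/stage2_clang
printf 'deterministic' > /tmp/stage3_clang

# cmp -s exits 0 only when the files are bit-for-bit identical.
if cmp -s /tmp/stage2_clang /tmp/stage3_clang; then
  echo "identical"
else
  echo "differs"
fi
```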

=================
Benchmarking tips
=================

Introduction
============

For benchmarking a patch, we want to reduce all possible sources of noise as
much as possible. How to do that is very OS dependent.

Note that low noise is required, but not sufficient. It does not exclude
measurement bias. See
https://www.cis.upenn.edu/~cis501/papers/producing-wrong-data.pdf for an
example.
General
=======

* Use a high resolution timer, e.g. perf under Linux.

* Run the benchmark multiple times to be able to recognize noise.

* Disable as many processes or services as possible on the target system.

* Disable frequency scaling, turbo boost and address space randomization (see
  the OS specific section).

* Link statically if the OS supports it. That avoids any variation that might
  be introduced by loading dynamic libraries. This can be done by passing
  ``-DLLVM_BUILD_STATIC=ON`` to cmake.

* Try to avoid storage. On some systems you can use tmpfs. Putting the
  program, inputs and outputs on tmpfs avoids touching a real storage system,
  which can have pretty high variability.

  To mount it (on Linux and FreeBSD at least)::

    mount -t tmpfs -o size=<XX>g none dir_to_mount
Linux
=====

* Disable address space randomization::

    echo 0 > /proc/sys/kernel/randomize_va_space

* Set scaling_governor to performance::

    for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    do
      echo performance > $i
    done

* Use https://github.com/lpechacek/cpuset to reserve cpus for just the
  program you are benchmarking. If using perf, leave at least 2 cores so that
  perf runs in one and your program in another::

    cset shield -c N1,N2 -k on

  This will move all threads out of N1 and N2. The ``-k on`` means that even
  kernel threads are moved out.

* Disable the SMT pair of the cpus you will use for the benchmark. The pair
  of cpu N can be found in
  ``/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`` and disabled
  with::

    echo 0 > /sys/devices/system/cpu/cpuX/online

* Run the program with::

    cset shield --exec -- perf stat -r 10 <cmd>

  This will run the command after ``--`` in the isolated cpus. This
  particular perf command runs the ``<cmd>`` 10 times and reports statistics.

With these in place, you can expect perf variations of less than 0.1%.
Linux Intel
-----------

* Disable turbo mode::

    echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

==============================================
Using ARM NEON instructions in big endian mode
==============================================

.. contents::
   :local:

Introduction
============

Generating code for big endian ARM processors is for the most part
straightforward. NEON loads and stores, however, have some interesting
properties that make code generation decisions less obvious in big endian
mode.

The aim of this document is to explain the problem with NEON loads and
stores, and the solution that has been implemented in LLVM.

In this document the term "vector" refers to what the ARM ABI calls a "short
vector", which is a sequence of items that can fit in a NEON register. This
sequence can be 64 or 128 bits in length, and can consist of 8, 16, 32 or 64
bit items. This document refers to A64 instructions throughout, but is almost
entirely applicable to the A32/ARMv7 instruction sets also. The ABI format
for passing vectors in A32 is slightly different to A64; apart from that, the
same concepts apply.
Example: C-level intrinsics -> assembly
---------------------------------------

It may be helpful first to illustrate how C-level ARM NEON intrinsics are
lowered to instructions.

This trivial C function takes a vector of four ints and sets the zero'th lane
to the value "42"::

    #include <arm_neon.h>
    int32x4_t f(int32x4_t p) {
        return vsetq_lane_s32(42, p, 0);
    }

arm_neon.h intrinsics generate "generic" IR where possible (that is, normal
IR instructions, not ``llvm.arm.neon.*`` intrinsic calls). The above
generates::

    define <4 x i32> @f(<4 x i32> %p) {
      %vset_lane = insertelement <4 x i32> %p, i32 42, i32 0
      ret <4 x i32> %vset_lane
    }

Which then becomes the following trivial assembly::

    f:                          // @f
        movz    w8, #0x2a
        ins     v0.s[0], w8
        ret
Problem
=======

The main problem is how vectors are represented in memory and in registers.

First, a recap. The "endianness" of an item affects its representation in
memory only. In a register, a number is just a sequence of bits - 64 bits in
the case of AArch64 general purpose registers. Memory, however, is a sequence
of addressable units of 8 bits in size. Any number greater than 8 bits must
therefore be split up into 8-bit chunks, and endianness describes the order
in which these chunks are laid out in memory.

A "little endian" layout has the least significant byte first (lowest in
memory address). A "big endian" layout has the *most* significant byte first.
This means that when loading an item from big endian memory, the lowest 8
bits in memory must go in the most significant 8 bits, and so forth.
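As an illustrative aside (not part of the original text), the little endian
layout is easy to see by dumping raw bytes with standard tools on a little
endian host:

```shell
# Write the four bytes 0x44 0x33 0x22 0x11 (octal escapes for
# portability) - the 32-bit value 0x11223344 as a little endian machine
# stores it: least significant byte at the lowest address. od prints
# the bytes in address order: 44 33 22 11. A big endian layout of the
# same value would instead store 11 22 33 44.
printf '\104\063\042\021' | od -An -tx1
```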
``LDR`` and ``LD1``
===================

.. figure:: ARM-BE-ldr.png
    :align: right

    Big endian vector load using ``LDR``.

A vector is a consecutive sequence of items that are operated on
simultaneously. To load a 64-bit vector, 64 bits need to be read from memory.
In little endian mode, we can do this by just performing a 64-bit load -
``LDR q0, [foo]``. However if we try this in big endian mode, because of the
byte swapping the lane indices end up being swapped! The zero'th item as laid
out in memory becomes the n'th lane in the vector.

.. figure:: ARM-BE-ld1.png
    :align: right

    Big endian vector load using ``LD1``. Note that the lanes retain the
    correct ordering.

Because of this, the instruction ``LD1`` performs a vector load but performs
byte swapping not on the entire 64 bits, but on the individual items within
the vector. This means that the register content is the same as it would have
been on a little endian system.

It may seem that ``LD1`` should suffice to perform vector loads on a big
endian machine. However there are pros and cons to the two approaches that
make it less than simple which register format to pick.

There are two options:

1. The content of a vector register is the same *as if* it had been loaded
   with an ``LDR`` instruction.

2. The content of a vector register is the same *as if* it had been loaded
   with an ``LD1`` instruction.

Because ``LD1 == LDR + REV`` and similarly ``LDR == LD1 + REV`` (on a big
endian system), we can simulate either type of load with the other type of
load plus a ``REV`` instruction. So we're not deciding which instructions to
use, but which format to use (which will then influence which instruction is
best to use).

.. The 'clearer' container is required to make the following section header
   come after the floated images above.

.. container:: clearer

    Note that throughout this section we only mention loads. Stores have
    exactly the same problems as their associated loads, so have been skipped
    for brevity.
Considerations
==============

LLVM IR Lane ordering
---------------------

LLVM IR has first class vector types. In LLVM IR, the zero'th element of a
vector resides at the lowest memory address. The optimizer relies on this
property in certain areas, for example when concatenating vectors together.
The intention is for arrays and vectors to have identical memory layouts -
``[4 x i8]`` and ``<4 x i8>`` should be represented the same in memory.
Without this property there would be many special cases that the optimizer
would have to cleverly handle.

Use of ``LDR`` would break this lane ordering property. This doesn't preclude
the use of ``LDR``, but we would have to do one of two things:

1. Insert a ``REV`` instruction to reverse the lane order after every
   ``LDR``.

2. Disable all optimizations that rely on lane layout, and for every access
   to an individual lane
   (``insertelement``/``extractelement``/``shufflevector``) reverse the lane
   index.
AAPCS
-----

The ARM procedure call standard (AAPCS) defines the ABI for passing vectors
between functions in registers. It states:

    When a short vector is transferred between registers and memory it is
    treated as an opaque object. That is a short vector is stored in memory
    as if it were stored with a single ``STR`` of the entire register; a
    short vector is loaded from memory using the corresponding ``LDR``
    instruction. On a little-endian system this means that element 0 will
    always contain the lowest addressed element of a short vector; on a
    big-endian system element 0 will contain the highest-addressed element of
    a short vector.

    -- Procedure Call Standard for the ARM 64-bit Architecture (AArch64),
       4.1.2 Short Vectors

The use of ``LDR`` and ``STR`` as the ABI defines has at least one advantage
over ``LD1`` and ``ST1``. ``LDR`` and ``STR`` are oblivious to the size of
the individual lanes of a vector. ``LD1`` and ``ST1`` are not - the lane size
is encoded within them. This is important across an ABI boundary, because it
would become necessary to know the lane width the callee expects. Consider
the following code:

.. code-block:: c

    <callee.c>
    void callee(uint32x2_t v) {
      ...
    }

    <caller.c>
    extern void callee(uint32x2_t);
    void caller() {
      callee(...);
    }

If ``callee`` changed its signature to ``uint16x4_t`` (which is equivalent in
register content) and we passed vectors in the ``LD1`` format, we'd break
this code until ``caller`` was updated and recompiled.

There is an argument that if the signatures of the two functions are
different then the behaviour should be undefined. But there may be functions
that are agnostic to the lane layout of the vector, and treating the vector
as an opaque value (just loading it and storing it) would be impossible
without a common format across ABI boundaries.

So to preserve ABI compatibility, we need to use the ``LDR`` lane layout
across function calls.
Alignment
---------

In strict alignment mode, ``LDR qX`` requires its address to be 128-bit
aligned, whereas ``LD1`` only requires it to be as aligned as the lane size.
If we canonicalised on using ``LDR``, we'd still need to use ``LD1`` in some
places to avoid alignment faults (the result of the ``LD1`` would then need
to be reversed with ``REV``).

Most operating systems, however, do not run with alignment faults enabled, so
this is often not an issue.
Summary
-------

The following table summarises the instructions that are required to be
emitted for each property mentioned above, for each of the two solutions.

+-------------------------------+-------------------------------+---------------------+
| | ``LDR`` layout | ``LD1`` layout |
+===============================+===============================+=====================+
| Lane ordering | ``LDR + REV`` | ``LD1`` |
+-------------------------------+-------------------------------+---------------------+
| AAPCS | ``LDR`` | ``LD1 + REV`` |
+-------------------------------+-------------------------------+---------------------+
| Alignment for strict mode | ``LDR`` / ``LD1 + REV`` | ``LD1`` |
+-------------------------------+-------------------------------+---------------------+

Neither approach is perfect, and choosing one boils down to choosing the
lesser of two evils. Fixing the lane-ordering issue, it was decided, would
have required changing target-agnostic compiler passes and would have
resulted in a strange IR in which lane indices were reversed. This was judged
worse than the changes that would have to be made to support ``LD1``, so
``LD1`` was chosen as the canonical vector load instruction (and by
inference, ``ST1`` for vector stores).
Implementation
==============

There are three parts to the implementation:

1. Predicate ``LDR`` and ``STR`` instructions so that they are never allowed
   to be selected to generate vector loads and stores. The exception is
   one-lane vectors [1]_ - these by definition cannot have lane ordering
   problems, so it is fine to use ``LDR``/``STR``.

2. Create code generation patterns for bitconverts that create ``REV``
   instructions.

3. Make sure appropriate bitconverts are created so that vector values get
   passed over call boundaries as 1-element vectors (which is the same as if
   they were loaded with ``LDR``).
Bitconverts
-----------

.. image:: ARM-BE-bitcastfail.png
    :align: right

The main problem with the ``LD1`` solution is dealing with bitconverts (or
bitcasts, or reinterpret casts). These are pseudo-instructions that only
change the compiler's interpretation of data, not the underlying data itself.
A requirement is that if data is loaded and then saved again (called a "round
trip"), the memory contents should be the same after the store as before the
load. If a vector is loaded and is then bitconverted to a different vector
type before storing, the round trip will currently be broken.

Take for example this code sequence::

    %0 = load <4 x i32> %x
    %1 = bitcast <4 x i32> %0 to <2 x i64>
    store <2 x i64> %1, <2 x i64>* %y

This would produce a code sequence such as that in the figure on the right.
The mismatched ``LD1`` and ``ST1`` cause the stored data to differ from the
loaded data.

.. container:: clearer

    When we see a bitcast from type ``X`` to type ``Y``, what we need to do
    is to change the in-register representation of the data to be *as if* it
    had just been loaded by a ``LD1`` of type ``Y``.

.. image:: ARM-BE-bitcastsuccess.png
    :align: right

Conceptually this is simple - we can insert a ``REV`` undoing the ``LD1`` of
type ``X`` (converting the in-register representation to the same as if it
had been loaded by ``LDR``) and then insert another ``REV`` to change the
representation to be as if it had been loaded by an ``LD1`` of type ``Y``.

For the previous example, this would be::

    LD1   v0.4s, [x]

    REV64 v0.4s, v0.4s                  // There is no REV128 instruction, so it must be synthesized
    EXT   v0.16b, v0.16b, v0.16b, #8    // with a REV64 then an EXT to swap the two 64-bit elements.

    REV64 v0.2d, v0.2d
    EXT   v0.16b, v0.16b, v0.16b, #8

    ST1   v0.2d, [y]

It turns out that these ``REV`` pairs can, in almost all cases, be squashed
together into a single ``REV``. For the example above, a ``REV128 4s`` +
``REV128 2d`` is actually a ``REV64 4s``, as shown in the figure on the
right.

.. [1] One-lane vectors may seem useless as a concept, but they serve to
   distinguish between values held in general purpose registers and values
   held in NEON/VFP registers. For example, an ``i64`` would live in an ``x``
   register, but ``<1 x i64>`` would live in a ``d`` register.

================================
LLVM Block Frequency Terminology
================================

.. contents::
   :local:

Introduction
============

Block Frequency is a metric for estimating the relative frequency of
different basic blocks. This document describes the terminology that the
``BlockFrequencyInfo`` and ``MachineBlockFrequencyInfo`` analysis passes use.
Branch Probability
==================

Blocks with multiple successors have probabilities associated with each
outgoing edge. These are called branch probabilities. For a given block, the
sum of its outgoing branch probabilities should be 1.0.

Branch Weight
=============

Rather than storing fractions on each edge, we store an integer weight.
Weights are relative to the other edges of a given predecessor block. The
branch probability associated with a given edge is its own weight divided by
the sum of the weights on the predecessor's outgoing edges.
For example, consider this IR:

.. code-block:: llvm

  define void @foo() {
    ; ...
  A:
    br i1 %cond, label %B, label %C, !prof !0
    ; ...
  }

  !0 = metadata !{metadata !"branch_weights", i32 7, i32 8}

and this simple graph representation::

  A -> B  (edge-weight: 7)
  A -> C  (edge-weight: 8)

The probability of branching from block A to block B is 7/15, and the
probability of branching from block A to block C is 8/15.
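As a quick arithmetic check of the numbers above (an illustrative sketch, not
something the LLVM tooling does for you):

```shell
# Each branch probability is the edge's weight divided by the sum of
# the predecessor's outgoing weights: 7/(7+8) and 8/(7+8).
awk 'BEGIN { printf "%.4f %.4f\n", 7/15, 8/15 }'
```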

See :doc:`BranchWeightMetadata` for details about the branch weight IR
representation.
Block Frequency
===============

Block frequency is a relative metric that represents the number of times a
block executes. The ratio of a block frequency to the entry block frequency
is the expected number of times the block will execute per entry to the
function.

Block frequency is the main output of the ``BlockFrequencyInfo`` and
``MachineBlockFrequencyInfo`` analysis passes.

Implementation: a series of DAGs
================================

The implementation of the block frequency calculation analyses each loop,
bottom-up, ignoring backedges; i.e., as a DAG. After each loop is processed,
it's packaged up to act as a pseudo-node in its parent loop's (or the
function's) DAG analysis.
Block Mass
==========

For each DAG, the entry node is assigned a mass of ``UINT64_MAX`` and mass is
distributed to successors according to branch weights. Block mass uses a
fixed-point representation where ``UINT64_MAX`` represents ``1.0`` and ``0``
represents a number just above ``0.0``.

After mass is fully distributed, in any cut of the DAG that separates the
exit nodes from the entry node, the sum of the block masses of the nodes
succeeded by a cut edge should equal ``UINT64_MAX``. In other words, mass is
conserved as it "falls" through the DAG.

If a function's basic block graph is a DAG, then block masses are valid block
frequencies. This works poorly in practice, though, since downstream users
rely on adding block frequencies together without hitting the maximum.
Loop Scale
==========

Loop scale is a metric that indicates how many times a loop iterates per
entry. As mass is distributed through the loop's DAG, the (otherwise ignored)
backedge mass is collected. This backedge mass is used to compute the exit
frequency, and thus the loop scale.

Implementation: Getting from mass and scale to frequency
========================================================

After analysing the complete series of DAGs, each block has a mass (local to
its containing loop, if any), and each loop pseudo-node has a loop scale and
its own mass (from its parent's DAG).

We can get an initial frequency assignment (with an entry frequency of 1.0)
by multiplying these masses and loop scales together. A given block's
frequency is the product of its mass, the mass of its containing loops'
pseudo-nodes, and the containing loops' loop scales.

Since downstream users need integers (not floating point), this initial
frequency assignment is shifted as necessary into the range of ``uint64_t``.
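To make the multiplication concrete, here is a toy calculation; the numbers
are hypothetical and do not come from this document:

```shell
# A block with mass 0.5, inside a loop whose pseudo-node has mass 1.0
# in its parent DAG and whose loop scale is 3.0. The block's frequency
# is the product of the masses and the loop scale.
awk 'BEGIN { printf "%.1f\n", 0.5 * 1.0 * 3.0 }'
```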
Block Bias
==========

Block bias is a proposed *absolute* metric to indicate a bias toward or away
from a given block during a function's execution. The idea is that bias can
be used in isolation to indicate whether a block is relatively hot or cold,
or to compare two blocks to indicate whether one is hotter or colder than the
other.

The proposed calculation involves calculating a *reference* block frequency,
where:

* every branch weight is assumed to be 1 (i.e., every branch probability
  distribution is even) and

* loop scales are ignored.

This reference frequency represents what the block frequency would be in an
unbiased graph.

The bias is the ratio of the block frequency to this reference block
frequency.

===========================
LLVM Branch Weight Metadata
===========================

.. contents::
   :local:

Introduction
============

Branch Weight Metadata represents branch weights as the likelihood of a
branch being taken (see :doc:`BlockFrequencyTerminology`). Metadata is
assigned to the ``TerminatorInst`` as a ``MDNode`` of the ``MD_prof`` kind.
The first operator is always a ``MDString`` node with the string
"branch_weights". The number of operators depends on the terminator type.

Branch weights might be fetched from the profiling file, or generated based
on the `__builtin_expect`_ instruction.

All weights are represented as unsigned 32-bit values, where a higher value
indicates a greater chance of the branch being taken.
Supported Instructions
======================

``BranchInst``
^^^^^^^^^^^^^^

Metadata is only assigned to conditional branches. There are two extra
operands, for the true and the false branch.

.. code-block:: none

  !0 = metadata !{
    metadata !"branch_weights",
    i32 <TRUE_BRANCH_WEIGHT>,
    i32 <FALSE_BRANCH_WEIGHT>
  }
``SwitchInst``
^^^^^^^^^^^^^^

Branch weights are assigned to every case (including the ``default`` case,
which is always case #0).

.. code-block:: none

  !0 = metadata !{
    metadata !"branch_weights",
    i32 <DEFAULT_BRANCH_WEIGHT>
    [ , i32 <CASE_BRANCH_WEIGHT> ... ]
  }
``IndirectBrInst``
^^^^^^^^^^^^^^^^^^

Branch weights are assigned to every destination.

.. code-block:: none

  !0 = metadata !{
    metadata !"branch_weights",
    i32 <LABEL_BRANCH_WEIGHT>
    [ , i32 <LABEL_BRANCH_WEIGHT> ... ]
  }
``CallInst``
^^^^^^^^^^^^

Calls may have branch weight metadata, containing the execution count of the
call. It is currently used in SamplePGO mode only, to augment the block and
entry counts, which may not be accurate with sampling.

.. code-block:: none

  !0 = metadata !{
    metadata !"branch_weights",
    i32 <CALL_BRANCH_WEIGHT>
  }
Other
^^^^^

Other terminator instructions are not allowed to contain Branch Weight
Metadata.

.. _\__builtin_expect:

Built-in ``expect`` Instructions
================================

The ``__builtin_expect(long exp, long c)`` builtin provides branch prediction
information. The return value is the value of ``exp``.

It is especially useful in conditional statements. Currently Clang supports
two conditional statements:

``if`` statement
^^^^^^^^^^^^^^^^

The ``exp`` parameter is the condition. The ``c`` parameter is the expected
comparison value. If it is equal to 1 (true), the condition is likely to be
true; otherwise, the condition is likely to be false. For example:

.. code-block:: c++

  if (__builtin_expect(x > 0, 1)) {
    // This block is likely to be taken.
  }
``switch`` statement
^^^^^^^^^^^^^^^^^^^^

The ``exp`` parameter is the value. The ``c`` parameter is the expected
value. If the expected value doesn't appear in the case list, the ``default``
case is assumed to be likely taken.

.. code-block:: c++

  switch (__builtin_expect(x, 5)) {
  default: break;
  case 0:  // ...
  case 3:  // ...
  case 5:  // This case is likely to be taken.
  }
CFG Modifications
=================

Branch Weight Metadata is not proof against CFG changes. If the terminator's
operands are changed, some action should be taken. Otherwise, misoptimizations
may occur due to incorrect branch prediction information.
Function Entry Counts
=====================

To allow comparing different functions during inter-procedural analysis and
optimization, ``MD_prof`` nodes can also be assigned to a function
definition. The first operand is a string indicating the name of the
associated counter.

Currently, one counter is supported: "function_entry_count". The second
operand is a 64-bit counter that indicates the number of times that this
function was invoked (in the case of instrumentation-based profiles). In the
case of sampling-based profiles, this operand is an approximation of how many
times the function was invoked.

For example, in the code below, the instrumentation for function foo()
indicates that it was called 2,590 times at runtime.

.. code-block:: llvm

  define i32 @foo() !prof !1 {
    ret i32 0
  }
  !1 = !{!"function_entry_count", i64 2590}
If "function_entry_count" has more than two operands, the later operands are
the GUIDs of the functions that need to be imported by ThinLTO. This is only
set by sampling-based profiles. It is needed because the sampling-based
profile was collected on a binary that had already imported and inlined these
functions, and we need to ensure the IR matches in the ThinLTO backends for
profile annotation. The reason why we cannot annotate this on the callsite is
that it can only go down one level in the call chain. For the case
foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), we would need to go down two
levels in the call chain to import both bar_in_b_cc and baz_in_c_cc.

====================================
LLVM bugpoint tool: design and usage
====================================

.. contents::
   :local:

Description
===========

``bugpoint`` narrows down the source of problems in LLVM tools and passes. It
can be used to debug three types of failures: optimizer crashes,
miscompilations by optimizers, or bad native code generation (including
problems in the static and JIT compilers). It aims to reduce large test cases
to small, useful ones. For example, if ``opt`` crashes while optimizing a
file, it will identify the optimization (or combination of optimizations)
that causes the crash, and reduce the file down to a small example which
triggers the crash.

For detailed case scenarios, such as debugging ``opt`` or one of the LLVM
code generators, see :doc:`HowToSubmitABug`.
Design Philosophy
=================
``bugpoint`` is designed to be a useful tool without requiring any hooks into
the LLVM infrastructure at all. It works with any and all LLVM passes and code
generators, and does not need to "know" how they work. Because of this, it may
appear to do stupid things or miss obvious simplifications. ``bugpoint`` is
also designed to trade off programmer time for computer time in the
compiler-debugging process; consequently, it may take a long period of
(unattended) time to reduce a test case, but we feel it is still worth it. Note
that ``bugpoint`` is generally very quick unless debugging a miscompilation
where each test of the program (which requires executing it) takes a long time.
Automatic Debugger Selection
----------------------------
``bugpoint`` reads each ``.bc`` or ``.ll`` file specified on the command line
and links them together into a single module, called the test program. If any
LLVM passes are specified on the command line, it runs these passes on the test
program. If any of the passes crash, or if they produce malformed output (which
causes the verifier to abort), ``bugpoint`` starts the `crash debugger`_.
Otherwise, if the ``-output`` option was not specified, ``bugpoint`` runs the
test program with the "safe" backend (which is assumed to generate good code) to
generate a reference output. Once ``bugpoint`` has a reference output for the
test program, it tries executing it with the selected code generator. If the
selected code generator crashes, ``bugpoint`` starts the `crash debugger`_ on
the code generator. Otherwise, if the resulting output differs from the
reference output, it assumes the difference resulted from a code generator
failure, and starts the `code generator debugger`_.
Finally, if the output of the selected code generator matches the reference
output, ``bugpoint`` runs the test program after all of the LLVM passes have
been applied to it. If its output differs from the reference output, it assumes
the difference resulted from a failure in one of the LLVM passes, and enters the
`miscompilation debugger`_. Otherwise, there is no problem ``bugpoint`` can
debug.
.. _crash debugger:
Crash debugger
--------------
If an optimizer or code generator crashes, ``bugpoint`` will try as hard as it
can to reduce the list of passes (for optimizer crashes) and the size of the
test program. First, ``bugpoint`` figures out which combination of optimizer
passes triggers the bug. This is useful when debugging a problem exposed by
``opt``, for example, because it runs over 38 passes.
Next, ``bugpoint`` tries removing functions from the test program, to reduce its
size. Usually it is able to reduce a test program to a single function, when
debugging intraprocedural optimizations. Once the number of functions has been
reduced, it attempts to delete various edges in the control flow graph, to
reduce the size of the function as much as possible. Finally, ``bugpoint``
deletes any individual LLVM instructions whose absence does not eliminate the
failure. At the end, ``bugpoint`` should tell you what passes crash, give you a
bitcode file, and give you instructions on how to reproduce the failure with
``opt`` or ``llc``.
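As an illustrative sketch (the input file and pass names here are placeholders,
not a prescription), a crash-debugging session might be started like:

.. code-block:: console

   $ bugpoint crash.ll -licm -gvn

When it finishes, ``bugpoint`` reports the offending passes and leaves reduced
bitcode files in the current directory.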
.. _code generator debugger:
Code generator debugger
-----------------------
The code generator debugger attempts to narrow down the amount of code that is
being miscompiled by the selected code generator. To do this, it takes the test
program and partitions it into two pieces: one piece which it compiles with the
"safe" backend (into a shared object), and one piece which it runs with either
the JIT or the static LLC compiler. It uses several techniques to reduce the
amount of code pushed through the LLVM code generator, to reduce the potential
scope of the problem. After it is finished, it emits two bitcode files (called
"test" [to be compiled with the code generator] and "safe" [to be compiled with
the "safe" backend], respectively), and instructions for reproducing the
problem. The code generator debugger assumes that the "safe" backend produces
good code.
.. _miscompilation debugger:
Miscompilation debugger
-----------------------
The miscompilation debugger works similarly to the code generator debugger. It
works by splitting the test program into two pieces, running the optimizations
specified on one piece, linking the two pieces back together, and then executing
the result. It attempts to narrow down the list of passes to the one (or few)
which are causing the miscompilation, then reduce the portion of the test
program which is being miscompiled. The miscompilation debugger assumes that
the selected code generator is working properly.
Advice for using bugpoint
=========================
``bugpoint`` can be a remarkably useful tool, but it sometimes works in
non-obvious ways. Here are some hints and tips:
* In the code generator and miscompilation debuggers, ``bugpoint`` only works
with programs that have deterministic output. Thus, if the program outputs
``argv[0]``, the date, time, or any other "random" data, ``bugpoint`` may
misinterpret differences in these data, when output, as the result of a
miscompilation. Programs should be temporarily modified to disable outputs
that are likely to vary from run to run.
* In the code generator and miscompilation debuggers, debugging will go faster
if you manually modify the program or its inputs to reduce the runtime, but
still exhibit the problem.
* ``bugpoint`` is extremely useful when working on a new optimization: it helps
  track down regressions quickly. To avoid having to relink ``bugpoint`` every
  time you change your optimization, however, have ``bugpoint`` dynamically load
  your optimization with the ``-load`` option.
* ``bugpoint`` can generate a lot of output and run for a long period of time.
  It is often useful to capture the output of the program to a file. For
  example, in the C shell, you can run:
.. code-block:: console
$ bugpoint ... |& tee bugpoint.log
to get a copy of ``bugpoint``'s output in the file ``bugpoint.log``, as well
as on your terminal.
* ``bugpoint`` cannot debug problems with the LLVM linker. If ``bugpoint``
crashes before you see its "All input ok" message, you might try ``llvm-link
-v`` on the same set of input files. If that also crashes, you may be
experiencing a linker bug.
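  For example (the file names are placeholders):

  .. code-block:: console

     $ llvm-link -v a.bc b.bc -o linked.bc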
* ``bugpoint`` is useful for proactively finding bugs in LLVM. Invoking
``bugpoint`` with the ``-find-bugs`` option will cause the list of specified
optimizations to be randomized and applied to the program. This process will
repeat until a bug is found or the user kills ``bugpoint``.
* ``bugpoint`` can produce IR which contains long names. Run ``opt
-metarenamer`` over the IR to rename everything using easy-to-read,
metasyntactic names. Alternatively, run ``opt -strip -instnamer`` to rename
everything with very short (often purely numeric) names.
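  For example, assuming ``bugpoint`` left a reduced file named
  ``bugpoint-reduced-simplified.bc`` (the file name is illustrative):

  .. code-block:: console

     $ opt -metarenamer bugpoint-reduced-simplified.bc -S -o renamed.ll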
What to do when bugpoint isn't enough
=====================================
Sometimes, ``bugpoint`` is not enough. In particular, InstCombine and
TargetLowering both have visitor structured code with lots of potential
transformations. If the process of using bugpoint has left you with still too
much code to figure out and the problem seems to be in instcombine, the
following steps may help. These same techniques are useful with TargetLowering
as well.
Turn on ``-debug-only=instcombine`` and see which transformations within
instcombine are firing by selecting out lines with "``IC``" in them.
At this point, you have a decision to make. Is the number of transformations
small enough to step through them using a debugger? If so, then try that.
If there are too many transformations, then a source modification approach may
be helpful. In this approach, you can modify the source code of instcombine to
disable just those transformations that are being performed on your test input
and perform a binary search over the set of transformations. One set of places
to modify are the "``visit*``" methods of ``InstCombiner`` (*e.g.*
``visitICmpInst``) by adding a "``return false``" as the first line of the
method.
If that still doesn't remove enough, then change
``InstCombiner::runOnFunction``, the caller of
``InstCombiner::DoOneIteration``, to limit the number of iterations.
You may also find it useful to use "``-stats``" now to see what parts of
instcombine are firing. This can guide where to put additional reporting code.
At this point, if the number of transformations is still too large, then
inserting code to limit whether or not to execute the body of the code in the
visit function can be helpful. Add a static counter which is incremented on
every invocation of the function. Then add code which simply returns false on
desired ranges. For example:
.. code-block:: c++
static int calledCount = 0;
calledCount++;
DEBUG(if (calledCount < 212) return false);
DEBUG(if (calledCount > 217) return false);
DEBUG(if (calledCount == 213) return false);
DEBUG(if (calledCount == 214) return false);
DEBUG(if (calledCount == 215) return false);
DEBUG(if (calledCount == 216) return false);
DEBUG(dbgs() << "visitXOR calledCount: " << calledCount << "\n");
DEBUG(dbgs() << "I: "; I->dump());
could be added to ``visitXOR`` to limit ``visitXOR`` to being applied only to
calls 212 and 217. This is from an actual test case and raises an important
point---a simple binary search may not be sufficient, as transformations that
interact may require isolating more than one call. In TargetLowering, use
``return SDValue();`` instead of ``return false;``.
Now that the number of transformations is down to a manageable number, try
examining the output to see if you can figure out which transformations are
being done. If that can be figured out, then do the usual debugging. If which
code corresponds to the transformation being performed isn't obvious, set a
breakpoint after the call count based disabling and step through the code.
Alternatively, you can use "``printf``" style debugging to report waypoints.

==============================================
Control Flow Verification Tool Design Document
==============================================
.. contents::
:local:
Objective
=========
This document provides an overview of an external tool to verify the protection
mechanisms implemented by Clang's *Control Flow Integrity* (CFI) schemes
(``-fsanitize=cfi``). Given a binary or DSO, this tool should infer whether
indirect control flow operations are protected by CFI, and should output these
results in a human-readable form.
This tool should also be added as part of Clang's continuous integration testing
framework, where modifications to the compiler ensure that CFI protection
schemes are still present in the final binary.
Location
========
This tool will be present as a part of the LLVM toolchain, and will reside in
the "/llvm/tools/llvm-cfi-verify" directory, relative to the LLVM trunk. It
will be tested in two ways:

- Unit tests to validate code sections, present in
  "/llvm/unittests/llvm-cfi-verify".
- Integration tests, present in "/llvm/tools/clang/test/LLVMCFIVerify". These
  integration tests are part of Clang's continuous integration framework,
  ensuring that updates to the compiler which reduce CFI coverage on indirect
  control flow instructions are identified.
Background
==========
This tool will continuously validate that CFI directives are properly
implemented around all indirect control flows by analysing the output machine
code. The analysis of machine code is important as it ensures that any bugs
present in the linker or compiler do not subvert CFI protections in the final
shipped binary.
Unprotected indirect control flow instructions will be flagged for manual
review. These unexpected control flows may simply have not been accounted for in
the compiler implementation of CFI (e.g. indirect jumps to facilitate switch
statements may not be fully protected).
It may be possible in the future to extend this tool to flag unnecessary CFI
directives (e.g. CFI directives around a static call to a non-polymorphic base
type). This type of directive has no security implications, but may present
performance impacts.
Design Ideas
============
This tool will disassemble binaries and DSO's from their machine code format and
analyse the disassembled machine code. The tool will inspect virtual calls and
indirect function calls. This tool will also inspect indirect jumps, as inlined
functions and jump tables should also be subject to CFI protections. Non-virtual
calls (``-fsanitize=cfi-nvcall``) and cast checks (``-fsanitize=cfi-*cast*``)
are not implemented due to a lack of information provided by the bytecode.
The tool would operate by searching for indirect control flow instructions in
the disassembly. A control flow graph would be generated from a small buffer of
the instructions surrounding the 'target' control flow instruction. If the
target instruction is branched-to, the fallthrough of the branch should be the
CFI trap (on x86, this is a ``ud2`` instruction). If the target instruction is
the fallthrough (i.e. immediately succeeds) of a conditional jump, the
conditional jump target should be the CFI trap. If an indirect control flow
instruction does not conform to one of these formats, the target will be noted
as being CFI-unprotected.
Note that in the second case outlined above (where the target instruction is the
fallthrough of a conditional jump), if the target represents a vcall that takes
arguments, these arguments may be pushed to the stack after the branch but
before the target instruction. In these cases, a secondary 'spill graph' is
constructed, to ensure the register argument used by the indirect jump/call is
not spilled from the stack at any point in the interim period. If there are no
spills that affect the target register, the target is marked as CFI-protected.
Other Design Notes
~~~~~~~~~~~~~~~~~~
Only machine code sections that are marked as executable will be subject to this
analysis. Non-executable sections do not require analysis as any execution
present in these sections has already violated the control flow integrity.
Suitable extensions may be made at a later date to include analysis for indirect
control flow operations across DSO boundaries. Currently, these CFI features are
only experimental with an unstable ABI, making them unsuitable for analysis.

if (DOXYGEN_FOUND)
if (LLVM_ENABLE_DOXYGEN)
set(abs_top_srcdir ${CMAKE_CURRENT_SOURCE_DIR})
set(abs_top_builddir ${CMAKE_CURRENT_BINARY_DIR})
if (HAVE_DOT)
set(DOT ${LLVM_PATH_DOT})
endif()
if (LLVM_DOXYGEN_EXTERNAL_SEARCH)
set(enable_searchengine "YES")
set(searchengine_url "${LLVM_DOXYGEN_SEARCHENGINE_URL}")
set(enable_server_based_search "YES")
set(enable_external_search "YES")
set(extra_search_mappings "${LLVM_DOXYGEN_SEARCH_MAPPINGS}")
else()
set(enable_searchengine "NO")
set(searchengine_url "")
set(enable_server_based_search "NO")
set(enable_external_search "NO")
set(extra_search_mappings "")
endif()
# If asked, configure doxygen for the creation of a Qt Compressed Help file.
option(LLVM_ENABLE_DOXYGEN_QT_HELP
"Generate a Qt Compressed Help file." OFF)
if (LLVM_ENABLE_DOXYGEN_QT_HELP)
set(LLVM_DOXYGEN_QCH_FILENAME "org.llvm.qch" CACHE STRING
"Filename of the Qt Compressed help file")
set(LLVM_DOXYGEN_QHP_NAMESPACE "org.llvm" CACHE STRING
"Namespace under which the intermediate Qt Help Project file lives")
set(LLVM_DOXYGEN_QHP_CUST_FILTER_NAME "${PACKAGE_STRING}" CACHE STRING
"See http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-filters")
set(LLVM_DOXYGEN_QHP_CUST_FILTER_ATTRS "${PACKAGE_NAME},${PACKAGE_VERSION}" CACHE STRING
"See http://qt-project.org/doc/qt-4.8/qthelpproject.html#filter-attributes")
find_program(LLVM_DOXYGEN_QHELPGENERATOR_PATH qhelpgenerator
DOC "Path to the qhelpgenerator binary")
if (NOT LLVM_DOXYGEN_QHELPGENERATOR_PATH)
message(FATAL_ERROR "Failed to find qhelpgenerator binary")
endif()
set(llvm_doxygen_generate_qhp "YES")
set(llvm_doxygen_qch_filename "${LLVM_DOXYGEN_QCH_FILENAME}")
set(llvm_doxygen_qhp_namespace "${LLVM_DOXYGEN_QHP_NAMESPACE}")
set(llvm_doxygen_qhelpgenerator_path "${LLVM_DOXYGEN_QHELPGENERATOR_PATH}")
set(llvm_doxygen_qhp_cust_filter_name "${LLVM_DOXYGEN_QHP_CUST_FILTER_NAME}")
set(llvm_doxygen_qhp_cust_filter_attrs "${LLVM_DOXYGEN_QHP_CUST_FILTER_ATTRS}")
else()
set(llvm_doxygen_generate_qhp "NO")
set(llvm_doxygen_qch_filename "")
set(llvm_doxygen_qhp_namespace "")
set(llvm_doxygen_qhelpgenerator_path "")
set(llvm_doxygen_qhp_cust_filter_name "")
set(llvm_doxygen_qhp_cust_filter_attrs "")
endif()
option(LLVM_DOXYGEN_SVG
"Use svg instead of png files for doxygen graphs." OFF)
if (LLVM_DOXYGEN_SVG)
set(DOT_IMAGE_FORMAT "svg")
else()
set(DOT_IMAGE_FORMAT "png")
endif()
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/doxygen.cfg.in
${CMAKE_CURRENT_BINARY_DIR}/doxygen.cfg @ONLY)
set(abs_top_srcdir)
set(abs_top_builddir)
set(DOT)
set(enable_searchengine)
set(searchengine_url)
set(enable_server_based_search)
set(enable_external_search)
set(extra_search_mappings)
set(llvm_doxygen_generate_qhp)
set(llvm_doxygen_qch_filename)
set(llvm_doxygen_qhp_namespace)
set(llvm_doxygen_qhelpgenerator_path)
set(llvm_doxygen_qhp_cust_filter_name)
set(llvm_doxygen_qhp_cust_filter_attrs)
set(DOT_IMAGE_FORMAT)
add_custom_target(doxygen-llvm
COMMAND ${DOXYGEN_EXECUTABLE} ${CMAKE_CURRENT_BINARY_DIR}/doxygen.cfg
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMENT "Generating llvm doxygen documentation." VERBATIM)
if (LLVM_BUILD_DOCS)
add_dependencies(doxygen doxygen-llvm)
endif()
if (NOT LLVM_INSTALL_TOOLCHAIN_ONLY)
# ./ suffix is needed to copy the contents of html directory without
# appending html/ into LLVM_INSTALL_DOXYGEN_HTML_DIR.
install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/doxygen/html/.
COMPONENT doxygen-html
DESTINATION "${LLVM_INSTALL_DOXYGEN_HTML_DIR}")
endif()
endif()
endif()
if (LLVM_ENABLE_SPHINX)
include(AddSphinxTarget)
if (SPHINX_FOUND)
if (${SPHINX_OUTPUT_HTML})
add_sphinx_target(html llvm)
endif()
if (${SPHINX_OUTPUT_MAN})
add_sphinx_target(man llvm)
add_sphinx_target(man llvm-dwarfdump)
add_sphinx_target(man dsymutil)
endif()
endif()
endif()
list(FIND LLVM_BINDINGS_LIST ocaml uses_ocaml)
if( NOT uses_ocaml LESS 0 AND LLVM_ENABLE_OCAMLDOC )
set(doc_targets
ocaml_llvm
ocaml_llvm_all_backends
ocaml_llvm_analysis
ocaml_llvm_bitreader
ocaml_llvm_bitwriter
ocaml_llvm_executionengine
ocaml_llvm_irreader
ocaml_llvm_linker
ocaml_llvm_target
ocaml_llvm_ipo
ocaml_llvm_passmgr_builder
ocaml_llvm_scalar_opts
ocaml_llvm_transform_utils
ocaml_llvm_vectorize
)
foreach(llvm_target ${LLVM_TARGETS_TO_BUILD})
list(APPEND doc_targets ocaml_llvm_${llvm_target})
endforeach()
set(odoc_files)
foreach( doc_target ${doc_targets} )
get_target_property(odoc_file ${doc_target} OCAML_ODOC)
list(APPEND odoc_files -load ${odoc_file})
endforeach()
add_custom_target(ocaml_doc
COMMAND ${CMAKE_COMMAND} -E remove_directory ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html
COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html
COMMAND ${OCAMLFIND} ocamldoc -d ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html
-sort -colorize-code -html ${odoc_files}
COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_SOURCE_DIR}/_ocamldoc/style.css
${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html)
add_dependencies(ocaml_doc ${doc_targets})
if (NOT LLVM_INSTALL_TOOLCHAIN_ONLY)
# ./ suffix is needed to copy the contents of html directory without
# appending html/ into LLVM_INSTALL_OCAMLDOC_HTML_DIR.
install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html/.
COMPONENT ocamldoc-html
DESTINATION "${LLVM_INSTALL_OCAMLDOC_HTML_DIR}")
endif()
endif()

============
CMake Primer
============
.. contents::
:local:
.. warning::
Disclaimer: This documentation is written by LLVM project contributors `not`
anyone affiliated with the CMake project. This document may contain
inaccurate terminology, phrasing, or technical details. It is provided with
the best intentions.
Introduction
============
The LLVM project and many of the core projects built on LLVM build using CMake.
This document aims to provide a brief overview of CMake for developers modifying
LLVM projects or building their own projects on top of LLVM.
The official CMake language reference is available in the cmake-language
manpage and `cmake-language online documentation
<https://cmake.org/cmake/help/v3.4/manual/cmake-language.7.html>`_.
10,000 ft View
==============
CMake is a tool that reads script files in its own language that describe how a
software project builds. As CMake evaluates the scripts it constructs an
internal representation of the software project. Once the scripts have been
fully processed, if there are no errors, CMake will generate build files to
actually build the project. CMake supports generating build files for a variety
of command line build tools as well as for popular IDEs.
When a user runs CMake it performs a variety of checks similar to how autoconf
worked historically. During the checks and the evaluation of the build
description scripts CMake caches values into the CMakeCache. This is useful
because it allows the build system to skip long-running checks during
incremental development. CMake caching also has some drawbacks, but that will be
discussed later.
Scripting Overview
==================
CMake's scripting language has a very simple grammar. Every language construct
is a command that matches the pattern _name_(_args_). Commands come in three
primary types: language-defined (commands implemented in C++ in CMake), defined
functions, and defined macros. The CMake distribution also contains a suite of
CMake modules that contain definitions for useful functionality.
The example below is the full CMake build for building a C++ "Hello World"
program. The example uses only CMake language-defined functions.
.. code-block:: cmake
cmake_minimum_required(VERSION 3.2)
project(HelloWorld)
add_executable(HelloWorld HelloWorld.cpp)
The CMake language provides control flow constructs in the form of foreach loops
and if blocks. To make the example above more complicated you could add an if
block to define "APPLE" when targeting Apple platforms:
.. code-block:: cmake
cmake_minimum_required(VERSION 3.2)
project(HelloWorld)
add_executable(HelloWorld HelloWorld.cpp)
if(APPLE)
target_compile_definitions(HelloWorld PUBLIC APPLE)
endif()
Variables, Types, and Scope
===========================
Dereferencing
-------------
In CMake variables are "stringly" typed. All variables are represented as
strings throughout evaluation. Wrapping a variable in ``${}`` dereferences it
and results in a literal substitution of the name for the value. CMake refers to
this as "variable evaluation" in its documentation. Dereferences are performed
*before* the command being called receives the arguments. This means
dereferencing a list results in multiple separate arguments being passed to the
command.
Variable dereferences can be nested and be used to model complex data. For
example:
.. code-block:: cmake
set(var_name var1)
set(${var_name} foo) # same as "set(var1 foo)"
set(${${var_name}}_var bar) # same as "set(foo_var bar)"
Dereferencing an unset variable results in an empty expansion. It is a common
pattern in CMake to conditionally set a variable, knowing that it will be used
in code paths where it may not be set. There are examples of this throughout
the LLVM CMake build system.
An example of variable empty expansion is:
.. code-block:: cmake
if(APPLE)
set(extra_sources Apple.cpp)
endif()
add_executable(HelloWorld HelloWorld.cpp ${extra_sources})
In this example the ``extra_sources`` variable is only defined if you're
targeting an Apple platform. For all other targets the ``extra_sources`` will be
evaluated as empty before add_executable is given its arguments.
Lists
-----
In CMake lists are semi-colon delimited strings, and it is strongly advised that
you avoid using semi-colons in lists; it doesn't go smoothly. A few examples of
defining lists:
.. code-block:: cmake
# Creates a list with members a, b, c, and d
set(my_list a b c d)
set(my_list "a;b;c;d")
# Creates a string "a b c d"
set(my_string "a b c d")
Lists of Lists
--------------
One of the more complicated patterns in CMake is lists of lists. Because a list
cannot contain an element with a semi-colon, to construct a list of lists you
make a list of variable names that refer to other lists. For example:
.. code-block:: cmake
set(list_of_lists a b c)
set(a 1 2 3)
set(b 4 5 6)
set(c 7 8 9)
With this layout you can iterate through the list of lists printing each value
with the following code:
.. code-block:: cmake
foreach(list_name IN LISTS list_of_lists)
foreach(value IN LISTS ${list_name})
message(${value})
endforeach()
endforeach()
You'll notice that the inner foreach loop's list is doubly dereferenced. This is
because the first dereference turns ``list_name`` into the name of the sub-list
(a, b, or c in the example), then the second dereference is to get the value of
the list.
This pattern is used throughout CMake; the most common example is the compiler
flags options, which CMake refers to using the following variable expansions:
``CMAKE_${LANGUAGE}_FLAGS`` and ``CMAKE_${LANGUAGE}_FLAGS_${CMAKE_BUILD_TYPE}``.
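A minimal sketch of this double expansion, printing the flags for two
languages (``CMAKE_C_FLAGS`` and ``CMAKE_CXX_FLAGS`` are standard CMake cache
variables):

.. code-block:: cmake

   foreach(lang C CXX)
     message("${lang} flags: ${CMAKE_${lang}_FLAGS}")
   endforeach()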
Other Types
-----------
Variables that are cached or specified on the command line can have types
associated with them. The variable's type is used by CMake's UI tool to display
the right input field. A variable's type generally doesn't impact evaluation,
however CMake does have special handling for some variables such as PATH.
You can read more about the special handling in `CMake's set documentation
<https://cmake.org/cmake/help/v3.5/command/set.html#set-cache-entry>`_.
Scope
-----
CMake inherently has directory-based scoping. Setting a variable in a
CMakeLists file sets the variable for that file and all subdirectories.
Variables set in a CMake module that is included in a CMakeLists file will be
set in the scope they are included from, and all subdirectories.
When a variable that is already set is set again in a subdirectory it overrides
the value in that scope and any deeper subdirectories.
The CMake set command provides two scope-related options. PARENT_SCOPE sets a
variable into the parent scope, and not the current scope. The CACHE option sets
the variable in the CMakeCache, which results in it being set in all scopes. The
CACHE option will not set a variable that already exists in the CACHE unless the
FORCE option is specified.
In addition to directory-based scope, CMake functions also have their own scope.
This means variables set inside functions do not bleed into the parent scope.
This is not true of macros, and it is for this reason LLVM prefers functions
over macros whenever reasonable.
.. note::
Unlike C-based languages, CMake's loop and control flow blocks do not have
their own scopes.
Control Flow
============
CMake features the same basic control flow constructs you would expect in any
scripting language, but there are a few quirks because, as with everything in
CMake, control flow constructs are commands.
If, ElseIf, Else
----------------
.. note::
For the full documentation on the CMake if command go
`here <https://cmake.org/cmake/help/v3.4/command/if.html>`_. That resource is
far more complete.
In general CMake if blocks work the way you'd expect:
.. code-block:: cmake
if(<condition>)
message("do stuff")
elseif(<condition>)
message("do other stuff")
else()
message("do other other stuff")
endif()
The single most important thing to know about CMake's if blocks coming from a C
background is that they do not have their own scope. Variables set inside
conditional blocks persist after the ``endif()``.
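A small sketch demonstrating this behavior:

.. code-block:: cmake

   if(TRUE)
     set(my_var "set inside the if block")
   endif()
   message("${my_var}") # prints: set inside the if block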
Loops
-----
The most common form of the CMake ``foreach`` block is:
.. code-block:: cmake
foreach(var ...)
message("do stuff")
endforeach()
The variable argument portion of the ``foreach`` block can contain dereferenced
lists, values to iterate, or a mix of both:
.. code-block:: cmake
foreach(var foo bar baz)
message(${var})
endforeach()
# prints:
# foo
# bar
# baz
set(my_list 1 2 3)
foreach(var ${my_list})
message(${var})
endforeach()
# prints:
# 1
# 2
# 3
foreach(var ${my_list} out_of_bounds)
message(${var})
endforeach()
# prints:
# 1
# 2
# 3
# out_of_bounds
There is also a more modern CMake foreach syntax. The code below is equivalent
to the code above:
.. code-block:: cmake
foreach(var IN ITEMS foo bar baz)
message(${var})
endforeach()
# prints:
# foo
# bar
# baz
set(my_list 1 2 3)
foreach(var IN LISTS my_list)
message(${var})
endforeach()
# prints:
# 1
# 2
# 3
foreach(var IN LISTS my_list ITEMS out_of_bounds)
message(${var})
endforeach()
# prints:
# 1
# 2
# 3
# out_of_bounds
Similar to the conditional statements, these generally behave how you would
expect, and they do not have their own scope.
CMake also supports ``while`` loops, although they are not widely used in LLVM.
Modules, Functions and Macros
=============================
Modules
-------
Modules are CMake's vehicle for enabling code reuse. CMake modules are just
CMake script files. They can contain code to execute on include as well as
definitions for commands.
In CMake macros and functions are universally referred to as commands, and they
are the primary method of defining code that can be called multiple times.
In LLVM we have several CMake modules that are included as part of our
distribution for developers who don't build our project from source. Those
modules are the fundamental pieces needed to build LLVM-based projects with
CMake. We also rely on modules as a way of organizing the build system's
functionality for maintainability and re-use within LLVM projects.
Argument Handling
-----------------
When defining a CMake command, handling arguments is very useful. The examples
in this section will all use the CMake ``function`` block, but this all applies
to the ``macro`` block as well.
CMake commands can have named arguments that are required at every call site. In
addition, all commands will implicitly accept a variable number of extra
arguments (In C parlance, all commands are varargs functions). When a command is
invoked with extra arguments (beyond the named ones) CMake will store the full
list of arguments (both named and unnamed) in a list named ``ARGV``, and the
sublist of unnamed arguments in ``ARGN``. Below is a trivial example of
providing a wrapper function for CMake's built in function ``add_dependencies``.
.. code-block:: cmake
function(add_deps target)
add_dependencies(${target} ${ARGN})
endfunction()
This example defines a new function named ``add_deps`` which takes a required
first argument, and just calls another function, passing through the first
argument and all trailing arguments.
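A hypothetical call site (the target and dependency names are made up):

.. code-block:: cmake

   add_deps(my_tool gen_headers gen_tables)
   # equivalent to: add_dependencies(my_tool gen_headers gen_tables)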
CMake provides a module ``CMakeParseArguments`` which provides an implementation
of advanced argument parsing. We use this all over LLVM, and it is recommended
for any function that has complex argument-based behaviors or optional
arguments. CMake's official documentation for the module is in the
``cmake-modules`` manpage, and is also available at the
`cmake-modules online documentation
<https://cmake.org/cmake/help/v3.4/module/CMakeParseArguments.html>`_.
.. note::
As of CMake 3.5 the cmake_parse_arguments command has become a native command
and the CMakeParseArguments module is empty and only left around for
compatibility.
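A brief sketch of the parsing pattern (the command and keyword names here are
invented for illustration; ``cmake_parse_arguments`` takes a prefix, the
option, one-value, and multi-value keyword lists, then the arguments):

.. code-block:: cmake

   function(add_example_target name)
     # Populates ARG_INSTALL, ARG_DESTINATION, and ARG_DEPENDS.
     cmake_parse_arguments(ARG "INSTALL" "DESTINATION" "DEPENDS" ${ARGN})
     if(ARG_INSTALL)
       message("would install ${name} to ${ARG_DESTINATION}")
     endif()
     message("dependencies: ${ARG_DEPENDS}")
   endfunction()

   add_example_target(foo INSTALL DESTINATION bin DEPENDS a b)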
Functions Vs Macros
-------------------
Functions and Macros look very similar in how they are used, but there is one
fundamental difference between the two. Functions have their own scope, and
macros don't. This means variables set in macros will bleed out into the calling
scope. That makes macros suitable for defining very small bits of functionality
only.
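The scoping difference can be demonstrated directly (the variable and command
names below are illustrative):

.. code-block:: cmake

  macro(set_in_macro)
    set(leaked "from macro")        # set in the caller's scope
  endmacro()

  function(set_in_function)
    set(contained "from function")  # set in the function's own scope
  endfunction()

  set_in_macro()
  set_in_function()

  message("leaked = '${leaked}'")       # prints: leaked = 'from macro'
  message("contained = '${contained}'") # prints: contained = ''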
The other difference between CMake functions and macros is how arguments are
passed. Arguments to macros are not set as variables; instead, references to
the parameters are textually substituted throughout the macro body before it
executes. This can result in unexpected behavior when a parameter name is used
without dereferencing it. For example:
.. code-block:: cmake

  macro(print_list my_list)
    foreach(var IN LISTS my_list)
      message("${var}")
    endforeach()
  endmacro()

  set(my_list a b c d)
  set(my_list_of_numbers 1 2 3 4)

  print_list(my_list_of_numbers)
  # prints:
  # a
  # b
  # c
  # d
Generally speaking this issue is uncommon because it requires using
non-dereferenced variables with names that overlap in the parent scope, but it
is important to be aware of because it can lead to subtle bugs.
LLVM Project Wrappers
=====================
LLVM projects provide lots of wrappers around critical CMake built-in commands.
We use these wrappers to provide consistent behaviors across LLVM components
and to reduce code duplication.
We generally (but not always) follow the convention that commands prefixed with
``llvm_`` are intended to be used only as building blocks for other commands.
Wrapper commands intended for direct use generally embed the project name in
the middle of the command name (i.e. ``add_llvm_executable`` is the wrapper for
``add_executable``). The LLVM ``add_*`` wrapper functions are all defined in
``AddLLVM.cmake``, which is installed as part of the LLVM distribution. It can
be included and used by any LLVM sub-project that requires LLVM.
.. note::

  Not all LLVM projects require LLVM for all use cases. For example,
  compiler-rt can be built without LLVM, and the compiler-rt sanitizer
  libraries are used with GCC.
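As one common in-tree pattern (the target and file names here are
hypothetical), a tool uses the wrapper rather than the raw built-in, and names
the LLVM components it links against:

.. code-block:: cmake

  # Components this tool needs; AddLLVM.cmake resolves them to the correct
  # libraries for both static and shared LLVM builds.
  set(LLVM_LINK_COMPONENTS Support)

  # Instead of add_executable(my-tool MyTool.cpp):
  add_llvm_executable(my-tool MyTool.cpp)

Routing the call through ``add_llvm_executable`` is what gives the target the
LLVM-wide defaults (compile flags, component linking, install rules) for free.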
Useful Built-in Commands
========================
CMake has many useful built-in commands. This document isn't going to go into
detail about them because the CMake project has excellent documentation. To
highlight a few useful commands see:
* `add_custom_command <https://cmake.org/cmake/help/v3.4/command/add_custom_command.html>`_
* `add_custom_target <https://cmake.org/cmake/help/v3.4/command/add_custom_target.html>`_
* `file <https://cmake.org/cmake/help/v3.4/command/file.html>`_
* `list <https://cmake.org/cmake/help/v3.4/command/list.html>`_
* `math <https://cmake.org/cmake/help/v3.4/command/math.html>`_
* `string <https://cmake.org/cmake/help/v3.4/command/string.html>`_
The full documentation for CMake commands is in the ``cmake-commands`` manpage
and available on `CMake's website <https://cmake.org/cmake/help/v3.4/manual/cmake-commands.7.html>`_.
==============================
LLVM Community Code of Conduct
==============================
.. note::

  This document is currently a **DRAFT** document while it is being discussed
  by the community.
The LLVM community has always worked to be a welcoming and respectful
community, and we want to ensure that doesn't change as we grow and evolve. To
that end, we have a few ground rules that we ask people to adhere to:
* `be friendly and patient`_,
* `be welcoming`_,
* `be considerate`_,
* `be respectful`_,
* `be careful in the words that you choose and be kind to others`_, and
* `when we disagree, try to understand why`_.
This isn't an exhaustive list of things that you can't do. Rather, take it in
the spirit in which it's intended - a guide to make it easier to communicate
and participate in the community.
This code of conduct applies to all spaces managed by the LLVM project or The
LLVM Foundation. This includes IRC channels, mailing lists, bug trackers, LLVM
events such as the developer meetings and socials, and any other forums created
by the project that the community uses for communication. It applies to all of
your communication and conduct in these spaces, including emails, chats, things
you say, slides, videos, posters, signs, or even t-shirts you display in these
spaces. In addition, violations of this code outside these spaces may, in rare
cases, affect a person's ability to participate within them, when the conduct
amounts to an egregious violation of this code.
If you believe someone is violating the code of conduct, we ask that you report
it by emailing conduct@llvm.org. For more details please see our
:doc:`Reporting Guide <ReportingGuide>`.
.. _be friendly and patient:
* **Be friendly and patient.**
.. _be welcoming:
* **Be welcoming.** We strive to be a community that welcomes and supports
  people of all backgrounds and identities. This includes, but is not limited
  to members of any race, ethnicity, culture, national origin, colour,
  immigration status, social and economic class, educational level, sex, sexual
  orientation, gender identity and expression, age, size, family status,
  political belief, religion or lack thereof, and mental and physical ability.
.. _be considerate:
* **Be considerate.** Your work will be used by other people, and you in turn
  will depend on the work of others. Any decision you take will affect users
  and colleagues, and you should take those consequences into account. Remember
  that we're a world-wide community, so you might not be communicating in
  someone else's primary language.
.. _be respectful:
* **Be respectful.** Not all of us will agree all the time, but disagreement is
  no excuse for poor behavior and poor manners. We might all experience some
  frustration now and then, but we cannot allow that frustration to turn into
  a personal attack. It's important to remember that a community where people
  feel uncomfortable or threatened is not a productive one. Members of the LLVM
  community should be respectful when dealing with other members as well as
  with people outside the LLVM community.
.. _be careful in the words that you choose and be kind to others:
* **Be careful in the words that you choose and be kind to others.** Do not
  insult or put down other participants. Harassment and other exclusionary
  behavior aren't acceptable. This includes, but is not limited to:

  * Violent threats or language directed against another person.
  * Discriminatory jokes and language.
  * Posting sexually explicit or violent material.
  * Posting (or threatening to post) other people's personally identifying
    information ("doxing").
  * Personal insults, especially those using racist or sexist terms.
  * Unwelcome sexual attention.
  * Advocating for, or encouraging, any of the above behavior.

  In general, if someone asks you to stop, then stop. Persisting in such
  behavior after being asked to stop is considered harassment.
.. _when we disagree, try to understand why:
* **When we disagree, try to understand why.** Disagreements, both social and
  technical, happen all the time and LLVM is no exception. It is important that
  we resolve disagreements and differing views constructively. Remember that
  we're different. The strength of LLVM comes from its varied community, people
  from a wide range of backgrounds. Different people have different
  perspectives on issues. Being unable to understand why someone holds
  a viewpoint doesn't mean that they're wrong. Don't forget that it is human to
  err and blaming each other doesn't get us anywhere. Instead, focus on helping
  to resolve issues and learning from mistakes.
Questions?
==========
If you have questions, please feel free to contact the LLVM Foundation Code of
Conduct Advisory Committee by emailing conduct@llvm.org.
(This text is based on the `Django Project`_ Code of Conduct, which is in turn
based on wording from the `Speak Up! project`_.)
.. _Django Project: https://www.djangoproject.com/conduct/
.. _Speak Up! project: http://speakup.io/coc.html