Imported Upstream version 6.10.0.49

Former-commit-id: 1d6753294b2993e1fbf92de9366bb9544db4189b
This commit is contained in:
Xamarin Public Jenkins (auto-signing)
2020-01-16 16:38:04 +00:00
parent d94e79959b
commit 468663ddbb
48518 changed files with 2789335 additions and 61176 deletions

View File

@ -0,0 +1,289 @@
================
AddressSanitizer
================
.. contents::
:local:
Introduction
============
AddressSanitizer is a fast memory error detector. It consists of a compiler
instrumentation module and a run-time library. The tool can detect the
following types of bugs:
* Out-of-bounds accesses to heap, stack and globals
* Use-after-free
* Use-after-return (runtime flag `ASAN_OPTIONS=detect_stack_use_after_return=1`)
* Use-after-scope (clang flag `-fsanitize-address-use-after-scope`)
* Double-free, invalid free
* Memory leaks (experimental)
Typical slowdown introduced by AddressSanitizer is **2x**.
How to build
============
Build LLVM/Clang with `CMake <http://llvm.org/docs/CMake.html>`_.
Usage
=====
Simply compile and link your program with ``-fsanitize=address`` flag. The
AddressSanitizer run-time library should be linked to the final executable, so
make sure to use ``clang`` (not ``ld``) for the final link step. When linking
shared libraries, the AddressSanitizer run-time is not linked, so
``-Wl,-z,defs`` may cause link errors (don't use it with AddressSanitizer). To
get a reasonable performance add ``-O1`` or higher. To get nicer stack traces
in error messages add ``-fno-omit-frame-pointer``. To get perfect stack traces
you may need to disable inlining (just use ``-O1``) and tail call elimination
(``-fno-optimize-sibling-calls``).
.. code-block:: console
% cat example_UseAfterFree.cc
int main(int argc, char **argv) {
int *array = new int[100];
delete [] array;
return array[argc]; // BOOM
}
# Compile and link
% clang++ -O1 -g -fsanitize=address -fno-omit-frame-pointer example_UseAfterFree.cc
or:
.. code-block:: console
# Compile
% clang++ -O1 -g -fsanitize=address -fno-omit-frame-pointer -c example_UseAfterFree.cc
# Link
% clang++ -g -fsanitize=address example_UseAfterFree.o
If a bug is detected, the program will print an error message to stderr and
exit with a non-zero exit code. AddressSanitizer exits on the first detected error.
This is by design:
* This approach allows AddressSanitizer to produce faster and smaller generated code
(both by ~5%).
* Fixing bugs becomes unavoidable. AddressSanitizer does not produce
false alarms. Once a memory corruption occurs, the program is in an inconsistent
state, which could lead to confusing results and potentially misleading
subsequent reports.
If your process is sandboxed and you are running on OS X 10.10 or earlier, you
will need to set ``DYLD_INSERT_LIBRARIES`` environment variable and point it to
the ASan library that is packaged with the compiler used to build the
executable. (You can find the library by searching for dynamic libraries with
``asan`` in their name.) If the environment variable is not set, the process will
try to re-exec. Also keep in mind that when moving the executable to another machine,
the ASan library will also need to be copied over.
Symbolizing the Reports
=========================
To make AddressSanitizer symbolize its output
you need to set the ``ASAN_SYMBOLIZER_PATH`` environment variable to point to
the ``llvm-symbolizer`` binary (or make sure ``llvm-symbolizer`` is in your
``$PATH``):
.. code-block:: console
% ASAN_SYMBOLIZER_PATH=/usr/local/bin/llvm-symbolizer ./a.out
==9442== ERROR: AddressSanitizer heap-use-after-free on address 0x7f7ddab8c084 at pc 0x403c8c bp 0x7fff87fb82d0 sp 0x7fff87fb82c8
READ of size 4 at 0x7f7ddab8c084 thread T0
#0 0x403c8c in main example_UseAfterFree.cc:4
#1 0x7f7ddabcac4d in __libc_start_main ??:0
0x7f7ddab8c084 is located 4 bytes inside of 400-byte region [0x7f7ddab8c080,0x7f7ddab8c210)
freed by thread T0 here:
#0 0x404704 in operator delete[](void*) ??:0
#1 0x403c53 in main example_UseAfterFree.cc:4
#2 0x7f7ddabcac4d in __libc_start_main ??:0
previously allocated by thread T0 here:
#0 0x404544 in operator new[](unsigned long) ??:0
#1 0x403c43 in main example_UseAfterFree.cc:2
#2 0x7f7ddabcac4d in __libc_start_main ??:0
==9442== ABORTING
If that does not work for you (e.g. your process is sandboxed), you can use a
separate script to symbolize the result offline (online symbolization can be
force disabled by setting ``ASAN_OPTIONS=symbolize=0``):
.. code-block:: console
% ASAN_OPTIONS=symbolize=0 ./a.out 2> log
% projects/compiler-rt/lib/asan/scripts/asan_symbolize.py / < log | c++filt
==9442== ERROR: AddressSanitizer heap-use-after-free on address 0x7f7ddab8c084 at pc 0x403c8c bp 0x7fff87fb82d0 sp 0x7fff87fb82c8
READ of size 4 at 0x7f7ddab8c084 thread T0
#0 0x403c8c in main example_UseAfterFree.cc:4
#1 0x7f7ddabcac4d in __libc_start_main ??:0
...
Note that on OS X you may need to run ``dsymutil`` on your binary to have the
file\:line info in the AddressSanitizer reports.
Additional Checks
=================
Initialization order checking
-----------------------------
AddressSanitizer can optionally detect dynamic initialization order problems,
when initialization of globals defined in one translation unit uses
globals defined in another translation unit. To enable this check at runtime,
you should set environment variable
``ASAN_OPTIONS=check_initialization_order=1``.
Note that this option is not supported on OS X.
Memory leak detection
---------------------
For more information on leak detector in AddressSanitizer, see
:doc:`LeakSanitizer`. The leak detection is turned on by default on Linux,
and can be enabled using ``ASAN_OPTIONS=detect_leaks=1`` on OS X;
however, it is not yet supported on other platforms.
Issue Suppression
=================
AddressSanitizer is not expected to produce false positives. If you see one,
look again; most likely it is a true positive!
Suppressing Reports in External Libraries
-----------------------------------------
Runtime interposition allows AddressSanitizer to find bugs in code that is
not being recompiled. If you run into an issue in external libraries, we
recommend immediately reporting it to the library maintainer so that it
gets addressed. However, you can use the following suppression mechanism
to unblock yourself and continue on with the testing. This suppression
mechanism should only be used for suppressing issues in external code; it
does not work on code recompiled with AddressSanitizer. To suppress errors
in external libraries, set the ``ASAN_OPTIONS`` environment variable to point
to a suppression file. You can either specify the full path to the file or the
path of the file relative to the location of your executable.
.. code-block:: bash
ASAN_OPTIONS=suppressions=MyASan.supp
Use the following format to specify the names of the functions or libraries
you want to suppress. You can see these in the error report. Remember that
the narrower the scope of the suppression, the more bugs you will be able to
catch.
.. code-block:: bash
interceptor_via_fun:NameOfCFunctionToSuppress
interceptor_via_fun:-[ClassName objCMethodToSuppress:]
interceptor_via_lib:NameOfTheLibraryToSuppress
Conditional Compilation with ``__has_feature(address_sanitizer)``
-----------------------------------------------------------------
In some cases one may need to execute different code depending on whether
AddressSanitizer is enabled.
:ref:`\_\_has\_feature <langext-__has_feature-__has_extension>` can be used for
this purpose.
.. code-block:: c
#if defined(__has_feature)
# if __has_feature(address_sanitizer)
// code that builds only under AddressSanitizer
# endif
#endif
Disabling Instrumentation with ``__attribute__((no_sanitize("address")))``
--------------------------------------------------------------------------
Some code should not be instrumented by AddressSanitizer. One may use the
function attribute ``__attribute__((no_sanitize("address")))`` (which has
deprecated synonyms `no_sanitize_address` and `no_address_safety_analysis`) to
disable instrumentation of a particular function. This attribute may not be
supported by other compilers, so we suggest to use it together with
``__has_feature(address_sanitizer)``.
Suppressing Errors in Recompiled Code (Blacklist)
-------------------------------------------------
AddressSanitizer supports ``src`` and ``fun`` entity types in
:doc:`SanitizerSpecialCaseList`, that can be used to suppress error reports
in the specified source files or functions. Additionally, AddressSanitizer
introduces ``global`` and ``type`` entity types that can be used to
suppress error reports for out-of-bound access to globals with certain
names and types (you may only specify class or struct types).
You may use an ``init`` category to suppress reports about initialization-order
problems happening in certain source files or with certain global variables.
.. code-block:: bash
# Suppress error reports for code in a file or in a function:
src:bad_file.cpp
# Ignore all functions with names containing MyFooBar:
fun:*MyFooBar*
# Disable out-of-bound checks for global:
global:bad_array
# Disable out-of-bound checks for global instances of a given class ...
type:Namespace::BadClassName
# ... or a given struct. Use wildcard to deal with anonymous namespace.
type:Namespace2::*::BadStructName
# Disable initialization-order checks for globals:
global:bad_init_global=init
type:*BadInitClassSubstring*=init
src:bad/init/files/*=init
Suppressing memory leaks
------------------------
Memory leak reports produced by :doc:`LeakSanitizer` (if it is run as a part
of AddressSanitizer) can be suppressed by a separate file passed as
.. code-block:: bash
LSAN_OPTIONS=suppressions=MyLSan.supp
which contains lines of the form `leak:<pattern>`. Memory leak will be
suppressed if pattern matches any function name, source file name, or
library name in the symbolized stack trace of the leak report. See
`full documentation
<https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer#suppressions>`_
for more details.
Limitations
===========
* AddressSanitizer uses more real memory than a native run. Exact overhead
depends on the allocations sizes. The smaller the allocations you make the
bigger the overhead is.
* AddressSanitizer uses more stack memory. We have seen up to 3x increase.
* On 64-bit platforms AddressSanitizer maps (but not reserves) 16+ Terabytes of
virtual address space. This means that tools like ``ulimit`` may not work as
usually expected.
* Static linking is not supported.
Supported Platforms
===================
AddressSanitizer is supported on:
* Linux i386/x86\_64 (tested on Ubuntu 12.04)
* OS X 10.7 - 10.11 (i386/x86\_64)
* iOS Simulator
* Android ARM
* FreeBSD i386/x86\_64 (tested on FreeBSD 11-current)
Ports to various other platforms are in progress.
Current Status
==============
AddressSanitizer is fully functional on supported platforms starting from LLVM
3.1. The test suite is integrated into CMake build and can be run with ``make
check-asan`` command.
More Information
================
`<https://github.com/google/sanitizers/wiki/AddressSanitizer>`_

View File

@ -0,0 +1 @@
d0e02fa72b333c082bcb68e713f7bb01511c480c

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1 @@
*NOTE* This document has moved to http://clang.llvm.org/docs/Block-ABI-Apple.html.

View File

@ -0,0 +1,361 @@
.. role:: block-term
=================================
Language Specification for Blocks
=================================
.. contents::
:local:
Revisions
=========
- 2008/2/25 --- created
- 2008/7/28 --- revised, ``__block`` syntax
- 2008/8/13 --- revised, Block globals
- 2008/8/21 --- revised, C++ elaboration
- 2008/11/1 --- revised, ``__weak`` support
- 2009/1/12 --- revised, explicit return types
- 2009/2/10 --- revised, ``__block`` objects need retain
Overview
========
A new derived type is introduced to C and, by extension, Objective-C,
C++, and Objective-C++
The Block Type
==============
Like function types, the :block-term:`Block type` is a pair consisting
of a result value type and a list of parameter types very similar to a
function type. Blocks are intended to be used much like functions with
the key distinction being that in addition to executable code they
also contain various variable bindings to automatic (stack) or managed
(heap) memory.
The abstract declarator,
.. code-block:: c
int (^)(char, float)
describes a reference to a Block that, when invoked, takes two
parameters, the first of type char and the second of type float, and
returns a value of type int. The Block referenced is of opaque data
that may reside in automatic (stack) memory, global memory, or heap
memory.
Block Variable Declarations
===========================
A :block-term:`variable with Block type` is declared using function
pointer style notation substituting ``^`` for ``*``. The following are
valid Block variable declarations:
.. code-block:: c
void (^blockReturningVoidWithVoidArgument)(void);
int (^blockReturningIntWithIntAndCharArguments)(int, char);
void (^arrayOfTenBlocksReturningVoidWithIntArgument[10])(int);
Variadic ``...`` arguments are supported. [variadic.c] A Block that
takes no arguments must specify void in the argument list [voidarg.c].
An empty parameter list does not represent, as K&R provide, an
unspecified argument list. Note: both gcc and clang support K&R style
as a convenience.
A Block reference may be cast to a pointer of arbitrary type and vice
versa. [cast.c] A Block reference may not be dereferenced via the
pointer dereference operator ``*``, and thus a Block's size may not be
computed at compile time. [sizeof.c]
Block Literal Expressions
=========================
A :block-term:`Block literal expression` produces a reference to a
Block. It is introduced by the use of the ``^`` token as a unary
operator.
.. code-block:: c
Block_literal_expression ::= ^ block_decl compound_statement_body
block_decl ::=
block_decl ::= parameter_list
block_decl ::= type_expression
where type expression is extended to allow ``^`` as a Block reference
(pointer) where ``*`` is allowed as a function reference (pointer).
The following Block literal:
.. code-block:: c
^ void (void) { printf("hello world\n"); }
produces a reference to a Block with no arguments with no return value.
The return type is optional and is inferred from the return
statements. If the return statements return a value, they all must
return a value of the same type. If there is no value returned the
inferred type of the Block is void; otherwise it is the type of the
return statement value.
If the return type is omitted and the argument list is ``( void )``,
the ``( void )`` argument list may also be omitted.
So:
.. code-block:: c
^ ( void ) { printf("hello world\n"); }
and:
.. code-block:: c
^ { printf("hello world\n"); }
are exactly equivalent constructs for the same expression.
The type_expression extends C expression parsing to accommodate Block
reference declarations as it accommodates function pointer
declarations.
Given:
.. code-block:: c
typedef int (*pointerToFunctionThatReturnsIntWithCharArg)(char);
pointerToFunctionThatReturnsIntWithCharArg functionPointer;
^ pointerToFunctionThatReturnsIntWithCharArg (float x) { return functionPointer; }
and:
.. code-block:: c
^ int ((*)(float x))(char) { return functionPointer; }
are equivalent expressions, as is:
.. code-block:: c
^(float x) { return functionPointer; }
[returnfunctionptr.c]
The compound statement body establishes a new lexical scope within
that of its parent. Variables used within the scope of the compound
statement are bound to the Block in the normal manner with the
exception of those in automatic (stack) storage. Thus one may access
functions and global variables as one would expect, as well as static
local variables. [testme]
Local automatic (stack) variables referenced within the compound
statement of a Block are imported and captured by the Block as const
copies. The capture (binding) is performed at the time of the Block
literal expression evaluation.
The compiler is not required to capture a variable if it can prove
that no references to the variable will actually be evaluated.
Programmers can force a variable to be captured by referencing it in a
statement at the beginning of the Block, like so:
.. code-block:: c
(void) foo;
This matters when capturing the variable has side-effects, as it can
in Objective-C or C++.
The lifetime of variables declared in a Block is that of a function;
each activation frame contains a new copy of variables declared within
the local scope of the Block. Such variable declarations should be
allowed anywhere [testme] rather than only when C99 parsing is
requested, including for statements. [testme]
Block literal expressions may occur within Block literal expressions
(nest) and all variables captured by any nested blocks are implicitly
also captured in the scopes of their enclosing Blocks.
A Block literal expression may be used as the initialization value for
Block variables at global or local static scope.
The Invoke Operator
===================
Blocks are :block-term:`invoked` using function call syntax with a
list of expression parameters of types corresponding to the
declaration and returning a result type also according to the
declaration. Given:
.. code-block:: c
int (^x)(char);
void (^z)(void);
int (^(*y))(char) = &x;
the following are all legal Block invocations:
.. code-block:: c
x('a');
(*y)('a');
(true ? x : *y)('a')
The Copy and Release Operations
===============================
The compiler and runtime provide :block-term:`copy` and
:block-term:`release` operations for Block references that create and,
in matched use, release allocated storage for referenced Blocks.
The copy operation ``Block_copy()`` is styled as a function that takes
an arbitrary Block reference and returns a Block reference of the same
type. The release operation, ``Block_release()``, is styled as a
function that takes an arbitrary Block reference and, if dynamically
matched to a Block copy operation, allows recovery of the referenced
allocated memory.
The ``__block`` Storage Qualifier
=================================
In addition to the new Block type we also introduce a new storage
qualifier, :block-term:`__block`, for local variables. [testme: a
__block declaration within a block literal] The ``__block`` storage
qualifier is mutually exclusive to the existing local storage
qualifiers auto, register, and static. [testme] Variables qualified by
``__block`` act as if they were in allocated storage and this storage
is automatically recovered after last use of said variable. An
implementation may choose an optimization where the storage is
initially automatic and only "moved" to allocated (heap) storage upon
a Block_copy of a referencing Block. Such variables may be mutated as
normal variables are.
In the case where a ``__block`` variable is a Block one must assume
that the ``__block`` variable resides in allocated storage and as such
is assumed to reference a Block that is also in allocated storage
(that it is the result of a ``Block_copy`` operation). Despite this
there is no provision to do a ``Block_copy`` or a ``Block_release`` if
an implementation provides initial automatic storage for Blocks. This
is due to the inherent race condition of potentially several threads
trying to update the shared variable and the need for synchronization
around disposing of older values and copying new ones. Such
synchronization is beyond the scope of this language specification.
Control Flow
============
The compound statement of a Block is treated much like a function body
with respect to control flow in that goto, break, and continue do not
escape the Block. Exceptions are treated *normally* in that when
thrown they pop stack frames until a catch clause is found.
Objective-C Extensions
======================
Objective-C extends the definition of a Block reference type to be
that also of id. A variable or expression of Block type may be
messaged or used as a parameter wherever an id may be. The converse is
also true. Block references may thus appear as properties and are
subject to the assign, retain, and copy attribute logic that is
reserved for objects.
All Blocks are constructed to be Objective-C objects regardless of
whether the Objective-C runtime is operational in the program or
not. Blocks using automatic (stack) memory are objects and may be
messaged, although they may not be assigned into ``__weak`` locations
if garbage collection is enabled.
Within a Block literal expression within a method definition
references to instance variables are also imported into the lexical
scope of the compound statement. These variables are implicitly
qualified as references from self, and so self is imported as a const
copy. The net effect is that instance variables can be mutated.
The :block-term:`Block_copy` operator retains all objects held in
variables of automatic storage referenced within the Block expression
(or form strong references if running under garbage collection).
Object variables of ``__block`` storage type are assumed to hold
normal pointers with no provision for retain and release messages.
Foundation defines (and supplies) ``-copy`` and ``-release`` methods for
Blocks.
In the Objective-C and Objective-C++ languages, we allow the
``__weak`` specifier for ``__block`` variables of object type. If
garbage collection is not enabled, this qualifier causes these
variables to be kept without retain messages being sent. This
knowingly leads to dangling pointers if the Block (or a copy) outlives
the lifetime of this object.
In garbage collected environments, the ``__weak`` variable is set to
nil when the object it references is collected, as long as the
``__block`` variable resides in the heap (either by default or via
``Block_copy()``). The initial Apple implementation does in fact
start ``__block`` variables on the stack and migrate them to the heap
only as a result of a ``Block_copy()`` operation.
It is a runtime error to attempt to assign a reference to a
stack-based Block into any storage marked ``__weak``, including
``__weak`` ``__block`` variables.
C++ Extensions
==============
Block literal expressions within functions are extended to allow const
use of C++ objects, pointers, or references held in automatic storage.
As usual, within the block, references to captured variables become
const-qualified, as if they were references to members of a const
object. Note that this does not change the type of a variable of
reference type.
For example, given a class Foo:
.. code-block:: c
Foo foo;
Foo &fooRef = foo;
Foo *fooPtr = &foo;
A Block that referenced these variables would import the variables as
const variations:
.. code-block:: c
const Foo block_foo = foo;
Foo &block_fooRef = fooRef;
Foo *const block_fooPtr = fooPtr;
Captured variables are copied into the Block at the instant of
evaluating the Block literal expression. They are also copied when
calling ``Block_copy()`` on a Block allocated on the stack. In both
cases, they are copied as if the variable were const-qualified, and
it's an error if there's no such constructor.
Captured variables in Blocks on the stack are destroyed when control
leaves the compound statement that contains the Block literal
expression. Captured variables in Blocks on the heap are destroyed
when the reference count of the Block drops to zero.
Variables declared as residing in ``__block`` storage may be initially
allocated in the heap or may first appear on the stack and be copied
to the heap as a result of a ``Block_copy()`` operation. When copied
from the stack, ``__block`` variables are copied using their normal
qualification (i.e. without adding const). In C++11, ``__block``
variables are copied as x-values if that is possible, then as l-values
if not; if both fail, it's an error. The destructor for any initial
stack-based version is called at the variable's normal end of scope.
References to ``this``, as well as references to non-static members of
any enclosing class, are evaluated by capturing ``this`` just like a
normal variable of C pointer type.
Member variables that are Blocks may not be overloaded by the types of
their arguments.

View File

@ -0,0 +1,107 @@
if (DOXYGEN_FOUND)
if (LLVM_ENABLE_DOXYGEN)
set(abs_srcdir ${CMAKE_CURRENT_SOURCE_DIR})
set(abs_builddir ${CMAKE_CURRENT_BINARY_DIR})
if (HAVE_DOT)
set(DOT ${LLVM_PATH_DOT})
endif()
if (LLVM_DOXYGEN_EXTERNAL_SEARCH)
set(enable_searchengine "YES")
set(searchengine_url "${LLVM_DOXYGEN_SEARCHENGINE_URL}")
set(enable_server_based_search "YES")
set(enable_external_search "YES")
set(extra_search_mappings "${LLVM_DOXYGEN_SEARCH_MAPPINGS}")
else()
set(enable_searchengine "NO")
set(searchengine_url "")
set(enable_server_based_search "NO")
set(enable_external_search "NO")
set(extra_search_mappings "")
endif()
# If asked, configure doxygen for the creation of a Qt Compressed Help file.
if (LLVM_ENABLE_DOXYGEN_QT_HELP)
set(CLANG_DOXYGEN_QCH_FILENAME "org.llvm.clang.qch" CACHE STRING
"Filename of the Qt Compressed help file")
set(CLANG_DOXYGEN_QHP_NAMESPACE "org.llvm.clang" CACHE STRING
"Namespace under which the intermediate Qt Help Project file lives")
set(CLANG_DOXYGEN_QHP_CUST_FILTER_NAME "Clang ${CLANG_VERSION}" CACHE STRING
"See http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-filters")
set(CLANG_DOXYGEN_QHP_CUST_FILTER_ATTRS "Clang,${CLANG_VERSION}" CACHE STRING
"See http://qt-project.org/doc/qt-4.8/qthelpproject.html#filter-attributes")
set(clang_doxygen_generate_qhp "YES")
set(clang_doxygen_qch_filename "${CLANG_DOXYGEN_QCH_FILENAME}")
set(clang_doxygen_qhp_namespace "${CLANG_DOXYGEN_QHP_NAMESPACE}")
set(clang_doxygen_qhelpgenerator_path "${LLVM_DOXYGEN_QHELPGENERATOR_PATH}")
set(clang_doxygen_qhp_cust_filter_name "${CLANG_DOXYGEN_QHP_CUST_FILTER_NAME}")
set(clang_doxygen_qhp_cust_filter_attrs "${CLANG_DOXYGEN_QHP_CUST_FILTER_ATTRS}")
else()
set(clang_doxygen_generate_qhp "NO")
set(clang_doxygen_qch_filename "")
set(clang_doxygen_qhp_namespace "")
set(clang_doxygen_qhelpgenerator_path "")
set(clang_doxygen_qhp_cust_filter_name "")
set(clang_doxygen_qhp_cust_filter_attrs "")
endif()
option(LLVM_DOXYGEN_SVG
"Use svg instead of png files for doxygen graphs." OFF)
if (LLVM_DOXYGEN_SVG)
set(DOT_IMAGE_FORMAT "svg")
else()
set(DOT_IMAGE_FORMAT "png")
endif()
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/doxygen.cfg.in
${CMAKE_CURRENT_BINARY_DIR}/doxygen.cfg @ONLY)
set(abs_top_srcdir)
set(abs_top_builddir)
set(DOT)
set(enable_searchengine)
set(searchengine_url)
set(enable_server_based_search)
set(enable_external_search)
set(extra_search_mappings)
set(clang_doxygen_generate_qhp)
set(clang_doxygen_qch_filename)
set(clang_doxygen_qhp_namespace)
set(clang_doxygen_qhelpgenerator_path)
set(clang_doxygen_qhp_cust_filter_name)
set(clang_doxygen_qhp_cust_filter_attrs)
set(DOT_IMAGE_FORMAT)
add_custom_target(doxygen-clang
COMMAND ${DOXYGEN_EXECUTABLE} ${CMAKE_CURRENT_BINARY_DIR}/doxygen.cfg
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMENT "Generating clang doxygen documentation." VERBATIM)
if (LLVM_BUILD_DOCS)
add_dependencies(doxygen doxygen-clang)
endif()
if (NOT LLVM_INSTALL_TOOLCHAIN_ONLY)
install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/doxygen/html
DESTINATION docs/html)
endif()
endif()
endif()
if (LLVM_ENABLE_SPHINX)
include(AddSphinxTarget)
if (SPHINX_FOUND)
if (${SPHINX_OUTPUT_HTML})
add_sphinx_target(html clang)
add_custom_command(TARGET docs-clang-html POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy
"${CMAKE_CURRENT_SOURCE_DIR}/LibASTMatchersReference.html"
"${CMAKE_CURRENT_BINARY_DIR}/html/LibASTMatchersReference.html")
endif()
if (${SPHINX_OUTPUT_MAN})
add_sphinx_target(man clang)
endif()
endif()
endif()

View File

@ -0,0 +1,36 @@
==========
ClangCheck
==========
`ClangCheck` is a small wrapper around :doc:`LibTooling` which can be used to
do basic error checking and AST dumping.
.. code-block:: console
$ cat <<EOF > snippet.cc
> void f() {
> int a = 0
> }
> EOF
$ ~/clang/build/bin/clang-check snippet.cc -ast-dump --
Processing: /Users/danieljasper/clang/llvm/tools/clang/docs/snippet.cc.
/Users/danieljasper/clang/llvm/tools/clang/docs/snippet.cc:2:12: error: expected ';' at end of
declaration
int a = 0
^
;
(TranslationUnitDecl 0x7ff3a3029ed0 <<invalid sloc>>
(TypedefDecl 0x7ff3a302a410 <<invalid sloc>> __int128_t '__int128')
(TypedefDecl 0x7ff3a302a470 <<invalid sloc>> __uint128_t 'unsigned __int128')
(TypedefDecl 0x7ff3a302a830 <<invalid sloc>> __builtin_va_list '__va_list_tag [1]')
(FunctionDecl 0x7ff3a302a8d0 </Users/danieljasper/clang/llvm/tools/clang/docs/snippet.cc:1:1, line:3:1> f 'void (void)'
(CompoundStmt 0x7ff3a302aa10 <line:1:10, line:3:1>
(DeclStmt 0x7ff3a302a9f8 <line:2:3, line:3:1>
(VarDecl 0x7ff3a302a980 <line:2:3, col:11> a 'int'
(IntegerLiteral 0x7ff3a302a9d8 <col:11> 'int' 0))))))
1 error generated.
Error while processing snippet.cc.
The '--' at the end is important as it prevents :program:`clang-check` from
searching for a compilation database. For more information on how to setup and
use :program:`clang-check` in a project, see :doc:`HowToSetupToolingForLLVM`.

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,210 @@
===========
ClangFormat
===========
`ClangFormat` describes a set of tools that are built on top of
:doc:`LibFormat`. It can support your workflow in a variety of ways including a
standalone tool and editor integrations.
Standalone Tool
===============
:program:`clang-format` is located in `clang/tools/clang-format` and can be used
to format C/C++/Java/JavaScript/Objective-C/Protobuf code.
.. code-block:: console
$ clang-format -help
OVERVIEW: A tool to format C/C++/Java/JavaScript/Objective-C/Protobuf code.
If no arguments are specified, it formats the code from standard input
and writes the result to the standard output.
If <file>s are given, it reformats the files. If -i is specified
together with <file>s, the files are edited in-place. Otherwise, the
result is written to the standard output.
USAGE: clang-format [options] [<file> ...]
OPTIONS:
Clang-format options:
-assume-filename=<string> - When reading from stdin, clang-format assumes this
filename to look for a style config file (with
-style=file) and to determine the language.
-cursor=<uint> - The position of the cursor when invoking
clang-format from an editor integration
-dump-config - Dump configuration options to stdout and exit.
Can be used with -style option.
-fallback-style=<string> - The name of the predefined style used as a
fallback in case clang-format is invoked with
-style=file, but can not find the .clang-format
file to use.
Use -fallback-style=none to skip formatting.
-i - Inplace edit <file>s, if specified.
-length=<uint> - Format a range of this length (in bytes).
Multiple ranges can be formatted by specifying
several -offset and -length pairs.
When only a single -offset is specified without
-length, clang-format will format up to the end
of the file.
Can only be used with one input file.
-lines=<string> - <start line>:<end line> - format a range of
lines (both 1-based).
Multiple ranges can be formatted by specifying
several -lines arguments.
Can't be used with -offset and -length.
Can only be used with one input file.
-offset=<uint> - Format a range starting at this byte offset.
Multiple ranges can be formatted by specifying
several -offset and -length pairs.
Can only be used with one input file.
-output-replacements-xml - Output replacements as XML.
-sort-includes - Sort touched include lines
-style=<string> - Coding style, currently supports:
LLVM, Google, Chromium, Mozilla, WebKit.
Use -style=file to load style configuration from
.clang-format file located in one of the parent
directories of the source file (or current
directory for stdin).
Use -style="{key: value, ...}" to set specific
parameters, e.g.:
-style="{BasedOnStyle: llvm, IndentWidth: 8}"
-verbose - If set, shows the list of processed files
Generic Options:
-help - Display available options (-help-hidden for more)
-help-list - Display list of available options (-help-list-hidden for more)
-version - Display the version of this program
When the desired code formatting style is different from the available options,
the style can be customized using the ``-style="{key: value, ...}"`` option or
by putting your style configuration in the ``.clang-format`` or ``_clang-format``
file in your project's directory and using ``clang-format -style=file``.
An easy way to create the ``.clang-format`` file is:
.. code-block:: console
clang-format -style=llvm -dump-config > .clang-format
Available style options are described in :doc:`ClangFormatStyleOptions`.
Vim Integration
===============
There is an integration for :program:`vim` which lets you run the
:program:`clang-format` standalone tool on your current buffer, optionally
selecting regions to reformat. The integration has the form of a `python`-file
which can be found under `clang/tools/clang-format/clang-format.py`.
This can be integrated by adding the following to your `.vimrc`:
.. code-block:: vim
map <C-K> :pyf <path-to-this-file>/clang-format.py<cr>
imap <C-K> <c-o>:pyf <path-to-this-file>/clang-format.py<cr>
The first line enables :program:`clang-format` for NORMAL and VISUAL mode, the
second line adds support for INSERT mode. Change "C-K" to another binding if
you need :program:`clang-format` on a different key (C-K stands for Ctrl+k).
With this integration you can press the bound key and clang-format will
format the current line in NORMAL and INSERT mode or the selected region in
VISUAL mode. The line or region is extended to the next bigger syntactic
entity.
It operates on the current, potentially unsaved buffer and does not create
or save any files. To revert a formatting, just undo.
An alternative option is to format changes when saving a file and thus to
have a zero-effort integration into the coding workflow. To do this, add this to
your `.vimrc`:
.. code-block:: vim
function! Formatonsave()
let l:formatdiff = 1
pyf ~/llvm/tools/clang/tools/clang-format/clang-format.py
endfunction
autocmd BufWritePre *.h,*.cc,*.cpp call Formatonsave()
Emacs Integration
=================
Similar to the integration for :program:`vim`, there is an integration for
:program:`emacs`. It can be found at `clang/tools/clang-format/clang-format.el`
and used by adding this to your `.emacs`:
.. code-block:: common-lisp
(load "<path-to-clang>/tools/clang-format/clang-format.el")
(global-set-key [C-M-tab] 'clang-format-region)
This binds the function `clang-format-region` to C-M-tab, which then formats the
current line or selected region.
BBEdit Integration
==================
:program:`clang-format` cannot be used as a text filter with BBEdit, but works
well via a script. The AppleScript to do this integration can be found at
`clang/tools/clang-format/clang-format-bbedit.applescript`; place a copy in
`~/Library/Application Support/BBEdit/Scripts`, and edit the path within it to
point to your local copy of :program:`clang-format`.
With this integration you can select the script from the Script menu and
:program:`clang-format` will format the selection. Note that you can rename the
menu item by renaming the script, and can assign the menu item a keyboard
shortcut in the BBEdit preferences, under Menus & Shortcuts.
Visual Studio Integration
=========================
Download the latest Visual Studio extension from the `alpha build site
<http://llvm.org/builds/>`_. The default key-binding is Ctrl-R,Ctrl-F.
Script for patch reformatting
=============================
The python script `clang/tools/clang-format/clang-format-diff.py` parses the
output of a unified diff and reformats all contained lines with
:program:`clang-format`.
.. code-block:: console
usage: clang-format-diff.py [-h] [-i] [-p NUM] [-regex PATTERN] [-style STYLE]
Reformat changed lines in diff. Without -i option just output the diff that
would be introduced.
optional arguments:
-h, --help show this help message and exit
-i apply edits to files instead of displaying a diff
-p NUM strip the smallest prefix containing P slashes
-regex PATTERN custom pattern selecting file paths to reformat
-style STYLE formatting style to apply (LLVM, Google, Chromium, Mozilla,
WebKit)
So to reformat all the lines in the latest :program:`git` commit, just do:
.. code-block:: console
git diff -U0 --no-color HEAD^ | clang-format-diff.py -i -p1
In an SVN client, you can do:
.. code-block:: console
svn diff --diff-cmd=diff -x -U0 | clang-format-diff.py -i
The option `-U0` will create a diff without context lines (the script would format
those as well).

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,130 @@
=============
Clang Plugins
=============
Clang Plugins make it possible to run extra user defined actions during a
compilation. This document will provide a basic walkthrough of how to write and
run a Clang Plugin.
Introduction
============
Clang Plugins run FrontendActions over code. See the :doc:`FrontendAction
tutorial <RAVFrontendAction>` on how to write a ``FrontendAction`` using the
``RecursiveASTVisitor``. In this tutorial, we'll demonstrate how to write a
simple clang plugin.
Writing a ``PluginASTAction``
=============================
The main difference from writing normal ``FrontendActions`` is that you can
handle plugin command line options. The ``PluginASTAction`` base class declares
a ``ParseArgs`` method which you have to implement in your plugin.
.. code-block:: c++
bool ParseArgs(const CompilerInstance &CI,
const std::vector<std::string>& args) {
for (unsigned i = 0, e = args.size(); i != e; ++i) {
if (args[i] == "-some-arg") {
// Handle the command line argument.
}
}
return true;
}
Registering a plugin
====================
A plugin is loaded from a dynamic library at runtime by the compiler. To
register a plugin in a library, use ``FrontendPluginRegistry::Add<>``:
.. code-block:: c++
static FrontendPluginRegistry::Add<MyPlugin> X("my-plugin-name", "my plugin description");
Defining pragmas
================
Plugins can also define pragmas by declaring a ``PragmaHandler`` and
registering it using ``PragmaHandlerRegistry::Add<>``:
.. code-block:: c++
// Define a pragma handler for #pragma example_pragma
class ExamplePragmaHandler : public PragmaHandler {
public:
ExamplePragmaHandler() : PragmaHandler("example_pragma") { }
void HandlePragma(Preprocessor &PP, PragmaIntroducerKind Introducer,
Token &PragmaTok) {
// Handle the pragma
}
};
static PragmaHandlerRegistry::Add<ExamplePragmaHandler> Y("example_pragma","example pragma description");
Putting it all together
=======================
Let's look at an example plugin that prints top-level function names. This
example is checked into the clang repository; please take a look at
the `latest version of PrintFunctionNames.cpp
<http://llvm.org/viewvc/llvm-project/cfe/trunk/examples/PrintFunctionNames/PrintFunctionNames.cpp?view=markup>`_.
Running the plugin
==================
Using the cc1 command line
--------------------------
To run a plugin, the dynamic library containing the plugin registry must be
loaded via the `-load` command line option. This will load all plugins
that are registered, and you can select the plugins to run by specifying the
`-plugin` option. Additional parameters for the plugins can be passed with
`-plugin-arg-<plugin-name>`.
Note that those options must reach clang's cc1 process. There are two
ways to do so:
* Directly call the parsing process by using the `-cc1` option; this
has the downside of not configuring the default header search paths, so
you'll need to specify the full system path configuration on the command
line.
* Use clang as usual, but prefix all arguments to the cc1 process with
`-Xclang`.
For example, to run the ``print-function-names`` plugin over a source file in
clang, first build the plugin, and then call clang with the plugin from the
source tree:
.. code-block:: console
$ export BD=/path/to/build/directory
$ (cd $BD && make PrintFunctionNames )
$ clang++ -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS \
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D_GNU_SOURCE \
-I$BD/tools/clang/include -Itools/clang/include -I$BD/include -Iinclude \
tools/clang/tools/clang-check/ClangCheck.cpp -fsyntax-only \
-Xclang -load -Xclang $BD/lib/PrintFunctionNames.so -Xclang \
-plugin -Xclang print-fns
Also see the print-function-name plugin example's
`README <http://llvm.org/viewvc/llvm-project/cfe/trunk/examples/PrintFunctionNames/README.txt?view=markup>`_
Using the clang command line
----------------------------
Using `-fplugin=plugin` on the clang command line passes the plugin
through as an argument to `-load` on the cc1 command line. If the plugin
class implements the ``getActionType`` method then the plugin is run
automatically. For example, to run the plugin automatically after the main AST
action (i.e. the same as using `-add-plugin`):
.. code-block:: c++
// Automatically run the plugin after the main AST action
PluginASTAction::ActionType getActionType() override {
return AddAfterMainAction;
}

View File

@ -0,0 +1,167 @@
========
Overview
========
Clang Tools are standalone command line (and potentially GUI) tools
designed for use by C++ developers who are already using and enjoying
Clang as their compiler. These tools provide developer-oriented
functionality such as fast syntax checking, automatic formatting,
refactoring, etc.
Only a couple of the most basic and fundamental tools are kept in the
primary Clang Subversion project. The rest of the tools are kept in a
side-project so that developers who don't want or need to build them
don't. If you want to get access to the extra Clang Tools repository,
simply check it out into the tools tree of your Clang checkout and
follow the usual process for building and working with a combined
LLVM/Clang checkout:
- With Subversion:
- ``cd llvm/tools/clang/tools``
- ``svn co http://llvm.org/svn/llvm-project/clang-tools-extra/trunk extra``
- Or with Git:
- ``cd llvm/tools/clang/tools``
- ``git clone http://llvm.org/git/clang-tools-extra.git extra``
This document describes a high-level overview of the organization of
Clang Tools within the project as well as giving an introduction to some
of the more important tools. However, it should be noted that this
document is currently focused on Clang and Clang Tool developers, not on
end users of these tools.
Clang Tools Organization
========================
Clang Tools are CLI or GUI programs that are intended to be directly
used by C++ developers. That is they are *not* primarily for use by
Clang developers, although they are hopefully useful to C++ developers
who happen to work on Clang, and we try to actively dogfood their
functionality. They are developed in three components: the underlying
infrastructure for building a standalone tool based on Clang, core
shared logic used by many different tools in the form of refactoring and
rewriting libraries, and the tools themselves.
The underlying infrastructure for Clang Tools is the
:doc:`LibTooling <LibTooling>` platform. See its documentation for much
more detailed information about how this infrastructure works. The
common refactoring and rewriting toolkit-style library is also part of
LibTooling organizationally.
A few Clang Tools are developed along side the core Clang libraries as
examples and test cases of fundamental functionality. However, most of
the tools are developed in a side repository to provide easy separation
from the core libraries. We intentionally do not support public
libraries in the side repository, as we want to carefully review and
find good APIs for libraries as they are lifted out of a few tools and
into the core Clang library set.
Regardless of which repository Clang Tools' code resides in, the
development process and practices for all Clang Tools are exactly those
of Clang itself. They are entirely within the Clang *project*,
regardless of the version control scheme.
Core Clang Tools
================
The core set of Clang tools that are within the main repository are
tools that very specifically complement, and allow use and testing of
*Clang* specific functionality.
``clang-check``
---------------
:doc:`ClangCheck` combines the LibTooling framework for running a
Clang tool with the basic Clang diagnostics by syntax checking specific files
in a fast, command line interface. It can also accept flags to re-display the
diagnostics in different formats with different flags, suitable for use driving
an IDE or editor. Furthermore, it can be used in fixit-mode to directly apply
fixit-hints offered by clang. See :doc:`HowToSetupToolingForLLVM` for
instructions on how to setup and used `clang-check`.
``clang-format``
----------------
Clang-format is both a :doc:`library <LibFormat>` and a :doc:`stand-alone tool
<ClangFormat>` with the goal of automatically reformatting C++ sources files
according to configurable style guides. To do so, clang-format uses Clang's
``Lexer`` to transform an input file into a token stream and then changes all
the whitespace around those tokens. The goal is for clang-format to serve both
as a user tool (ideally with powerful IDE integrations) and as part of other
refactoring tools, e.g. to do a reformatting of all the lines changed during a
renaming.
Extra Clang Tools
=================
As various categories of Clang Tools are added to the extra repository,
they'll be tracked here. The focus of this documentation is on the scope
and features of the tools for other tool developers; each tool should
provide its own user-focused documentation.
``clang-tidy``
--------------
`clang-tidy <http://clang.llvm.org/extra/clang-tidy/>`_ is a clang-based C++
linter tool. It provides an extensible framework for building compiler-based
static analyses detecting and fixing bug-prone patterns, performance,
portability and maintainability issues.
Ideas for new Tools
===================
* C++ cast conversion tool. Will convert C-style casts (``(type) value``) to
appropriate C++ cast (``static_cast``, ``const_cast`` or
``reinterpret_cast``).
* Non-member ``begin()`` and ``end()`` conversion tool. Will convert
``foo.begin()`` into ``begin(foo)`` and similarly for ``end()``, where
``foo`` is a standard container. We could also detect similar patterns for
arrays.
* ``tr1`` removal tool. Will migrate source code from using TR1 library
features to C++11 library. For example:
.. code-block:: c++
#include <tr1/unordered_map>
int main()
{
std::tr1::unordered_map <int, int> ma;
std::cout << ma.size () << std::endl;
return 0;
}
should be rewritten to:
.. code-block:: c++
#include <unordered_map>
int main()
{
std::unordered_map <int, int> ma;
std::cout << ma.size () << std::endl;
return 0;
}
* A tool to remove ``auto``. Will convert ``auto`` to an explicit type or add
comments with deduced types. The motivation is that there are developers
that don't want to use ``auto`` because they are afraid that they might lose
control over their code.
* C++14: less verbose operator function objects (`N3421
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3421.htm>`_).
For example:
.. code-block:: c++
sort(v.begin(), v.end(), greater<ValueType>());
should be rewritten to:
.. code-block:: c++
sort(v.begin(), v.end(), greater<>());

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,17 @@
Clang "man" pages
-----------------
The following documents are command descriptions for all of the Clang tools.
These pages describe how to use the Clang commands and what their options are.
Note that these pages do not describe all of the options available for all
tools. To get a complete listing, pass the ``--help`` (general options) or
``--help-hidden`` (general and debugging options) arguments to the tool you are
interested in.
Basic Commands
~~~~~~~~~~~~~~
.. toctree::
:maxdepth: 1
clang

View File

@ -0,0 +1,311 @@
======================
Control Flow Integrity
======================
.. toctree::
:hidden:
ControlFlowIntegrityDesign
.. contents::
:local:
Introduction
============
Clang includes an implementation of a number of control flow integrity (CFI)
schemes, which are designed to abort the program upon detecting certain forms
of undefined behavior that can potentially allow attackers to subvert the
program's control flow. These schemes have been optimized for performance,
allowing developers to enable them in release builds.
To enable Clang's available CFI schemes, use the flag ``-fsanitize=cfi``.
You can also enable a subset of available :ref:`schemes <cfi-schemes>`.
As currently implemented, all schemes rely on link-time optimization (LTO);
so it is required to specify ``-flto``, and the linker used must support LTO,
for example via the `gold plugin`_.
To allow the checks to be implemented efficiently, the program must
be structured such that certain object files are compiled with CFI
enabled, and are statically linked into the program. This may preclude
the use of shared libraries in some cases.
The compiler will only produce CFI checks for a class if it can infer hidden
LTO visibility for that class. LTO visibility is a property of a class that
is inferred from flags and attributes. For more details, see the documentation
for :doc:`LTO visibility <LTOVisibility>`.
The ``-fsanitize=cfi-{vcall,nvcall,derived-cast,unrelated-cast}`` flags
require that a ``-fvisibility=`` flag also be specified. This is because the
default visibility setting is ``-fvisibility=default``, which would disable
CFI checks for classes without visibility attributes. Most users will want
to specify ``-fvisibility=hidden``, which enables CFI checks for such classes.
Experimental support for :ref:`cross-DSO control flow integrity
<cfi-cross-dso>` exists that does not require classes to have hidden LTO
visibility. This cross-DSO support has unstable ABI at this time.
.. _gold plugin: http://llvm.org/docs/GoldPlugin.html
.. _cfi-schemes:
Available schemes
=================
Available schemes are:
- ``-fsanitize=cfi-cast-strict``: Enables :ref:`strict cast checks
<cfi-strictness>`.
- ``-fsanitize=cfi-derived-cast``: Base-to-derived cast to the wrong
dynamic type.
- ``-fsanitize=cfi-unrelated-cast``: Cast from ``void*`` or another
unrelated type to the wrong dynamic type.
- ``-fsanitize=cfi-nvcall``: Non-virtual call via an object whose vptr is of
the wrong dynamic type.
- ``-fsanitize=cfi-vcall``: Virtual call via an object whose vptr is of the
wrong dynamic type.
- ``-fsanitize=cfi-icall``: Indirect call of a function with wrong dynamic
type.
You can use ``-fsanitize=cfi`` to enable all the schemes and use
``-fno-sanitize`` flag to narrow down the set of schemes as desired.
For example, you can build your program with
``-fsanitize=cfi -fno-sanitize=cfi-nvcall,cfi-icall``
to use all schemes except for non-virtual member function call and indirect call
checking.
Remember that you have to provide ``-flto`` if at least one CFI scheme is
enabled.
Trapping and Diagnostics
========================
By default, CFI will abort the program immediately upon detecting a control
flow integrity violation. You can use the :ref:`-fno-sanitize-trap=
<controlling-code-generation>` flag to cause CFI to print a diagnostic
similar to the one below before the program aborts.
.. code-block:: console
bad-cast.cpp:109:7: runtime error: control flow integrity check for type 'B' failed during base-to-derived cast (vtable address 0x000000425a50)
0x000000425a50: note: vtable is of type 'A'
00 00 00 00 f0 f1 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 5a 42 00
^
If diagnostics are enabled, you can also configure CFI to continue program
execution instead of aborting by using the :ref:`-fsanitize-recover=
<controlling-code-generation>` flag.
Forward-Edge CFI for Virtual Calls
==================================
This scheme checks that virtual calls take place using a vptr of the correct
dynamic type; that is, the dynamic type of the called object must be a
derived class of the static type of the object used to make the call.
This CFI scheme can be enabled on its own using ``-fsanitize=cfi-vcall``.
For this scheme to work, all translation units containing the definition
of a virtual member function (whether inline or not), other than members
of :ref:`blacklisted <cfi-blacklist>` types, must be compiled with
``-fsanitize=cfi-vcall`` enabled and be statically linked into the program.
Performance
-----------
A performance overhead of less than 1% has been measured by running the
Dromaeo benchmark suite against an instrumented version of the Chromium
web browser. Another good performance benchmark for this mechanism is the
virtual-call-heavy SPEC 2006 xalancbmk.
Note that this scheme has not yet been optimized for binary size; an increase
of up to 15% has been observed for Chromium.
Bad Cast Checking
=================
This scheme checks that pointer casts are made to an object of the correct
dynamic type; that is, the dynamic type of the object must be a derived class
of the pointee type of the cast. The checks are currently only introduced
where the class being casted to is a polymorphic class.
Bad casts are not in themselves control flow integrity violations, but they
can also create security vulnerabilities, and the implementation uses many
of the same mechanisms.
There are two types of bad cast that may be forbidden: bad casts
from a base class to a derived class (which can be checked with
``-fsanitize=cfi-derived-cast``), and bad casts from a pointer of
type ``void*`` or another unrelated type (which can be checked with
``-fsanitize=cfi-unrelated-cast``).
The difference between these two types of casts is that the first is defined
by the C++ standard to produce an undefined value, while the second is not
in itself undefined behavior (it is well defined to cast the pointer back
to its original type) unless the object is uninitialized and the cast is a
``static_cast`` (see C++14 [basic.life]p5).
If a program as a matter of policy forbids the second type of cast, that
restriction can normally be enforced. However it may in some cases be necessary
for a function to perform a forbidden cast to conform with an external API
(e.g. the ``allocate`` member function of a standard library allocator). Such
functions may be :ref:`blacklisted <cfi-blacklist>`.
For this scheme to work, all translation units containing the definition
of a virtual member function (whether inline or not), other than members
of :ref:`blacklisted <cfi-blacklist>` types, must be compiled with
``-fsanitize=cfi-derived-cast`` or ``-fsanitize=cfi-unrelated-cast`` enabled
and be statically linked into the program.
Non-Virtual Member Function Call Checking
=========================================
This scheme checks that non-virtual calls take place using an object of
the correct dynamic type; that is, the dynamic type of the called object
must be a derived class of the static type of the object used to make the
call. The checks are currently only introduced where the object is of a
polymorphic class type. This CFI scheme can be enabled on its own using
``-fsanitize=cfi-nvcall``.
For this scheme to work, all translation units containing the definition
of a virtual member function (whether inline or not), other than members
of :ref:`blacklisted <cfi-blacklist>` types, must be compiled with
``-fsanitize=cfi-nvcall`` enabled and be statically linked into the program.
.. _cfi-strictness:
Strictness
----------
If a class has a single non-virtual base and does not introduce or override
virtual member functions or fields other than an implicitly defined virtual
destructor, it will have the same layout and virtual function semantics as
its base. By default, casts to such classes are checked as if they were made
to the least derived such class.
Casting an instance of a base class to such a derived class is technically
undefined behavior, but it is a relatively common hack for introducing
member functions on class instances with specific properties that works under
most compilers and should not have security implications, so we allow it by
default. It can be disabled with ``-fsanitize=cfi-cast-strict``.
Indirect Function Call Checking
===============================
This scheme checks that function calls take place using a function of the
correct dynamic type; that is, the dynamic type of the function must match
the static type used at the call. This CFI scheme can be enabled on its own
using ``-fsanitize=cfi-icall``.
For this scheme to work, each indirect function call in the program, other
than calls in :ref:`blacklisted <cfi-blacklist>` functions, must call a
function which was either compiled with ``-fsanitize=cfi-icall`` enabled,
or whose address was taken by a function in a translation unit compiled with
``-fsanitize=cfi-icall``.
If a function in a translation unit compiled with ``-fsanitize=cfi-icall``
takes the address of a function not compiled with ``-fsanitize=cfi-icall``,
that address may differ from the address taken by a function in a translation
unit not compiled with ``-fsanitize=cfi-icall``. This is technically a
violation of the C and C++ standards, but it should not affect most programs.
Each translation unit compiled with ``-fsanitize=cfi-icall`` must be
statically linked into the program or shared library, and calls across
shared library boundaries are handled as if the callee was not compiled with
``-fsanitize=cfi-icall``.
This scheme is currently only supported on the x86 and x86_64 architectures.
``-fsanitize-cfi-icall-generalize-pointers``
--------------------------------------------
Mismatched pointer types are a common cause of cfi-icall check failures.
Translation units compiled with the ``-fsanitize-cfi-icall-generalize-pointers``
flag relax pointer type checking for call sites in that translation unit,
applied across all functions compiled with ``-fsanitize=cfi-icall``.
Specifically, pointers in return and argument types are treated as equivalent as
long as the qualifiers for the type they point to match. For example, ``char*``
``char**`, and ``int*`` are considered equivalent types. However, ``char*`` and
``const char*`` are considered separate types.
``-fsanitize-cfi-icall-generalize-pointers`` is not compatible with
``-fsanitize-cfi-cross-dso``.
``-fsanitize=cfi-icall`` and ``-fsanitize=function``
----------------------------------------------------
This tool is similar to ``-fsanitize=function`` in that both tools check
the types of function calls. However, the two tools occupy different points
on the design space; ``-fsanitize=function`` is a developer tool designed
to find bugs in local development builds, whereas ``-fsanitize=cfi-icall``
is a security hardening mechanism designed to be deployed in release builds.
``-fsanitize=function`` has a higher space and time overhead due to a more
complex type check at indirect call sites, as well as a need for run-time
type information (RTTI), which may make it unsuitable for deployment. Because
of the need for RTTI, ``-fsanitize=function`` can only be used with C++
programs, whereas ``-fsanitize=cfi-icall`` can protect both C and C++ programs.
On the other hand, ``-fsanitize=function`` conforms more closely with the C++
standard and user expectations around interaction with shared libraries;
the identity of function pointers is maintained, and calls across shared
library boundaries are no different from calls within a single program or
shared library.
.. _cfi-blacklist:
Blacklist
=========
A :doc:`SanitizerSpecialCaseList` can be used to relax CFI checks for certain
source files, functions and types using the ``src``, ``fun`` and ``type``
entity types. Specific CFI modes can be be specified using ``[section]``
headers.
.. code-block:: bash
# Suppress all CFI checking for code in a file.
src:bad_file.cpp
src:bad_header.h
# Ignore all functions with names containing MyFooBar.
fun:*MyFooBar*
# Ignore all types in the standard library.
type:std::*
# Disable only unrelated cast checks for this function
[cfi-unrelated-cast]
fun:*UnrelatedCast*
# Disable CFI call checks for this function without affecting cast checks
[cfi-vcall|cfi-nvcall|cfi-icall]
fun:*BadCall*
.. _cfi-cross-dso:
Shared library support
======================
Use **-f[no-]sanitize-cfi-cross-dso** to enable the cross-DSO control
flow integrity mode, which allows all CFI schemes listed above to
apply across DSO boundaries. As in the regular CFI, each DSO must be
built with ``-flto``.
Normally, CFI checks will only be performed for classes that have hidden LTO
visibility. With this flag enabled, the compiler will emit cross-DSO CFI
checks for all classes, except for those which appear in the CFI blacklist
or which use a ``no_sanitize`` attribute.
Design
======
Please refer to the :doc:`design document<ControlFlowIntegrityDesign>`.
Publications
============
`Control-Flow Integrity: Principles, Implementations, and Applications <http://research.microsoft.com/pubs/64250/ccs05.pdf>`_.
Martin Abadi, Mihai Budiu, Ăšlfar Erlingsson, Jay Ligatti.
`Enforcing Forward-Edge Control-Flow Integrity in GCC & LLVM <http://www.pcc.me.uk/~peter/acad/usenix14.pdf>`_.
Caroline Tice, Tom Roeder, Peter Collingbourne, Stephen Checkoway,
Ăšlfar Erlingsson, Luis Lozano, Geoff Pike.

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,203 @@
===================================================================
Cross-compilation using Clang
===================================================================
Introduction
============
This document will guide you in choosing the right Clang options
for cross-compiling your code to a different architecture. It assumes you
already know how to compile the code in question for the host architecture,
and that you know how to choose additional include and library paths.
However, this document is *not* a "how to" and won't help you setting your
build system or Makefiles, nor choosing the right CMake options, etc.
Also, it does not cover all the possible options, nor does it contain
specific examples for specific architectures. For a concrete example, the
`instructions for cross-compiling LLVM itself
<http://llvm.org/docs/HowToCrossCompileLLVM.html>`_ may be of interest.
After reading this document, you should be familiar with the main issues
related to cross-compilation, and what main compiler options Clang provides
for performing cross-compilation.
Cross compilation issues
========================
In GCC world, every host/target combination has its own set of binaries,
headers, libraries, etc. So, it's usually simple to download a package
with all files in, unzip to a directory and point the build system to
that compiler, that will know about its location and find all it needs to
when compiling your code.
On the other hand, Clang/LLVM is natively a cross-compiler, meaning that
one set of programs can compile to all targets by setting the ``-target``
option. That makes it a lot easier for programmers wishing to compile to
different platforms and architectures, and for compiler developers that
only have to maintain one build system, and for OS distributions, that
need only one set of main packages.
But, as is true to any cross-compiler, and given the complexity of
different architectures, OS's and options, it's not always easy finding
the headers, libraries or binutils to generate target specific code.
So you'll need special options to help Clang understand what target
you're compiling to, where your tools are, etc.
Another problem is that compilers come with standard libraries only (like
``compiler-rt``, ``libcxx``, ``libgcc``, ``libm``, etc), so you'll have to
find and make available to the build system, every other library required
to build your software, that is specific to your target. It's not enough to
have your host's libraries installed.
Finally, not all toolchains are the same, and consequently, not every Clang
option will work magically. Some options, like ``--sysroot`` (which
effectively changes the logical root for headers and libraries), assume
all your binaries and libraries are in the same directory, which may not
true when your cross-compiler was installed by the distribution's package
management. So, for each specific case, you may use more than one
option, and in most cases, you'll end up setting include paths (``-I``) and
library paths (``-L``) manually.
To sum up, different toolchains can:
* be host/target specific or more flexible
* be in a single directory, or spread out across your system
* have different sets of libraries and headers by default
* need special options, which your build system won't be able to figure
out by itself
General Cross-Compilation Options in Clang
==========================================
Target Triple
-------------
The basic option is to define the target architecture. For that, use
``-target <triple>``. If you don't specify the target, CPU names won't
match (since Clang assumes the host triple), and the compilation will
go ahead, creating code for the host platform, which will break later
on when assembling or linking.
The triple has the general format ``<arch><sub>-<vendor>-<sys>-<abi>``, where:
* ``arch`` = ``x86_64``, ``i386``, ``arm``, ``thumb``, ``mips``, etc.
* ``sub`` = for ex. on ARM: ``v5``, ``v6m``, ``v7a``, ``v7m``, etc.
* ``vendor`` = ``pc``, ``apple``, ``nvidia``, ``ibm``, etc.
* ``sys`` = ``none``, ``linux``, ``win32``, ``darwin``, ``cuda``, etc.
* ``abi`` = ``eabi``, ``gnu``, ``android``, ``macho``, ``elf``, etc.
The sub-architecture options are available for their own architectures,
of course, so "x86v7a" doesn't make sense. The vendor needs to be
specified only if there's a relevant change, for instance between PC
and Apple. Most of the time it can be omitted (and Unknown)
will be assumed, which sets the defaults for the specified architecture.
The system name is generally the OS (linux, darwin), but could be special
like the bare-metal "none".
When a parameter is not important, it can be omitted, or you can
choose ``unknown`` and the defaults will be used. If you choose a parameter
that Clang doesn't know, like ``blerg``, it'll ignore and assume
``unknown``, which is not always desired, so be careful.
Finally, the ABI option is something that will pick default CPU/FPU,
define the specific behaviour of your code (PCS, extensions),
and also choose the correct library calls, etc.
CPU, FPU, ABI
-------------
Once your target is specified, it's time to pick the hardware you'll
be compiling to. For every architecture, a default set of CPU/FPU/ABI
will be chosen, so you'll almost always have to change it via flags.
Typical flags include:
* ``-mcpu=<cpu-name>``, like x86-64, swift, cortex-a15
* ``-mfpu=<fpu-name>``, like SSE3, NEON, controlling the FP unit available
* ``-mfloat-abi=<fabi>``, like soft, hard, controlling which registers
to use for floating-point
The default is normally the common denominator, so that Clang doesn't
generate code that breaks. But that also means you won't get the best
code for your specific hardware, which may mean orders of magnitude
slower than you expect.
For example, if your target is ``arm-none-eabi``, the default CPU will
be ``arm7tdmi`` using soft float, which is extremely slow on modern cores,
whereas if your triple is ``armv7a-none-eabi``, it'll be Cortex-A8 with
NEON, but still using soft-float, which is much better, but still not
great.
Toolchain Options
-----------------
There are three main options to control access to your cross-compiler:
``--sysroot``, ``-I``, and ``-L``. The two last ones are well known,
but they're particularly important for additional libraries
and headers that are specific to your target.
There are two main ways to have a cross-compiler:
#. When you have extracted your cross-compiler from a zip file into
a directory, you have to use ``--sysroot=<path>``. The path is the
root directory where you have unpacked your file, and Clang will
look for the directories ``bin``, ``lib``, ``include`` in there.
In this case, your setup should be pretty much done (if no
additional headers or libraries are needed), as Clang will find
all binaries it needs (assembler, linker, etc) in there.
#. When you have installed via a package manager (modern Linux
distributions have cross-compiler packages available), make
sure the target triple you set is *also* the prefix of your
cross-compiler toolchain.
In this case, Clang will find the other binaries (assembler,
linker), but not always where the target headers and libraries
are. People add system-specific clues to Clang often, but as
things change, it's more likely that it won't find than the
other way around.
So, here, you'll be a lot safer if you specify the include/library
directories manually (via ``-I`` and ``-L``).
Target-Specific Libraries
=========================
All libraries that you compile as part of your build will be
cross-compiled to your target, and your build system will probably
find them in the right place. But all dependencies that are
normally checked against (like ``libxml`` or ``libz`` etc) will match
against the host platform, not the target.
So, if the build system is not aware that you want to cross-compile
your code, it will get every dependency wrong, and your compilation
will fail during build time, not configure time.
Also, finding the libraries for your target are not as easy
as for your host machine. There aren't many cross-libraries available
as packages to most OS's, so you'll have to either cross-compile them
from source, or download the package for your target platform,
extract the libraries and headers, put them in specific directories
and add ``-I`` and ``-L`` pointing to them.
Also, some libraries have different dependencies on different targets,
so configuration tools to find dependencies in the host can get the
list wrong for the target platform. This means that the configuration
of your build can get things wrong when setting their own library
paths, and you'll have to augment it via additional flags (configure,
Make, CMake, etc).
Multilibs
---------
When you want to cross-compile to more than one configuration, for
example hard-float-ARM and soft-float-ARM, you'll have to have multiple
copies of your libraries and (possibly) headers.
Some Linux distributions have support for Multilib, which handle that
for you in an easier way, but if you're not careful and, for instance,
forget to specify ``-ccc-gcc-name armv7l-linux-gnueabihf-gcc`` (which
uses hard-float), Clang will pick the ``armv7l-linux-gnueabi-ld``
(which uses soft-float) and linker errors will happen.
The same is true if you're compiling for different ABIs, like ``gnueabi``
and ``androideabi``, and might even link and run, but produce run-time
errors, which are much harder to track down and fix.

View File

@ -0,0 +1,158 @@
=================
DataFlowSanitizer
=================
.. toctree::
:hidden:
DataFlowSanitizerDesign
.. contents::
:local:
Introduction
============
DataFlowSanitizer is a generalised dynamic data flow analysis.
Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own. Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.
Usage
=====
With no program changes, applying DataFlowSanitizer to a program
will not alter its behavior. To use DataFlowSanitizer, the program
uses API functions to apply tags to data to cause it to be tracked, and to
check the tag of a specific data item. DataFlowSanitizer manages
the propagation of tags through the program according to its data flow.
The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
For further information about each function, please refer to the header
file.
ABI List
--------
DataFlowSanitizer uses a list of functions known as an ABI list to decide
whether a call to a specific function should use the operating system's native
ABI or whether it should use a variant of this ABI that also propagates labels
through function parameters and return values. The ABI list file also controls
how labels are propagated in the former case. DataFlowSanitizer comes with a
default ABI list which is intended to eventually cover the glibc library on
Linux but it may become necessary for users to extend the ABI list in cases
where a particular library or function cannot be instrumented (e.g. because
it is implemented in assembly or another language which DataFlowSanitizer does
not support) or a function is called from a library or function which cannot
be instrumented.
DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
The pass treats every function in the ``uninstrumented`` category in the
ABI list file as conforming to the native ABI. Unless the ABI list contains
additional categories for those functions, a call to one of those functions
will produce a warning message, as the labelling behavior of the function
is unknown. The other supported categories are ``discard``, ``functional``
and ``custom``.
* ``discard`` -- To the extent that this function writes to (user-accessible)
memory, it also updates labels in shadow memory (this condition is trivially
satisfied for functions which do not write to user-accessible memory). Its
return value is unlabelled.
* ``functional`` -- Like ``discard``, except that the label of its return value
is the union of the label of its arguments.
* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
is called, where ``F`` is the name of the function. This function may wrap
the original function or provide its own implementation. This category is
generally used for uninstrumentable functions which write to user-accessible
memory or which have more complex label propagation behavior. The signature
of ``__dfsw_F`` is based on that of ``F`` with each argument having a
label of type ``dfsan_label`` appended to the argument list. If ``F``
is of non-void return type a final argument of type ``dfsan_label *``
is appended to which the custom function can store the label for the
return value. For example:
.. code-block:: c++
void f(int x);
void __dfsw_f(int x, dfsan_label x_label);
void *memcpy(void *dest, const void *src, size_t n);
void *__dfsw_memcpy(void *dest, const void *src, size_t n,
dfsan_label dest_label, dfsan_label src_label,
dfsan_label n_label, dfsan_label *ret_label);
If a function defined in the translation unit being compiled belongs to the
``uninstrumented`` category, it will be compiled so as to conform to the
native ABI. Its arguments will be assumed to be unlabelled, but it will
propagate labels in shadow memory.
For example:
.. code-block:: none
# main is called by the C runtime using the native ABI.
fun:main=uninstrumented
fun:main=discard
# malloc only writes to its internal data structures, not user-accessible memory.
fun:malloc=uninstrumented
fun:malloc=discard
# tolower is a pure function.
fun:tolower=uninstrumented
fun:tolower=functional
# memcpy needs to copy the shadow from the source to the destination region.
# This is done in a custom function.
fun:memcpy=uninstrumented
fun:memcpy=custom
Example
=======
The following program demonstrates label propagation by checking that
the correct labels are propagated.
.. code-block:: c++
#include <sanitizer/dfsan_interface.h>
#include <assert.h>
int main(void) {
int i = 1;
dfsan_label i_label = dfsan_create_label("i", 0);
dfsan_set_label(i_label, &i, sizeof(i));
int j = 2;
dfsan_label j_label = dfsan_create_label("j", 0);
dfsan_set_label(j_label, &j, sizeof(j));
int k = 3;
dfsan_label k_label = dfsan_create_label("k", 0);
dfsan_set_label(k_label, &k, sizeof(k));
dfsan_label ij_label = dfsan_get_label(i + j);
assert(dfsan_has_label(ij_label, i_label));
assert(dfsan_has_label(ij_label, j_label));
assert(!dfsan_has_label(ij_label, k_label));
dfsan_label ijk_label = dfsan_get_label(i + j + k);
assert(dfsan_has_label(ijk_label, i_label));
assert(dfsan_has_label(ijk_label, j_label));
assert(dfsan_has_label(ijk_label, k_label));
return 0;
}
Current status
==============
DataFlowSanitizer is a work in progress, currently under development for
x86\_64 Linux.
Design
======
Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.

View File

@ -0,0 +1,220 @@
DataFlowSanitizer Design Document
=================================
This document sets out the design for DataFlowSanitizer, a general
dynamic data flow analysis. Unlike other Sanitizer tools, this tool is
not designed to detect a specific class of bugs on its own. Instead,
it provides a generic dynamic data flow analysis framework to be used
by clients to help detect application-specific issues within their
own code.
DataFlowSanitizer is a program instrumentation which can associate
a number of taint labels with any data stored in any memory region
accessible by the program. The analysis is dynamic, which means that
it operates on a running program, and tracks how the labels propagate
through that program. The tool shall support a large (>100) number
of labels, such that programs which operate on large numbers of data
items may be analysed with each data item being tracked separately.
Use Cases
---------
This instrumentation can be used as a tool to help monitor how data
flows from a program's inputs (sources) to its outputs (sinks).
This has applications from a privacy/security perspective in that
one can audit how a sensitive data item is used within a program and
ensure it isn't exiting the program anywhere it shouldn't be.
Interface
---------
A number of functions are provided which will create taint labels,
attach labels to memory regions and extract the set of labels
associated with a specific memory region. These functions are declared
in the header file ``sanitizer/dfsan_interface.h``.
.. code-block:: c
/// Creates and returns a base label with the given description and user data.
dfsan_label dfsan_create_label(const char *desc, void *userdata);
/// Sets the label for each address in [addr,addr+size) to \c label.
void dfsan_set_label(dfsan_label label, void *addr, size_t size);
/// Sets the label for each address in [addr,addr+size) to the union of the
/// current label for that address and \c label.
void dfsan_add_label(dfsan_label label, void *addr, size_t size);
/// Retrieves the label associated with the given data.
///
/// The type of 'data' is arbitrary. The function accepts a value of any type,
/// which can be truncated or extended (implicitly or explicitly) as necessary.
/// The truncation/extension operations will preserve the label of the original
/// value.
dfsan_label dfsan_get_label(long data);
/// Retrieves a pointer to the dfsan_label_info struct for the given label.
const struct dfsan_label_info *dfsan_get_label_info(dfsan_label label);
/// Returns whether the given label label contains the label elem.
int dfsan_has_label(dfsan_label label, dfsan_label elem);
/// If the given label label contains a label with the description desc, returns
/// that label, else returns 0.
dfsan_label dfsan_has_label_with_desc(dfsan_label label, const char *desc);
Taint label representation
--------------------------
As stated above, the tool must track a large number of taint
labels. This poses an implementation challenge, as most multiple-label
tainting systems assign one label per bit to shadow storage, and
union taint labels using a bitwise or operation. This will not scale
to clients which use hundreds or thousands of taint labels, as the
label union operation becomes O(n) in the number of supported labels,
and data associated with it will quickly dominate the live variable
set, causing register spills and hampering performance.
Instead, a low overhead approach is proposed which is best-case O(log\
:sub:`2` n) during execution. The underlying assumption is that
the required space of label unions is sparse, which is a reasonable
assumption to make given that we are optimizing for the case where
applications mostly copy data from one place to another, without often
invoking the need for an actual union operation. The representation
of a taint label is a 16-bit integer, and new labels are allocated
sequentially from a pool. The label identifier 0 is special, and means
that the data item is unlabelled.
When a label union operation is requested at a join point (any
arithmetic or logical operation with two or more operands, such as
addition), the code checks whether a union is required, whether the
same union has been requested before, and whether one union label
subsumes the other. If so, it returns the previously allocated union
label. If not, it allocates a new union label from the same pool used
for new labels.
Specifically, the instrumentation pass will insert code like this
to decide the union label ``lu`` for a pair of labels ``l1``
and ``l2``:
.. code-block:: c
if (l1 == l2)
lu = l1;
else
lu = __dfsan_union(l1, l2);
The equality comparison is outlined, to provide an early exit in
the common cases where the program is processing unlabelled data, or
where the two data items have the same label. ``__dfsan_union`` is
a runtime library function which performs all other union computation.
Further optimizations are possible, for example if ``l1`` is known
at compile time to be zero (e.g. it is derived from a constant),
``l2`` can be used for ``lu``, and vice versa.
Memory layout and label management
----------------------------------
The following is the current memory layout for Linux/x86\_64:
+---------------+---------------+--------------------+
| Start | End | Use |
+===============+===============+====================+
| 0x700000008000|0x800000000000 | application memory |
+---------------+---------------+--------------------+
| 0x200200000000|0x700000008000 | unused |
+---------------+---------------+--------------------+
| 0x200000000000|0x200200000000 | union table |
+---------------+---------------+--------------------+
| 0x000000010000|0x200000000000 | shadow memory |
+---------------+---------------+--------------------+
| 0x000000000000|0x000000010000 | reserved by kernel |
+---------------+---------------+--------------------+
Each byte of application memory corresponds to two bytes of shadow
memory, which are used to store its taint label. As for LLVM SSA
registers, we have not found it necessary to associate a label with
each byte or bit of data, as some other tools do. Instead, labels are
associated directly with registers. Loads will result in a union of
all shadow labels corresponding to bytes loaded (which most of the
time will be short circuited by the initial comparison) and stores will
result in a copy of the label to the shadow of all bytes stored to.
Propagating labels through arguments
------------------------------------
In order to propagate labels through function arguments and return values,
DataFlowSanitizer changes the ABI of each function in the translation unit.
There are currently two supported ABIs:
* Args -- Argument and return value labels are passed through additional
arguments and by modifying the return type.
* TLS -- Argument and return value labels are passed through TLS variables
``__dfsan_arg_tls`` and ``__dfsan_retval_tls``.
The main advantage of the TLS ABI is that it is more tolerant of ABI mismatches
(TLS storage is not shared with any other form of storage, whereas extra
arguments may be stored in registers which under the native ABI are not used
for parameter passing and thus could contain arbitrary values). On the other
hand the args ABI is more efficient and allows ABI mismatches to be more easily
identified by checking for nonzero labels in nominally unlabelled programs.
Implementing the ABI list
-------------------------
The `ABI list <DataFlowSanitizer.html#abi-list>`_ provides a list of functions
which conform to the native ABI, each of which is callable from an instrumented
program. This is implemented by replacing each reference to a native ABI
function with a reference to a function which uses the instrumented ABI.
Such functions are automatically-generated wrappers for the native functions.
For example, given the ABI list example provided in the user manual, the
following wrappers will be generated under the args ABI:
.. code-block:: llvm
define linkonce_odr { i8*, i16 } @"dfsw$malloc"(i64 %0, i16 %1) {
entry:
%2 = call i8* @malloc(i64 %0)
%3 = insertvalue { i8*, i16 } undef, i8* %2, 0
%4 = insertvalue { i8*, i16 } %3, i16 0, 1
ret { i8*, i16 } %4
}
define linkonce_odr { i32, i16 } @"dfsw$tolower"(i32 %0, i16 %1) {
entry:
%2 = call i32 @tolower(i32 %0)
%3 = insertvalue { i32, i16 } undef, i32 %2, 0
%4 = insertvalue { i32, i16 } %3, i16 %1, 1
ret { i32, i16 } %4
}
define linkonce_odr { i8*, i16 } @"dfsw$memcpy"(i8* %0, i8* %1, i64 %2, i16 %3, i16 %4, i16 %5) {
entry:
%labelreturn = alloca i16
%6 = call i8* @__dfsw_memcpy(i8* %0, i8* %1, i64 %2, i16 %3, i16 %4, i16 %5, i16* %labelreturn)
%7 = load i16* %labelreturn
%8 = insertvalue { i8*, i16 } undef, i8* %6, 0
%9 = insertvalue { i8*, i16 } %8, i16 %7, 1
ret { i8*, i16 } %9
}
As an optimization, direct calls to native ABI functions will call the
native ABI function directly and the pass will compute the appropriate label
internally. This has the advantage of reducing the number of union operations
required when the return value label is known to be zero (i.e. ``discard``
functions, or ``functional`` functions with known unlabelled arguments).
Checking ABI Consistency
------------------------
DFSan changes the ABI of each function in the module. This makes it possible
for a function with the native ABI to be called with the instrumented ABI,
or vice versa, thus possibly invoking undefined behavior. A simple way
of statically detecting instances of this problem is to prepend the prefix
"dfs$" to the name of each instrumented-ABI function.
This will not catch every such problem; in particular function pointers passed
across the instrumented-native barrier cannot be used on the other side.
These problems could potentially be caught dynamically.

Some files were not shown because too many files have changed in this diff Show More