Imported Upstream version 6.10.0.49

Former-commit-id: 1d6753294b2993e1fbf92de9366bb9544db4189b
Xamarin Public Jenkins (auto-signing)
2020-01-16 16:38:04 +00:00
parent d94e79959b
commit 468663ddbb
48518 changed files with 2789335 additions and 61176 deletions

@@ -0,0 +1,267 @@
============
Debug Checks
============
.. contents::
:local:
The analyzer contains a number of checkers which can aid in debugging. Enable
them by using the "-analyzer-checker=" flag, followed by the name of the
checker.
General Analysis Dumpers
========================
These checkers are used to dump the results of various infrastructural analyses
to stderr. Some checkers also have "view" variants, which will display a graph
using a 'dot' format viewer (such as Graphviz on OS X) instead.
- debug.DumpCallGraph, debug.ViewCallGraph: Show the call graph generated for
the current translation unit. This is used to determine the order in which to
analyze functions when inlining is enabled.
- debug.DumpCFG, debug.ViewCFG: Show the CFG generated for each top-level
function being analyzed.
- debug.DumpDominators: Show the dominance tree for the CFG of each top-level
function.
- debug.DumpLiveVars: Show the results of live variable analysis for each
top-level function being analyzed.
- debug.ViewExplodedGraph: Show the Exploded Graphs generated for the
analysis of different functions in the input translation unit. When there
are several functions analyzed, display one graph per function. Beware
that these graphs may grow very large, even for small functions.
Path Tracking
=============
These checkers print information about the path taken by the analyzer engine.
- debug.DumpCalls: Prints out every function or method call encountered during a
path traversal. This is indented to show the call stack, but does NOT do any
special handling of branches, meaning different paths could end up
interleaved.
- debug.DumpTraversal: Prints the name of each branch statement encountered
during a path traversal ("IfStmt", "WhileStmt", etc). Currently used to check
whether the analysis engine is doing BFS or DFS.
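To see why the order of the printed branch statements reveals the traversal strategy, consider the following toy model. It is only a sketch and not analyzer code: the `Node` type and the `traverse` and `sampleTree` helpers are hypothetical, and the worklist is a plain `std::deque` used either as a queue (BFS) or as a stack (DFS).

```cpp
#include <deque>
#include <string>
#include <vector>

// Toy exploration tree: each node stands for a branch statement and lists
// the nodes that become reachable after it (hypothetical, not analyzer types).
struct Node {
  std::string name;
  std::vector<int> successors; // indices into the node table
};

// Visits all nodes, taking work from the front of the worklist (BFS)
// or from the back (DFS), and records the order of the visited names.
std::vector<std::string> traverse(const std::vector<Node> &nodes,
                                  bool breadthFirst) {
  std::vector<std::string> order;
  std::deque<int> worklist{0};
  while (!worklist.empty()) {
    int current;
    if (breadthFirst) {
      current = worklist.front();
      worklist.pop_front();
    } else {
      current = worklist.back();
      worklist.pop_back();
    }
    order.push_back(nodes[current].name);
    for (int next : nodes[current].successors)
      worklist.push_back(next);
  }
  return order;
}

// IfStmt leads to WhileStmt and ForStmt; WhileStmt leads to DoStmt.
std::vector<Node> sampleTree() {
  return {{"IfStmt", {1, 2}}, {"WhileStmt", {3}}, {"ForStmt", {}}, {"DoStmt", {}}};
}
```

On this tree the queue-based run lists both children of IfStmt before DoStmt, while the stack-based run finishes one subtree before returning to the other, so the two sequences differ.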
State Checking
==============
These checkers will print out information about the analyzer state in the form
of analysis warnings. They are intended for use with the -verify functionality
in regression tests.
- debug.TaintTest: Prints out the word "tainted" for every expression that
carries taint. At the time of this writing, taint was only introduced by the
checks under experimental.security.taint.TaintPropagation; this checker may
eventually move to the security.taint package.
- debug.ExprInspection: Responds to certain function calls, which are modeled
after builtins. These function calls should not affect the program state other
than through the evaluation of their arguments; to use them, you will need to
declare them within your test file. The available functions are described below.
(FIXME: debug.ExprInspection should probably be renamed, since it no longer only
inspects expressions.)
ExprInspection checks
---------------------
- ``void clang_analyzer_eval(bool);``
Prints TRUE if the argument is known to have a non-zero value, FALSE if the
argument is known to have a zero or null value, and UNKNOWN if the argument
isn't sufficiently constrained on this path. You can use this to test other
values by using expressions like "x == 5". Note that this functionality is
currently DISABLED in inlined functions, since different calls to the same
inlined function could provide different information, making it difficult to
write proper -verify directives.
In C, the argument can be typed as 'int' or as '_Bool'.
Example usage::
clang_analyzer_eval(x); // expected-warning{{UNKNOWN}}
if (!x) return;
clang_analyzer_eval(x); // expected-warning{{TRUE}}
- ``void clang_analyzer_checkInlined(bool);``
If a call occurs within an inlined function, prints TRUE or FALSE according to
the value of its argument. If a call occurs outside an inlined function,
nothing is printed.
The intended use of this checker is to assert that a function is inlined at
least once (by passing 'true' and expecting a warning), or to assert that a
function is never inlined (by passing 'false' and expecting no warning). The
argument is technically unnecessary but is intended to clarify intent.
You might wonder why we can't print TRUE if a function is ever inlined and
FALSE if it is not. The problem is that any inlined function could conceivably
also be analyzed as a top-level function (in which case both TRUE and FALSE
would be printed), depending on the value of the -analyzer-inlining option.
In C, the argument can be typed as 'int' or as '_Bool'.
Example usage::
int inlined() {
clang_analyzer_checkInlined(true); // expected-warning{{TRUE}}
return 42;
}
void topLevel() {
clang_analyzer_checkInlined(false); // no-warning (not inlined)
int value = inlined();
// This assertion will not be valid if the previous call was not inlined.
clang_analyzer_eval(value == 42); // expected-warning{{TRUE}}
}
- ``void clang_analyzer_warnIfReached();``
Generate a warning if this line of code gets reached by the analyzer.
Example usage::
if (true) {
clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
}
else {
clang_analyzer_warnIfReached(); // no-warning
}
- ``void clang_analyzer_numTimesReached();``
Same as above, but include the number of times this call expression
gets reached by the analyzer during the current analysis.
Example usage::
for (int x = 0; x < 3; ++x) {
clang_analyzer_numTimesReached(); // expected-warning{{3}}
}
- ``void clang_analyzer_warnOnDeadSymbol(int);``
Subscribe for a delayed warning when the symbol that represents the value of
the argument is garbage-collected by the analyzer.
When calling 'clang_analyzer_warnOnDeadSymbol(x)', if the value of 'x' is a
symbol, then this symbol is marked by the ExprInspection checker. Then,
during each garbage collection run, the checker sees if the marked symbol is
being collected and issues the 'SYMBOL DEAD' warning if it is.
This way you know exactly where, down to the line of code, the symbol dies.
It is unlikely that you call this function after the symbol is already dead,
because the very reference to it as the function argument prevents it from
dying. However, if the argument is not a symbol but a concrete value,
no warning would be issued.
Example usage::
do {
int x = generate_some_integer();
clang_analyzer_warnOnDeadSymbol(x);
} while(0); // expected-warning{{SYMBOL DEAD}}
- ``void clang_analyzer_explain(a single argument of any type);``
This function explains the value of its argument in a human-readable manner
in the warning message. You can declare as many overloads of its prototype
in the test code as necessary to explain various integral, pointer,
or even record-type values. To simplify usage in C code (where overloading
the function declaration is not allowed), you may append an arbitrary suffix
to the function name, without affecting functionality.
Example usage::
void clang_analyzer_explain(int);
void clang_analyzer_explain(void *);
// Useful in C code
void clang_analyzer_explain_int(int);
void foo(int param, void *ptr) {
clang_analyzer_explain(param); // expected-warning{{argument 'param'}}
clang_analyzer_explain_int(param); // expected-warning{{argument 'param'}}
if (!ptr)
clang_analyzer_explain(ptr); // expected-warning{{memory address '0'}}
}
- ``void clang_analyzer_dump( /* a single argument of any type */);``
Similar to clang_analyzer_explain, but produces a raw dump of the value,
same as SVal::dump().
Example usage::
void clang_analyzer_dump(int);
void foo(int x) {
clang_analyzer_dump(x); // expected-warning{{reg_$0<x>}}
}
- ``size_t clang_analyzer_getExtent(void *);``
This function returns the value that represents the extent of a memory region
pointed to by the argument. This value is often difficult to obtain otherwise,
because no valid code produces this value. However, it may be useful
for testing purposes, to see how well the analyzer models region extents.
Example usage::
void foo() {
int x, *y;
size_t xs = clang_analyzer_getExtent(&x);
clang_analyzer_explain(xs); // expected-warning{{'4'}}
size_t ys = clang_analyzer_getExtent(&y);
clang_analyzer_explain(ys); // expected-warning{{'8'}}
}
- ``void clang_analyzer_printState();``
Dumps the current ProgramState to stderr. Use it to quickly look up the program
state at any execution point without ViewExplodedGraph or re-compiling the program.
This is not very useful for writing tests (apart from testing how ProgramState
gets printed), but useful for debugging tests. Also, this method doesn't
produce a warning, so it gets printed on the console before all other
ExprInspection warnings.
Example usage::
void foo() {
int x = 1;
clang_analyzer_printState(); // Read the stderr!
}
- ``void clang_analyzer_hashDump(int);``
The analyzer can generate a hash to identify reports. To debug what information
is used to calculate this hash it is possible to dump the hashed string as a
warning of an arbitrary expression using the function above.
Example usage::
void foo() {
int x = 1;
clang_analyzer_hashDump(x); // expected-warning{{hashed string for x}}
}
Statistics
==========
The debug.Stats checker collects various information about the analysis of each
function, such as how many blocks were reached and if the analyzer timed out.
There is also an additional -analyzer-stats flag, which enables various
statistics within the analyzer engine. Note the Stats checker (which produces at
least one bug report per function) may actually change the values reported by
-analyzer-stats.

@@ -0,0 +1,321 @@
This discussion took place in https://reviews.llvm.org/D35216
"Escape symbols when creating std::initializer_list".
It touches problems of modelling C++ standard library constructs in general,
including modelling implementation-defined fields within C++ standard library
objects, in particular constructing objects into pointers held by such fields,
and separation of responsibilities between analyzer's core and checkers.
**Artem:**
I've seen a few false positives that appear because we construct
C++11 std::initializer_list objects with brace initializers, and such
construction is not properly modeled. For instance, if a new object is
constructed on the heap only to be put into a brace-initialized STL container,
the object is reported to be leaked.
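For illustration, code of roughly the following shape matches that description. It is a minimal sketch with a hypothetical function name, not a test case taken from the patch, and whether a given analyzer version actually warns on it is an assumption.

```cpp
#include <vector>

// Allocates an int on the heap and hands the pointer to a brace-initialized
// container. The pointer travels through a std::initializer_list<int *>
// before it reaches the vector.
int sumThroughContainer() {
  int *p = new int(41);
  std::vector<int *> holder{p}; // built from a std::initializer_list<int *>
  int result = *holder[0] + 1;
  delete holder[0]; // released through the container, not through 'p'
  return result;
}
```

The memory is released correctly at run time, so a leak report on `p` would be a false positive caused by losing track of the pointer inside the initializer list.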
Approach (0): This can be trivially fixed by this patch, which causes pointers
passed into initializer list expressions to immediately escape.
This fix is overly conservative though. So i did a bit of investigation as to
how to model std::initializer_list better.
According to the standard, std::initializer_list<T> is an object that has
methods begin(), end(), and size(), where begin() returns a pointer to a contiguous
array of size() objects of type T, and end() is equal to begin() plus size().
The standard does hint that it should be possible to implement
std::initializer_list<T> as a pair of pointers, or as a pointer and a size
integer, however specific fields that the object would contain are an
implementation detail.
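The interface described above can be exercised directly in ordinary C++. The sketch below uses two hypothetical helper names and relies only on begin(), end(), and size(), without assuming anything about the fields the object contains.

```cpp
#include <cstddef>
#include <initializer_list>

// Returns true if end() equals begin() plus size() for the given list.
template <typename T>
bool endIsBeginPlusSize(std::initializer_list<T> list) {
  return list.end() == list.begin() + list.size();
}

// Sums the elements by indexing the contiguous backing array via begin().
int sumViaPointer(std::initializer_list<int> list) {
  const int *data = list.begin();
  int total = 0;
  for (std::size_t i = 0; i < list.size(); ++i)
    total += data[i];
  return total;
}
```

A precise model of the list would have to reproduce exactly these relations between the three methods.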
Ideally, we should be able to model the initializer list's methods precisely.
Or, at least, it should be possible to explain to the analyzer that the list
somehow "takes hold" of the values put into it. Initializer lists can also be
copied, which is a separate story that i'm not trying to address here.
The obvious approach to modeling std::initializer_list in a checker would be to
construct a SymbolMetadata for the memory region of the initializer list object,
which would be of type T* and represent begin(), so we'd trivially model begin()
as a function that returns this symbol. The array pointed to by that symbol
would be bindLoc()ed to contain the list's contents (probably as a CompoundVal
to produce fewer bindings in the store). The extent of this array would represent
size() and would be equal to the length of the list as written.
So this sounds good, however apparently it does nothing to address our false
positives: when the list escapes, our RegionStoreManager is not magically
guessing that the metadata symbol attached to it, together with its contents,
should also escape. In fact, it's impossible to trigger a pointer escape from
within the checker.
Approach (1): If only we enabled ProgramState::bindLoc(..., notifyChanges=true)
to cause pointer escapes (not only region changes) (which sounds like the right
thing to do anyway) such a checker would be able to solve the false positives by
triggering escapes when binding list elements to the list. However, it'd be as
conservative as the current patch's solution. Ideally, we do not want escapes to
happen so early. Instead, we'd prefer them to be delayed until the list itself
escapes.
So i believe that escaping metadata symbols whenever their base regions escape
would be the right thing to do. We haven't thought about that so far because we
had neither pointer-type metadatas nor non-pointer escapes.
Approach (2): We could teach the Store to scan itself for bindings to
metadata-symbolic-based regions during scanReachableSymbols() whenever a region
turns out to be reachable. This requires no work on checker side, but it sounds
performance-heavy.
Approach (3): We could let checkers maintain the set of active metadata symbols
in the program state (ideally somewhere in the Store, which sounds weird but
causes the smallest amount of layering violations), so that the core knew what
to escape. This puts a stress on the checkers, but with a smart data map it
wouldn't be a problem.
Approach (4): We could allow checkers to trigger pointer escapes in arbitrary
moments. If we allow doing this within checkPointerEscape callback itself, we
would be able to express facts like "when this region escapes, that metadata
symbol attached to it should also escape". This sounds like an ultimate freedom,
with maximum stress on the checkers - still not too much stress when we have
smart data maps.
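The fact quoted in approach (4) can be pictured with a toy model. This is only a sketch: `ToyState` and `escapeRegion` are hypothetical names, regions and symbols are plain strings, and nothing here corresponds to the real ProgramState or checker callback API.

```cpp
#include <map>
#include <set>
#include <string>

// Toy stand-in for analyzer state. Names are plain strings, not real
// MemRegion or SymbolRef objects.
struct ToyState {
  std::map<std::string, std::string> metadataFor; // region -> attached symbol
  std::set<std::string> escaped;                  // everything that escaped
};

// Marks a region as escaped and, in the spirit of approach (4), also escapes
// the metadata symbol that a checker attached to that region, if any.
void escapeRegion(ToyState &state, const std::string &region) {
  state.escaped.insert(region);
  auto it = state.metadataFor.find(region);
  if (it != state.metadataFor.end())
    state.escaped.insert(it->second);
}
```

In this picture the escape of the metadata symbol is delayed until the region it is attached to escapes, which is the behaviour that approach (1) could not provide.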
I'm personally liking the approach (2) - it should be possible to avoid
performance overhead, and clarity seems nice.
**Gabor:**
At this point, I am a bit wondering about two questions.
- When should something belong to a checker and when should something belong
to the engine? Sometimes we model library aspects in the engine and model
language constructs in checkers.
- What is the checker programming model that we are aiming for? Maximum
freedom or more easy checker development?
I think if we aim for maximum freedom, we do not need to worry about the
potential stress on checkers, and we can introduce abstractions to mitigate that
later on.
If we want to simplify the API, then maybe it makes more sense to move language
construct modeling to the engine when the checker API is not sufficient instead
of complicating the API.
Right now I have no preference or objections between the alternatives but here
are some random thoughts:
- Maybe it would be great to have a guideline how to evolve the analyzer and
follow it, so it can help us to decide in similar situations.
- I do care about performance in this case. The reason is that we have a
limited performance budget. And I think we should not expect most of the checker
writers to add modeling of language constructs. So, in my opinion, it is ok to
have less nice/more verbose API for language modeling if we can have better
performance this way, since it only needs to be done once, and is done by the
framework developers.
**Artem:** These are some great questions, i guess it'd be better to discuss
them more openly. As a quick dump of my current mood:
- To me it seems obvious that we need to aim for a checker API that is both
simple and powerful. This can probably be done by keeping the API as powerful as
necessary while providing a layer of simple ready-made solutions on top of it.
Probably a few reusable components for assembling checkers. And this layer
should ideally be pleasant enough to work with, so that people would prefer to
extend it when something is lacking, instead of falling back to the complex
omnipotent API. I'm thinking of AST matchers vs. AST visitors as a roughly
similar situation: matchers are not omnipotent, but they're so nice.
- Separation between core and checkers is usually quite strange. Once we have
shared state traits, i generally wouldn't mind having region store or range
constraint manager as checkers (though it's probably not worth it to transform
them - just a mood). The main thing to avoid here would be the situation when
the checker overwrites stuff written by the core because it thinks it has a
better idea what's going on, so the core should provide a good default behavior.
- Yeah, i totally care about performance as well, and if i try to implement
any such approach, i'd make sure it's good.
**Artem:**
> Approach (2): We could teach the Store to scan itself for bindings to
> metadata-symbolic-based regions during scanReachableSymbols() whenever
> a region turns out to be reachable. This requires no work on checker side,
> but it sounds performance-heavy.
Nope, this approach is wrong. Metadata symbols may become out-of-date: when the
object changes, metadata symbols attached to it aren't changing (because symbols
simply don't change). The same metadata may have different symbols to denote its
value in different moments of time, but at most one of them represents the
actual metadata value. So we'd be escaping more stuff than necessary.
If only we had "ghost fields"
(http://lists.llvm.org/pipermail/cfe-dev/2016-May/049000.html), it would have
been much easier, because the ghost field would only contain the actual
metadata, and the Store would always know about it. This example adds to my
belief that ghost fields are exactly what we need for most C++ checkers.
**Devin:**
In this case, I would be fine with some sort of
AbstractStorageMemoryRegion that meant "here is a memory region and somewhere
reachable from here exists another region of type T". Or even multiple regions
with different identifiers. This wouldn't specify how the memory is reachable,
but it would allow for transfer functions to get at those regions and it would
allow for invalidation.
For std::initializer_list this reachable region would be the region for the backing
array and the transfer functions for begin() and end() yield the beginning and
end element regions for it.
In my view this differs from ghost variables in that (1) this storage does
actually exist (it is just a library implementation detail where that storage
lives) and (2) it is perfectly valid for a pointer into that storage to be
returned and for another part of the program to read or write from that storage.
(Well, in this case just read since it is allowed to be read-only memory).
What I'm not OK with is modeling abstract analysis state (for example, the count
of a NSMutableArray or the typestate of a file handle) as a value stored in some
ginned up region in the store. This takes an easy problem that the analyzer does
well at (modeling typestate) and turns it into a hard one that the analyzer is
bad at (reasoning about the contents of the heap).
I think the key criterion here is: "is the region accessible from outside the
library". That is, does the library expose the region as a pointer that can be
read from or written to in the client program? If so, then it makes sense for
this to be in the store: we are modeling reachable storage as storage. But if
we're just modeling arbitrary analysis facts that need to be invalidated when a
pointer escapes then we shouldn't try to gin up storage for them just to get
invalidation for free.
**Artem:**
> In this case, I would be fine with some sort of AbstractStorageMemoryRegion
> that meant "here is a memory region and somewhere reachable from here exists
> another region of type T". Or even multiple regions with different
> identifiers. This wouldn't specify how the memory is reachable, but it would
> allow for transfer functions to get at those regions and it would allow for
> invalidation.
Yeah, this is what we can easily implement now as a
symbolic-region-based-on-a-metadata-symbol (though we can make a new region
class for that if we eg. want it typed). The problem is that the relation
between such storage region and its parent object region is essentially
immaterial, similarly to the relation between SymbolRegionValue and its parent
region. Region contents are mutable: today the abstract storage is reachable
from its parent object, tomorrow it's not, and maybe something else becomes
reachable, something that isn't even abstract. So the parent region for the
abstract storage is most of the time at best a "nice to know" thing - we cannot
rely on it to do any actual work. We'd anyway need to rely on the checker to do
the job.
> For std::initializer_list this reachable region would be the region for the
> backing array and the transfer functions for begin() and end() yield the
> beginning and end element regions for it.
So maybe in fact for std::initializer_list it may work fine because you cannot
change the data after the object is constructed - so this region's contents are
essentially immutable. For the future, i feel as if it is a dead end.
I'd like to consider another funny example. Suppose we're trying to model
std::unique_ptr. Consider::
void bar(const std::unique_ptr<int> &x);
void foo(std::unique_ptr<int> &x) {
int *a = x.get(); // (a, 0, direct): &AbstractStorageRegion
*a = 1; // (AbstractStorageRegion, 0, direct): 1 S32b
int *b = new int;
*b = 2; // (SymRegion{conj_$0<int *>}, 0 ,direct): 2 S32b
x.reset(b); // Checker map: x -> SymRegion{conj_$0<int *>}
bar(x); // 'a' doesn't escape (the pointer was unique), 'b' does.
clang_analyzer_eval(*a == 1); // Making this true is up to the checker.
clang_analyzer_eval(*b == 2); // Making this unknown is up to the checker.
}
The checker doesn't totally need to ensure that *a == 1 passes - even though the
pointer was unique, it could theoretically have .get()-ed above and the code
could of course break the uniqueness invariant (though we'd probably want it).
The checker can say that "even if *a did escape, it was not because it was
stuffed directly into bar()".
The checker's direct responsibility, however, is to solve the *b == 2 thing
(which is in fact the problem we're dealing with in this patch - escaping the
storage region of the object).
So we're talking about one more operation over the program state (scanning
reachable symbols and regions) that cannot work without checker support.
We can probably add a new callback "checkReachableSymbols" to solve this. This
is in fact also related to the dead symbols problem (we're scanning for live
symbols in the store and in the checkers separately, but we need to do so
simultaneously with a single worklist). Hmm, in fact this sounds like a good
idea; we can replace checkLiveSymbols with checkReachableSymbols.
Or we could just have ghost member variables, and no checker support required at
all. For ghost member variables, the relation with their parent region (which
would be their superregion) is actually useful, the mutability of their contents
is expressed naturally, and the store automagically sees reachable symbols, live
symbols, escapes, invalidations, whatever.
> In my view this differs from ghost variables in that (1) this storage does
> actually exist (it is just a library implementation detail where that storage
> lives) and (2) it is perfectly valid for a pointer into that storage to be
> returned and for another part of the program to read or write from that
> storage. (Well, in this case just read since it is allowed to be read-only
> memory).
> What I'm not OK with is modeling abstract analysis state (for example, the
> count of a NSMutableArray or the typestate of a file handle) as a value stored
> in some ginned up region in the store. This takes an easy problem that the
> analyzer does well at (modeling typestate) and turns it into a hard one that
> the analyzer is bad at (reasoning about the contents of the heap).
Yeah, i tend to agree on that. For simple typestates, this is probably an
overkill, so let's definitely put aside the idea of "ghost symbolic regions"
that i had earlier.
But, to summarize a bit, in our current case, however, the typestate we're
looking for is the contents of the heap. And when we try to model such
typestates (complex in this specific manner, i.e. heap-like) in any checker, we
have a choice between re-doing this modeling in every such checker (which is
something the analyzer is indeed good at, but at a price of making checkers heavy)
or instead relying on the Store to do exactly what it's designed to do.
> I think the key criterion here is: "is the region accessible from outside
> the library". That is, does the library expose the region as a pointer that
> can be read from or written to in the client program? If so, then it makes
> sense for this to be in the store: we are modeling reachable storage as
> storage. But if we're just modeling arbitrary analysis facts that need to be
> invalidated when a pointer escapes then we shouldn't try to gin up storage
> for them just to get invalidation for free.
As a metaphor, i'd probably compare it to body farms - the difference between
ghost member variables and metadata symbols seems to me like the difference
between body farms and evalCall. Both are nice to have, and body farms are very
pleasant to work with, even if not omnipotent. I think it's fine for a
FunctionDecl's body in a body farm to have a local variable, even if such
variable doesn't actually exist, even if it cannot be seen from outside the
function call. I'm not seeing immediate practical difference between "it does
actually exist" and "it doesn't actually exist, just a handy abstraction".
Similarly, i think it's fine if we have a CXXRecordDecl with
implementation-defined contents, and try to farm up a member variable as a handy
abstraction (we don't even need to know its name or offset, only that it's there
somewhere).
**Artem:**
We've discussed it in person with Devin, and he provided more points to think
about:
- If the initializer list consists of non-POD data, constructors of list's
objects need to take the sub-region of the list's region as this-region. In the
current (v2) version of this patch, these objects are constructed elsewhere and
then trivial-copied into the list's metadata pointer region, which may be
incorrect. This is our overall problem with C++ constructors, which manifests in
this case as well. Additionally, objects would need to be constructed in the
analyzer's core, which would not be able to predict that it needs to take a
checker-specific region as this-region, which makes it harder, though it might
be mitigated by sharing the checker state traits.
- Because "ghost variables" are not material to the user, we need to somehow
make super sure that they don't make it into the diagnostic messages.
So, because this needs further digging into overall C++ support and raises too
many questions, i'm delaying a better approach to this problem and will fall
back to the original trivial patch.

@@ -0,0 +1,386 @@
Inlining
========
There are several options that control which calls the analyzer will consider for
inlining. The major one is -analyzer-config ipa:
-analyzer-config ipa=none - All inlining is disabled. This is the only mode
available in LLVM 3.1 and earlier and in Xcode 4.3 and earlier.
-analyzer-config ipa=basic-inlining - Turns on inlining for C functions, C++
static member functions, and blocks -- essentially, the calls that behave
like simple C function calls. This is essentially the mode used in
Xcode 4.4.
-analyzer-config ipa=inlining - Turns on inlining when we can confidently find
the function/method body corresponding to the call. (C functions, static
functions, devirtualized C++ methods, Objective-C class methods, Objective-C
instance methods when ExprEngine is confident about the dynamic type of the
instance).
-analyzer-config ipa=dynamic - Inline instance methods for which the type is
determined at runtime and we are not 100% sure that our type info is
correct. For virtual calls, inline the most plausible definition.
-analyzer-config ipa=dynamic-bifurcate - Same as -analyzer-config ipa=dynamic,
but the path is split. We inline on one branch and do not inline on the
other. This mode does not drop the coverage in cases when the parent class
has code that is only exercised when some of its methods are overridden.
Currently, -analyzer-config ipa=dynamic-bifurcate is the default mode.
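The coverage argument for dynamic-bifurcate is easier to see on a concrete class hierarchy. The following is a minimal sketch with hypothetical class and function names; it shows a parent class whose code is only exercised when one of its methods is overridden.

```cpp
// Base class with a hook. The doubling path in process() only runs when a
// subclass overrides wantsSpecialHandling().
struct Handler {
  virtual ~Handler() = default;
  virtual bool wantsSpecialHandling() const { return false; }

  int process(int value) const {
    if (wantsSpecialHandling())
      return value * 2; // reachable only through an overriding subclass
    return value;
  }
};

struct SpecialHandler : Handler {
  bool wantsSpecialHandling() const override { return true; }
};

// Only the static type Handler is visible here. Inlining
// Handler::wantsSpecialHandling unconditionally would make the doubling
// path in process() look unreachable.
int run(const Handler &h, int value) { return h.process(value); }
```

If the analyzer always inlined the most plausible definition, the doubling path would never be analyzed. Splitting the path keeps one branch on which the call is treated as opaque, so that path stays covered.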
While -analyzer-config ipa determines in general how aggressively the analyzer
will try to inline functions, several additional options control which types of
functions can be inlined, in an all-or-nothing way. These options use the
analyzer's configuration table, so they are all specified as follows:
-analyzer-config OPTION=VALUE
### c++-inlining ###
This option controls which C++ member functions may be inlined.
-analyzer-config c++-inlining=[none | methods | constructors | destructors]
Each of these modes implies that all the previous member function kinds will be
inlined as well; it doesn't make sense to inline destructors without inlining
constructors, for example.
The default c++-inlining mode is 'destructors', meaning that all member
functions with visible definitions will be considered for inlining. In some
cases the analyzer may still choose not to inline the function.
Note that under 'constructors', constructors for types with non-trivial
destructors will not be inlined. Additionally, no C++ member functions will be
inlined under -analyzer-config ipa=none or -analyzer-config ipa=basic-inlining,
regardless of the setting of the c++-inlining mode.
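The "each mode implies all the previous ones" rule amounts to an ordered comparison. The sketch below is a hypothetical model, not the analyzer's own option handling: the `MemberKind` enum and `mayInline` function are illustrative names, and the model deliberately ignores the ipa setting and the special case of types with non-trivial destructors.

```cpp
// Hypothetical model of the c++-inlining setting. The enumerator order
// mirrors the order in which the option values are listed above.
enum class MemberKind { None = 0, Methods = 1, Constructors = 2, Destructors = 3 };

// A member function kind is eligible when the configured mode is at least
// as permissive as that kind.
constexpr bool mayInline(MemberKind mode, MemberKind kind) {
  return kind != MemberKind::None &&
         static_cast<int>(mode) >= static_cast<int>(kind);
}
```

Under this model the default mode 'destructors' admits every kind, while 'methods' admits neither constructors nor destructors.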
### c++-template-inlining ###
This option controls whether C++ templated functions may be inlined.
-analyzer-config c++-template-inlining=[true | false]
Currently, template functions are considered for inlining by default.
The motivation behind this option is that very generic code can be a source
of false positives, either by considering paths that the caller considers
impossible (by some unstated precondition), or by inlining some but not all
of a deep implementation of a function.
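As an illustration of the first kind of false positive, consider a generic helper that defensively checks for null. This is a sketch with hypothetical names, and whether a warning is actually reported depends on the analyzer version and configuration.

```cpp
// Generic helper that tolerates a null pointer.
template <typename T>
T valueOrDefault(const T *ptr, T fallback) {
  if (!ptr) // this check suggests to the analyzer that 'ptr' may be null
    return fallback;
  return *ptr;
}

// The caller has an unstated precondition: 'count' is never null here.
int nextCount(const int *count) {
  int current = valueOrDefault(count, 0);
  return current + *count; // a null 'count' path is explored after inlining
}
```

Once the helper is inlined, the path on which `count` is null continues into the caller and reaches the dereference, even though the caller considers that path impossible.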
### c++-stdlib-inlining ###
This option controls whether functions from the C++ standard library, including
methods of the container classes in the Standard Template Library, should be
considered for inlining.
-analyzer-config c++-stdlib-inlining=[true | false]
Currently, C++ standard library functions are considered for inlining by
default.
The standard library functions and the STL in particular are used ubiquitously
enough that our tolerance for false positives is even lower here. A false
positive due to poor modeling of the STL leads to a poor user experience, since
most users would not be comfortable adding assertions to system headers in order
to silence analyzer warnings.
### c++-container-inlining ###
This option controls whether constructors and destructors of "container" types
should be considered for inlining.
-analyzer-config c++-container-inlining=[true | false]
Currently, these constructors and destructors are NOT considered for inlining
by default.
The current implementation of this setting checks whether a type has a member
named 'iterator' or a member named 'begin'; these names are idiomatic in C++,
with the latter specified in the C++11 standard. The analyzer currently does a
fairly poor job of modeling certain data structure invariants of container-like
objects. For example, these three expressions should be equivalent:
std::distance(c.begin(), c.end()) == 0
c.begin() == c.end()
c.empty()
Many of these issues are avoided if containers always have unknown, symbolic
state, which is what happens when their constructors are treated as opaque.
In the future, we may decide specific containers are "safe" to model through
inlining, or choose to model them directly using checkers instead.
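At run time the equivalence of the three expressions is easy to confirm. The sketch below uses a hypothetical helper name and `std::vector<int>` as the container; the difficulty described above is that the analyzer may not be able to prove the same relationship symbolically after inlining the container's implementation.

```cpp
#include <iterator>
#include <vector>

// Evaluates the three emptiness tests and reports whether they all agree.
bool emptinessChecksAgree(const std::vector<int> &c) {
  bool byDistance = std::distance(c.begin(), c.end()) == 0;
  bool byIterators = c.begin() == c.end();
  bool byEmpty = c.empty();
  return byDistance == byIterators && byIterators == byEmpty;
}
```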
Basics of Implementation
------------------------
The low-level mechanism of inlining a function is handled in
ExprEngine::inlineCall and ExprEngine::processCallExit.
If the conditions are right for inlining, a CallEnter node is created and added
to the analysis work list. The CallEnter node marks the change to a new
LocationContext representing the called function, and its state includes the
contents of the new stack frame. When the CallEnter node is actually processed,
its single successor will be an edge to the first CFG block in the function.
Exiting an inlined function is a bit more work, fortunately broken up into
reasonable steps:
1. The CoreEngine realizes we're at the end of an inlined call and generates a
CallExitBegin node.
2. ExprEngine takes over (in processCallExit) and finds the return value of the
function, if it has one. This is bound to the expression that triggered the
call. (In the case of calls without origin expressions, such as destructors,
this step is skipped.)
3. Dead symbols and bindings are cleaned out from the state, including any local
bindings.
4. A CallExitEnd node is generated, which marks the transition back to the
caller's LocationContext.
5. Custom post-call checks are processed and the final nodes are pushed back
onto the work list, so that evaluation of the caller can continue.
Retry Without Inlining
----------------------
In some cases, we would like to retry analysis without inlining a particular
call.
Currently, we use this technique to recover coverage in case we stop
analyzing a path due to exceeding the maximum block count inside an inlined
function.
When this situation is detected, we walk up the path to find the first node
before inlining was started and enqueue it on the WorkList with a special
ReplayWithoutInlining bit added to it (ExprEngine::replayWithoutInlining). The
path is then re-analyzed from that point without inlining that particular call.
Deciding When to Inline
-----------------------
In general, the analyzer attempts to inline as much as possible, since it
provides a better summary of what actually happens in the program. There are
some cases, however, where the analyzer chooses not to inline:
- If there is no definition available for the called function or method. In
this case, there is no opportunity to inline.
- If the CFG cannot be constructed for a called function, or the liveness
cannot be computed. These are prerequisites for analyzing a function body,
with or without inlining.
- If the LocationContext chain for a given ExplodedNode reaches a maximum cutoff
depth. This prevents unbounded analysis due to infinite recursion, but also
serves as a useful cutoff for performance reasons.
- If the function is variadic. This is not a hard limitation, but an engineering
limitation.
Tracked by: <rdar://problem/12147064> Support inlining of variadic functions
- In C++, constructors are not inlined unless the destructor call will be
processed by the ExprEngine. Thus, if the CFG was built without nodes for
implicit destructors, or if the destructors for the given object are not
represented in the CFG, the constructor will not be inlined. (As an exception,
constructors for objects with trivial constructors can still be inlined.)
See "C++ Caveats" below.
- In C++, ExprEngine does not inline custom implementations of operator 'new'
or operator 'delete', nor does it inline the constructors and destructors
associated with these. See "C++ Caveats" below.
- Calls resulting in "dynamic dispatch" are specially handled. See more below.
- The FunctionSummaries map stores additional information about declarations,
some of which is collected at runtime based on previous analyses.
We do not inline functions which were not profitable to inline in a different
context (for example, if the maximum block count was exceeded; see
"Retry Without Inlining").
Dynamic Calls and Devirtualization
----------------------------------
"Dynamic" calls are those that are resolved at runtime, such as C++ virtual
method calls and Objective-C message sends. Due to the path-sensitive nature of
the analysis, the analyzer may be able to reason about the dynamic type of the
object whose method is being called and thus "devirtualize" the call.
This path-sensitive devirtualization occurs when the analyzer can determine what
method would actually be called at runtime. This is possible when the type
information is constrained enough for a simulated C++/Objective-C object that
the analyzer can make such a decision.
== DynamicTypeInfo ==
As the analyzer analyzes a path, it may accrue information to refine the
knowledge about the type of an object. This can then be used to make better
decisions about the target method of a call.
Such type information is tracked as DynamicTypeInfo. This is path-sensitive
data that is stored in ProgramState, which defines a mapping from MemRegions to
an (optional) DynamicTypeInfo.
If no DynamicTypeInfo has been explicitly set for a MemRegion, it will be lazily
inferred from the region's type or associated symbol. Information from symbolic
regions is weaker than from true typed regions.
EXAMPLE: A C++ object declared "A obj" is known to have the class 'A', but a
reference "A &ref" may dynamically be a subclass of 'A'.
The DynamicTypePropagation checker gathers and propagates DynamicTypeInfo,
updating it as information is observed along a path that can refine that type
information for a region.
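
The distinction in the EXAMPLE above can be made concrete with a short sketch.
The types A and B and the function names are invented for illustration, and the
comments only restate the reasoning in the text; what the analyzer actually
concludes depends on its version and configuration.

```cpp
#include <cassert>

struct A {
  virtual ~A() = default;
  virtual int id() const { return 1; }
};

struct B : A {
  int id() const override { return 2; }
};

// 'obj' is declared by value, so its dynamic type is exactly A and the call
// can only resolve to A::id.
int callOnObject() {
  A obj;
  return obj.id();
}

// 'ref' may refer to any subclass of A; without more information from the
// path, the target of the call is not known.
int callOnReference(const A &ref) { return ref.id(); }

// Here the path itself shows that the referenced object was created as a B,
// which is the kind of fact that can refine the type information.
int callWithKnownType() {
  B b;
  return callOnReference(b);
}
```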
WARNING: Not all of the existing analyzer code has been retrofitted to use
DynamicTypeInfo, nor is it universally appropriate. In particular,
DynamicTypeInfo always applies to a region with all casts stripped
off, but sometimes the information provided by casts can be useful.
== RuntimeDefinition ==
The basis of devirtualization is CallEvent's getRuntimeDefinition() method,
which returns a RuntimeDefinition object. When asked to provide a definition,
the CallEvents for dynamic calls will use the DynamicTypeInfo in their
ProgramState to attempt to devirtualize the call. In the case of no dynamic
dispatch, or perfectly constrained devirtualization, the resulting
RuntimeDefinition contains a Decl corresponding to the definition of the called
function, and RuntimeDefinition::mayHaveOtherDefinitions will return FALSE.
In the case of dynamic dispatch where our information is not perfect, CallEvent
can make a guess, but RuntimeDefinition::mayHaveOtherDefinitions will return
TRUE. The RuntimeDefinition object will then also include a MemRegion
corresponding to the object being called (i.e., the "receiver" in Objective-C
parlance), which ExprEngine uses to decide whether or not the call should be
inlined.
== Inlining Dynamic Calls ==
The -analyzer-config ipa option has five different modes: none, basic-inlining,
inlining, dynamic, and dynamic-bifurcate. Under -analyzer-config ipa=dynamic,
all dynamic calls are inlined, whether we are certain or not that this will
actually be the definition used at runtime. Under -analyzer-config ipa=inlining,
only "near-perfect" devirtualized calls are inlined*, and other dynamic calls
are evaluated conservatively (as if no definition were available).
* Currently, no Objective-C messages are inlined under
-analyzer-config ipa=inlining, even if we are reasonably confident of the type
of the receiver. We plan to enable this once we have tested our heuristics
more thoroughly.
The last option, -analyzer-config ipa=dynamic-bifurcate, behaves similarly to
"dynamic", but performs a conservative invalidation in the general virtual case
in *addition* to inlining. The details of this are discussed below.
As stated above, -analyzer-config ipa=basic-inlining does not inline any C++
member functions or Objective-C method calls, even if they are non-virtual or
can be safely devirtualized.
Bifurcation
-----------
ExprEngine::BifurcateCall implements the -analyzer-config ipa=dynamic-bifurcate
mode.
When a call is made on an object with imprecise dynamic type information
(RuntimeDefinition::mayHaveOtherDefinitions() evaluates to TRUE), ExprEngine
bifurcates the path and marks the object's region (retrieved from the
RuntimeDefinition object) with a path-sensitive "mode" in the ProgramState.
Currently, there are 2 modes:
DynamicDispatchModeInlined - Models the case where the dynamic type information
of the receiver (MemRegion) is assumed to be perfectly constrained so
that a given definition of a method is expected to be the code actually
called. When this mode is set, ExprEngine uses the Decl from
RuntimeDefinition to inline any dynamically dispatched call sent to this
receiver because the function definition is considered to be fully resolved.
DynamicDispatchModeConservative - Models the case where the dynamic type
information is assumed to be incorrect, for example, because the method
definition is overridden in a subclass. In such cases, ExprEngine does not
inline the methods sent to the receiver (MemRegion), even if a candidate
definition is available. This mode is conservative about simulating the
effects of a call.
Going forward along the symbolic execution path, ExprEngine consults the mode
of the receiver's MemRegion to make decisions on whether the calls should be
inlined or not, which ensures that there is at most one split per region.
At a high level, "bifurcation mode" allows for increased semantic coverage in
cases where the parent method contains code which is only executed when the
class is subclassed. The disadvantages of this mode are a performance hit,
which may be considerable, and the possibility of false positives on the path
where the conservative mode is used.
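
The following sketch shows the kind of code where bifurcation adds coverage.
The Shape and Triangle classes are made up for this example. If the call to
sides() inside describe() is always inlined as Shape::sides, the first branch
looks dead; on the conservative path the return value of sides() is unknown, so
both branches are considered.

```cpp
#include <cassert>

struct Shape {
  virtual ~Shape() = default;

  // Base implementation reports "no sides".
  virtual int sides() const { return 0; }

  // The 'sides() > 0' branch is reachable only when a subclass overrides
  // sides(), that is, only when the class is subclassed.
  int describe() const {
    if (sides() > 0)
      return 100 + sides();
    return -1;
  }
};

struct Triangle : Shape {
  int sides() const override { return 3; }
};

// The dynamic type of 's' is imprecise here: it may be Shape or any subclass.
int describeAny(const Shape &s) { return s.describe(); }
```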
Objective-C Message Heuristics
------------------------------
ExprEngine relies on a set of heuristics to partition the set of Objective-C
method calls into those that require bifurcation and those that do not. Below
are the cases when the DynamicTypeInfo of the object is considered precise
(cannot be a subclass):
- If the object was created with +alloc or +new and initialized with an -init
method.
- If the calls are property accesses using dot syntax. This is based on the
assumption that children rarely override properties, or do so in an
essentially compatible way.
- If the class interface is declared inside the main source file. In this case
it is unlikely that it will be subclassed.
- If the method is not declared outside of the main source file, either by the
receiver's class or by any superclasses.
C++ Caveats
-----------
C++11 [class.cdtor]p4 describes how the vtable of an object is modified as it is
being constructed or destructed; that is, the type of the object depends on
which base constructors have been completed. This is tracked using
DynamicTypeInfo in the DynamicTypePropagation checker.
There are several limitations in the current implementation:
- Temporaries are poorly modeled right now because we're not confident in the
placement of their destructors in the CFG. We currently won't inline their
constructors unless the destructor is trivial, and don't process their
destructors at all, not even to invalidate the region.
- 'new' is poorly modeled due to some nasty CFG/design issues. This is tracked
in PR12014. 'delete' is not modeled at all.
- Arrays of objects are modeled very poorly right now. ExprEngine currently
only simulates the first constructor and first destructor. Because of this,
ExprEngine does not inline any constructors or destructors for arrays.
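
For reference, here is a minimal sketch of the three constructs named in the
list above: a temporary with a non-trivial destructor, an object managed with
'new' and 'delete', and an array of objects. The Counter type is hypothetical;
the code only shows what the runtime behavior is, which is what the analyzer
would need to model.

```cpp
#include <cassert>

struct Counter {
  static int live;        // number of Counter objects currently alive
  Counter() { ++live; }
  ~Counter() { --live; }  // non-trivial destructor
  int value() const { return 7; }
};
int Counter::live = 0;

// A temporary with a non-trivial destructor, destroyed at the end of the
// full expression.
int useTemporary() { return Counter().value(); }

// An object created with 'new' and destroyed with 'delete'.
int useHeapObject() {
  Counter *c = new Counter();
  int v = c->value();
  delete c;
  return v;
}

// An array of objects: three constructor calls on entry and three destructor
// calls on exit, not just the first of each.
int useArray() {
  Counter items[3];
  (void)items;
  return Counter::live;
}
```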
CallEvent
=========
A CallEvent represents a specific call to a function, method, or other body of
code. It is path-sensitive, containing both the current state (ProgramStateRef)
and stack space (LocationContext), and provides uniform access to the argument
values and return type of a call, no matter how the call is written in the
source or what sort of code body is being invoked.
NOTE: For those familiar with Cocoa, CallEvent is roughly equivalent to
NSInvocation.
CallEvent should be used whenever there is logic dealing with function calls
that does not care how the call occurred.
Examples include checking that arguments satisfy preconditions (such as
__attribute__((nonnull))), and attempting to inline a call.
CallEvents are reference-counted objects managed by a CallEventManager. While
there is no inherent issue with persisting them (say, in a ProgramState's GDM),
they are intended for short-lived use, and can be recreated from CFGElements or
non-top-level StackFrameContexts fairly easily.


@@ -0,0 +1,171 @@
The analyzer "Store" represents the contents of memory regions. It is an opaque
functional data structure stored in each ProgramState; the only class that can
modify the store is its associated StoreManager.
Currently (Feb. 2013), the only StoreManager implementation being used is
RegionStoreManager. This store records bindings to memory regions using a "base
region + offset" key. (This allows `*p` and `p[0]` to map to the same location,
among other benefits.)
Regions are grouped into "clusters", which roughly correspond to "regions with
the same base region". This allows certain operations to be more efficient,
such as invalidation.
Regions that do not have a known offset use a special "symbolic" offset. These
keys store both the original region, and the "concrete offset region" -- the
last region whose offset is entirely concrete. (For example, in the expression
`foo.bar[1][i].baz`, the concrete offset region is the array `foo.bar[1]`,
since that has a known offset from the start of the top-level `foo` struct.)
Binding Invalidation
====================
Supporting both concrete and symbolic offsets makes things a bit tricky. Here's
an example:
foo[0] = 0;
foo[1] = 1;
foo[i] = i;
After the third assignment, nothing can be said about the value of `foo[0]`,
because `foo[i]` may have overwritten it! Thus, *binding to a region with a
symbolic offset invalidates the entire concrete offset region.* We know
`foo[i]` is somewhere within `foo`, so we don't have to invalidate anything
else, but we do have to be conservative about all other bindings within `foo`.
Continuing the example:
foo[i] = i;
foo[0] = 0;
After this latest assignment, nothing can be said about the value of `foo[i]`,
because `foo[0]` may have overwritten it! *Binding to a region R with a
concrete offset invalidates any symbolic offset bindings whose concrete offset
region is a super-region **or** sub-region of R.* All we know about `foo[i]` is
that it is somewhere within `foo`, so changing *anything* within `foo` might
change `foo[i]`, and changing *all* of `foo` (or its base region) will
*definitely* change `foo[i]`.
This logic could be improved by using the current constraints on `i`, at the
cost of speed. The latter case could also be improved by matching region kinds,
i.e. changing `foo[0].a` is unlikely to affect `foo[i].b`, no matter what `i`
is.
For more detail, read through RegionStoreManager::removeSubRegionBindings in
RegionStore.cpp.
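
The two invalidation rules can be sketched with a toy model. This is not the
RegionStoreManager implementation: the class ToyArrayBindings and its methods
are invented here, it models a single array-like base region only, and it
stores plain integers where the real store would hold symbolic values.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model: bindings inside one base region, keyed either by a concrete
// index or by the name of a symbolic index.
class ToyArrayBindings {
public:
  // Binding at a concrete offset: any symbolic-offset binding in the same
  // region may alias it, so all of them are dropped.
  void bindConcrete(int index, int value) {
    symbolic_.clear();
    concrete_[index] = value;
  }

  // Binding at a symbolic offset: the write may land anywhere in the region,
  // so every other binding in the region is dropped.
  void bindSymbolic(const std::string &symbol, int value) {
    concrete_.clear();
    symbolic_.clear();
    symbolic_[symbol] = value;
  }

  bool knowsConcrete(int index) const { return concrete_.count(index) != 0; }
  bool knowsSymbolic(const std::string &symbol) const {
    return symbolic_.count(symbol) != 0;
  }

private:
  std::map<int, int> concrete_;
  std::map<std::string, int> symbolic_;
};
```

Replaying the example from the text, binding foo[0] and foo[1] and then foo[i]
leaves only the foo[i] binding known; a later binding to foo[0] in turn drops
the foo[i] binding.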
ObjCIvarRegions
===============
Objective-C instance variables require a bit of special handling. Like struct
fields, they are not base regions, and when their parent object region is
invalidated, all the instance variables must be invalidated as well. However,
they have no concrete compile-time offsets (in the modern, "non-fragile"
runtime), and so cannot easily be represented as an offset from the start of
the object in the analyzer. Moreover, this means that invalidating a single
instance variable should *not* invalidate the rest of the object, since unlike
struct fields or array elements there is no way to perform pointer arithmetic
to access another instance variable.
Consequently, although the base region of an ObjCIvarRegion is the entire
object, RegionStore offsets are computed from the start of the instance
variable. Thus it is not valid to assume that all bindings with non-symbolic
offsets start from the base region!
Region Invalidation
===================
Unlike binding invalidation, region invalidation occurs when the entire
contents of a region may have changed---say, because it has been passed to a
function the analyzer can model, like memcpy, or because its address has
escaped, usually as an argument to an opaque function call. In these cases we
need to throw away not just all bindings within the region itself, but within
its entire cluster, since neighboring regions may be accessed via pointer
arithmetic.
Region invalidation typically does even more than this, however. Because it
usually represents the complete escape of a region from the analyzer's model,
its *contents* must also be transitively invalidated. (For example, if a region
'p' of type 'int **' is invalidated, the contents of '*p' and '**p' may have
changed as well.) The algorithm that traverses this transitive closure of
accessible regions is known as ClusterAnalysis, and is also used for finding
all live bindings in the store (in order to throw away the dead ones). The name
"ClusterAnalysis" predates the cluster-based organization of bindings, but
refers to the same concept: during invalidation and liveness analysis, all
bindings within a cluster must be treated in the same way for a conservative
model of program behavior.
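
The transitive part of region invalidation can be pictured as a worklist
traversal over "points-to" edges. The sketch below is a simplification under
stated assumptions: regions are plain strings, the PointsTo map and the
function reachableFrom are hypothetical names, and clusters are not modeled at
all.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model: each region name maps to the regions that its stored pointer
// values refer to.
using PointsTo = std::map<std::string, std::vector<std::string>>;

// Collects every region reachable from 'start' by following stored pointers.
// All of these would have to be invalidated if 'start' escapes.
std::set<std::string> reachableFrom(const PointsTo &pointsTo,
                                    const std::string &start) {
  std::set<std::string> visited;
  std::vector<std::string> worklist{start};
  while (!worklist.empty()) {
    std::string region = worklist.back();
    worklist.pop_back();
    if (!visited.insert(region).second)
      continue;  // already handled
    auto it = pointsTo.find(region);
    if (it == pointsTo.end())
      continue;  // holds no pointers we know about
    for (const std::string &target : it->second)
      worklist.push_back(target);
  }
  return visited;
}
```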
Default Bindings
================
Most bindings in RegionStore are simple scalar values -- integers and pointers.
These are known as "Direct" bindings. However, RegionStore supports a second
type of binding called a "Default" binding. These are used to provide values to
all the elements of an aggregate type (struct or array) without having to
explicitly specify a binding for each individual element.
When there is no Direct binding for a particular region, the store manager
looks at each super-region in turn to see if there is a Default binding. If so,
this value is used as the value of the original region. The search ends when
the base region is reached, at which point the RegionStore will pick an
appropriate default value for the region (usually a symbolic value, but
sometimes zero, for static data, or "uninitialized", for stack variables).
int manyInts[10];
manyInts[1] = 42; // Creates a Direct binding for manyInts[1].
print(manyInts[1]); // Retrieves the Direct binding for manyInts[1].
print(manyInts[0]); // There is no Direct binding for manyInts[0].
// Is there a Default binding for the entire array?
// There is not, but it is a stack variable, so we use
// "uninitialized" as the default value (and emit a
// diagnostic!).
NOTE: The fact that bindings are stored as a base region plus an offset limits
the Default Binding strategy, because in C aggregates can contain other
aggregates. In the current implementation of RegionStore, there is no way to
distinguish a Default binding for an entire aggregate from a Default binding
for the sub-aggregate at offset 0.
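
The lookup order described in this section (Direct binding first, then Default
bindings on each super-region in turn) can be sketched as follows. ToyStore is
a hypothetical stand-in, not RegionStore: regions are strings, values are
integers, the parent map is assumed to be acyclic, and the final fallback to a
symbolic, zero, or "uninitialized" value is reduced to returning false.

```cpp
#include <cassert>
#include <map>
#include <string>

struct ToyStore {
  std::map<std::string, std::string> parent;  // sub-region -> super-region
  std::map<std::string, int> direct;          // Direct bindings
  std::map<std::string, int> defaults;        // Default bindings

  // Returns true and sets 'out' if a Direct binding for 'region' exists or a
  // Default binding is found on one of its super-regions.
  bool lookup(const std::string &region, int &out) const {
    auto d = direct.find(region);
    if (d != direct.end()) {
      out = d->second;
      return true;
    }
    std::string current = region;
    for (;;) {
      auto p = parent.find(current);
      if (p == parent.end())
        return false;  // reached the base region without finding a binding
      current = p->second;
      auto def = defaults.find(current);
      if (def != defaults.end()) {
        out = def->second;
        return true;
      }
    }
  }
};
```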
Lazy Bindings (LazyCompoundVal)
===============================
RegionStore implements an optimization for copying aggregates (structs and
arrays) called "lazy bindings", implemented using a special SVal called
LazyCompoundVal. When the store is asked for the "binding" for an entire
aggregate (i.e. for an lvalue-to-rvalue conversion), it returns a
LazyCompoundVal instead. When this value is then stored into a variable, it is
bound as a Default value. This makes copying arrays and structs much cheaper
than if they had required memberwise access.
Under the hood, a LazyCompoundVal is implemented as a uniqued pair of (region,
store), representing "the value of the region during this 'snapshot' of the
store". This has important implications for any sort of liveness or
reachability analysis, which must take the bindings in the old store into
account.
Retrieving a value from a lazy binding happens in the same way as any other
Default binding: since there is no direct binding, the store manager falls back
to super-regions to look for an appropriate default binding. LazyCompoundVal
differs from a normal default binding, however, in that it contains several
different values, instead of one value that will appear several times. Because
of this, the store manager has to reconstruct the subregion chain on top of the
LazyCompoundVal region, and look up *that* region in the previous store.
Here's a concrete example:
CGPoint p;
p.x = 42; // A Direct binding is made to the FieldRegion 'p.x'.
CGPoint p2 = p; // A LazyCompoundVal is created for 'p', along with a
// snapshot of the current store state. This value is then
// used as a Default binding for the VarRegion 'p2'.
return p2.x; // The binding for FieldRegion 'p2.x' is requested.
// There is no Direct binding, so we look for a Default
// binding to 'p2' and find the LCV.
// Because it's an LCV, we look at our requested region
// and see that it's the '.x' field. We ask for the value
// of 'p.x' within the snapshot, and get back 42.
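
The same walk-through can be written as a toy model. This is a sketch under
simplifying assumptions, not how LazyCompoundVal is implemented: ToyLazyVal and
ToyState are invented names, regions are strings such as "p.x", only one level
of fields is handled, and lazy values nested inside a snapshot are not
followed.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

using Bindings = std::map<std::string, int>;  // Direct bindings, "p.x" -> 42

struct ToyLazyVal {
  std::string region;                        // the copied region, e.g. "p"
  std::shared_ptr<const Bindings> snapshot;  // store contents at copy time
};

struct ToyState {
  Bindings direct;
  std::map<std::string, ToyLazyVal> lazyDefaults;  // "p2" -> (p, snapshot)

  // Models 'to = from' for aggregates: snapshot the store, drop the old
  // Direct bindings of 'to', and record the lazy value as its Default.
  void copyAggregate(const std::string &from, const std::string &to) {
    ToyLazyVal lazy{from, std::make_shared<Bindings>(direct)};
    const std::string prefix = to + ".";
    for (auto it = direct.begin(); it != direct.end();) {
      if (it->first.compare(0, prefix.size(), prefix) == 0)
        it = direct.erase(it);
      else
        ++it;
    }
    lazyDefaults[to] = lazy;
  }

  // Looks up 'base.field': Direct binding first, then the lazy Default on
  // 'base', rebuilding the field on top of the copied region and reading it
  // from the snapshot rather than from the current store.
  bool lookupField(const std::string &base, const std::string &field,
                   int &out) const {
    auto d = direct.find(base + "." + field);
    if (d != direct.end()) {
      out = d->second;
      return true;
    }
    auto lazy = lazyDefaults.find(base);
    if (lazy == lazyDefaults.end())
      return false;
    const Bindings &old = *lazy->second.snapshot;
    auto s = old.find(lazy->second.region + "." + field);
    if (s == old.end())
      return false;
    out = s->second;
    return true;
  }
};
```

Because the lookup reads from the snapshot, a later write to 'p.x' does not
change the value seen through 'p2.x'.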


@@ -0,0 +1,247 @@
# -*- coding: utf-8 -*-
#
# Clang Static Analyzer documentation build configuration file, created by
# sphinx-quickstart on Wed Jan 2 15:54:28 2013.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys, os
from datetime import date
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))
# -- General configuration -----------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax']
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'Clang Static Analyzer'
copyright = u'2013-%d, Analyzer Team' % date.today().year
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short version.
version = '6'
# The full version, including alpha/beta/rc tags.
release = '6'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']
# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
# -- Options for HTML output ---------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'haiku'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = []
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
#html_domain_indices = True
# If false, no index is generated.
#html_use_index = True
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
#html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = 'ClangStaticAnalyzerdoc'
# -- Options for LaTeX output --------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#'preamble': '',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
('index', 'ClangStaticAnalyzer.tex', u'Clang Static Analyzer Documentation',
u'Analyzer Team', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# If true, show page references after internal links.
#latex_show_pagerefs = False
# If true, show URL addresses after external links.
#latex_show_urls = False
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_domain_indices = True
# -- Options for manual page output --------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
('index', 'clangstaticanalyzer', u'Clang Static Analyzer Documentation',
[u'Analyzer Team'], 1)
]
# If true, show URL addresses after external links.
#man_show_urls = False
# -- Options for Texinfo output ------------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
('index', 'ClangStaticAnalyzer', u'Clang Static Analyzer Documentation',
u'Analyzer Team', 'ClangStaticAnalyzer', 'One line description of project.',
'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
#texinfo_appendices = []
# If false, no module index is generated.
#texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
#texinfo_show_urls = 'footnote'
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'http://docs.python.org/': None}


@@ -0,0 +1,23 @@
.. Clang Static Analyzer documentation master file, created by
sphinx-quickstart on Wed Jan 2 15:54:28 2013.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to Clang Static Analyzer's documentation!
=================================================
Contents:
.. toctree::
:maxdepth: 2
DebugChecks
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


@@ -0,0 +1,190 @@
@ECHO OFF
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set BUILDDIR=_build
set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
set I18NSPHINXOPTS=%SPHINXOPTS% .
if NOT "%PAPER%" == "" (
set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
)
if "%1" == "" goto help
if "%1" == "help" (
:help
echo.Please use `make ^<target^>` where ^<target^> is one of
echo. html to make standalone HTML files
echo. dirhtml to make HTML files named index.html in directories
echo. singlehtml to make a single large HTML file
echo. pickle to make pickle files
echo. json to make JSON files
echo. htmlhelp to make HTML files and a HTML help project
echo. qthelp to make HTML files and a qthelp project
echo. devhelp to make HTML files and a Devhelp project
echo. epub to make an epub
echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
echo. text to make text files
echo. man to make manual pages
echo. texinfo to make Texinfo files
echo. gettext to make PO message catalogs
echo. changes to make an overview over all changed/added/deprecated items
echo. linkcheck to check all external links for integrity
echo. doctest to run all doctests embedded in the documentation if enabled
goto end
)
if "%1" == "clean" (
for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
del /q /s %BUILDDIR%\*
goto end
)
if "%1" == "html" (
%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The HTML pages are in %BUILDDIR%/html.
goto end
)
if "%1" == "dirhtml" (
%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
goto end
)
if "%1" == "singlehtml" (
%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
goto end
)
if "%1" == "pickle" (
%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can process the pickle files.
goto end
)
if "%1" == "json" (
%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can process the JSON files.
goto end
)
if "%1" == "htmlhelp" (
%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can run HTML Help Workshop with the ^
.hhp project file in %BUILDDIR%/htmlhelp.
goto end
)
if "%1" == "qthelp" (
%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can run "qcollectiongenerator" with the ^
.qhcp project file in %BUILDDIR%/qthelp, like this:
echo.^> qcollectiongenerator %BUILDDIR%\qthelp\ClangStaticAnalyzer.qhcp
echo.To view the help file:
echo.^> assistant -collectionFile %BUILDDIR%\qthelp\ClangStaticAnalyzer.qhc
goto end
)
if "%1" == "devhelp" (
%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
if errorlevel 1 exit /b 1
echo.
echo.Build finished.
goto end
)
if "%1" == "epub" (
%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The epub file is in %BUILDDIR%/epub.
goto end
)
if "%1" == "latex" (
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
if errorlevel 1 exit /b 1
echo.
echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
goto end
)
if "%1" == "text" (
%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The text files are in %BUILDDIR%/text.
goto end
)
if "%1" == "man" (
%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The manual pages are in %BUILDDIR%/man.
goto end
)
if "%1" == "texinfo" (
%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
goto end
)
if "%1" == "gettext" (
%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
goto end
)
if "%1" == "changes" (
%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
if errorlevel 1 exit /b 1
echo.
echo.The overview file is in %BUILDDIR%/changes.
goto end
)
if "%1" == "linkcheck" (
%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
if errorlevel 1 exit /b 1
echo.
echo.Link check complete; look for any errors in the above output ^
or in %BUILDDIR%/linkcheck/output.txt.
goto end
)
if "%1" == "doctest" (
%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
if errorlevel 1 exit /b 1
echo.
echo.Testing of doctests in the sources finished, look at the ^
results in %BUILDDIR%/doctest/output.txt.
goto end
)
:end

@@ -0,0 +1,92 @@
==================
Nullability Checks
==================

This document is a high-level description of the nullability checks.
These checks are intended to use the annotations described in this
RFC: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-March/041798.html.

Let's consider the following two categories:

1) nullable
============
If a pointer 'p' has a nullable annotation and no explicit null check or assert, we should warn in the following cases:

- 'p' gets implicitly converted into a nonnull pointer, for example, when we pass it to a function that takes a nonnull parameter.
- 'p' gets dereferenced.

Taking a branch on a nullable pointer is the same as taking a branch on a null unspecified pointer.

Explicit cast from nullable to nonnull::

    __nullable id foo;
    id bar = foo;
    takesNonNull((__nonnull) bar);  // should not warn here (backward compatibility hack)
    anotherTakesNonNull(bar);       // would be great to warn here, but not necessary (*)

Because bar corresponds to the same symbol throughout, it is not easy to implement the checker so that the cast suppresses only the first call but not the second. For this reason, in the first implementation, after a contradictory cast happens, I will treat bar as null unspecified; this way all of the warnings will be suppressed. Treating the symbol as null unspecified also has the advantage that, if the takesNonNull function body is inlined, there will be no warning when the symbol is dereferenced. If I have time after the initial version, I might try to find a more sophisticated solution that would produce the second warning (*).
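
The suppression strategy described above can be pictured as a small state machine. The following is only a minimal sketch under stated assumptions, not the analyzer's actual code: the ``Nullability`` enum, the ``SymbolState`` struct, and both function names are hypothetical, and a separate ``Contradicted`` state stands in for "treat the symbol as null unspecified after a contradictory cast".

```cpp
#include <cassert>

// Hypothetical per-symbol nullability state (names are illustrative only).
enum class Nullability { Unspecified, Nullable, Nonnull, Contradicted };

struct SymbolState {
  Nullability tracked = Nullability::Unspecified;
};

// Model an explicit cast of the symbol to a nonnull type.
// Casting a nullable symbol contradicts its annotation, so the symbol is
// marked Contradicted rather than Nonnull; an unspecified symbol simply
// takes the nullability named by the cast.
void applyCastToNonnull(SymbolState &sym) {
  if (sym.tracked == Nullability::Nullable)
    sym.tracked = Nullability::Contradicted;
  else if (sym.tracked == Nullability::Unspecified)
    sym.tracked = Nullability::Nonnull;
}

// A nullable-to-nonnull warning is reported only while the symbol is still
// tracked as nullable; Contradicted suppresses every later warning.
bool shouldWarnOnNonnullUse(const SymbolState &sym) {
  return sym.tracked == Nullability::Nullable;
}
```

In this model, once ``applyCastToNonnull`` has been applied to a nullable ``bar``, both the first call and the second call on ``bar`` are silent, which matches the behavior planned for the first implementation.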
2) nonnull
============
- Dereferencing a nonnull pointer, or sending a message to it, is OK.
- Converting nonnull to nullable is OK.
- When there is an explicit cast from nonnull to nullable, I will trust the cast (it is probably there for a reason, because this cast does not suppress any warnings or errors).
- But what should we do about null checks?::

    __nonnull id takesNonnull(__nonnull id x) {
        if (x == nil) {
            // Defensive backward compatible code:
            ....
            return nil;  // Should the analyzer cover this piece of code? Should we require the cast (__nonnull)nil?
        }
        ....
    }

There are these directions:

- We can take the branch; this way the branch is analyzed.

  - Should we warn about nullability issues in that branch? Probably not: it is OK to break the nullability postconditions when the nullability preconditions are violated.

- We can assume that these pointers are not null, and lose coverage with the analyzer. (This can be implemented either in the constraint solver or in the checker itself.)

Other issues to keep in mind or take care of:

Messaging:

- Sending a message to a nullable pointer

  - Even though the method might return a nonnull pointer, when it is sent to a nullable pointer the return type will be nullable.
  - The result is nullable unless the receiver is known to be non-null.

- Sending a message to an unspecified or nonnull pointer

  - If the pointer is not assumed to be nil, we should be optimistic and use the nullability implied by the method.

    - This will not happen automatically, since the AST will have null unspecified in this case.
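
The messaging rules above reduce to a single decision. The sketch below is an illustration only: the enum and the function ``messageResultNullability`` are hypothetical names, and the boolean parameter stands in for whatever path-sensitive knowledge the analyzer has about the receiver.

```cpp
#include <cassert>

// Hypothetical nullability kinds (illustrative names only).
enum class Nullability { Unspecified, Nullable, Nonnull };

// Compute the nullability of a message send's result.
//   receiver             - tracked nullability of the receiver
//   receiverKnownNonNull - true if the current path has established that
//                          the receiver is not nil
//   declared             - nullability the method declares for its result
Nullability messageResultNullability(Nullability receiver,
                                     bool receiverKnownNonNull,
                                     Nullability declared) {
  // A nullable receiver that may still be nil makes the result nullable,
  // whatever the method itself declares.
  if (receiver == Nullability::Nullable && !receiverKnownNonNull)
    return Nullability::Nullable;
  // Otherwise be optimistic and use the nullability implied by the method.
  return declared;
}
```

For example, a method declared to return nonnull yields a nullable result when sent to a nullable receiver, and a nonnull result once that receiver has been checked against nil on the current path.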
Inlining
============
A symbol may need to be treated differently inside an inlined body. For example, consider these conversions from nonnull to nullable in the presence of inlining::

    id obj = getNonnull();
    takesNullable(obj);
    takesNonnull(obj);

    void takesNullable(nullable id obj) {
        obj->ivar;  // we should assume obj is nullable and warn here
    }

With no special treatment, when takesNullable is inlined, the analyzer will not warn when the obj symbol is dereferenced. One solution is to reanalyze takesNullable as a top-level function to find possible violations. The alternative, deducing nullability information from the arguments after inlining, is not robust enough (for example, there might be several parameters with different nullability that end up being the same symbol on a given path, or there can be nested functions that take different views of the nullability of the same symbol). So the symbol will remain nonnull to avoid false positives, but functions that take nullable parameters will also be analyzed separately, without inlining.

Annotations on multi level pointers
===================================

Tracking multiple levels of annotations for pointers to pointers would make the checker more complicated, because a vector of nullability qualifiers would need to be tracked for each symbol. This is not a big caveat, since once the top-level pointer is dereferenced, the symbol for the inner pointer will have the nullability information. The lack of multi-level annotation tracking is only observable when multiple levels of pointers are passed to a function that has a parameter with multiple levels of annotations. So for now the checker supports the top-level nullability qualifiers only::

    int * __nonnull * __nullable p;
    int ** q = p;
    takesStarNullableStarNullable(q);

Implementation notes
====================

What to track?

- The checker would track memory regions, and attach to each relevant region a qualifier that is either nullable, nonnull, or null unspecified (or contradicted, to suppress warnings for a specific region).
- On a branch where a nullable pointer is known to be non-null, the checker treats it the same way as a pointer annotated as nonnull.
- When there is an explicit cast from null unspecified to either nonnull or nullable, I will trust the cast.
- Unannotated pointers are treated the same way as pointers annotated with the null unspecified qualifier, unless the region is wrapped in ASSUME_NONNULL macros.
- We might want to implement a callback for entry points to top-level functions, where the pointer nullability assumptions would be made.
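
The tracking rules listed above can be sketched as a small lookup table. This is a hedged illustration, not the checker's real data structure: ``NullabilityTracker`` and its methods are hypothetical, regions are keyed by plain strings instead of analyzer memory regions, and the ASSUME_NONNULL case and the entry-point callback are not modeled.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical qualifier attached to a tracked region.
enum class Nullability { Unspecified, Nullable, Nonnull, Contradicted };

class NullabilityTracker {
public:
  // Record the annotation seen for a region.
  void annotate(const std::string &region, Nullability n) {
    state_[region] = n;
  }

  // Unannotated regions are treated as null unspecified.
  Nullability lookup(const std::string &region) const {
    auto it = state_.find(region);
    return it == state_.end() ? Nullability::Unspecified : it->second;
  }

  // On a branch where a nullable pointer is known to be non-null,
  // it is treated like a pointer annotated as nonnull.
  Nullability effective(const std::string &region, bool knownNonNull) const {
    Nullability n = lookup(region);
    if (n == Nullability::Nullable && knownNonNull)
      return Nullability::Nonnull;
    return n;
  }

  // An explicit cast from null unspecified to nonnull or nullable is
  // trusted; casts from other states are left untouched in this sketch.
  void trustCast(const std::string &region, Nullability target) {
    if (lookup(region) == Nullability::Unspecified)
      state_[region] = target;
  }

private:
  std::map<std::string, Nullability> state_;
};
```

A string-keyed ``std::map`` is used only to keep the sketch self-contained; any per-region associative store would serve the same purpose.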