You've already forked linux-packaging-mono
							
							
		
			
	
	
		
			140 lines
		
	
	
		
			6.6 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
		
		
			
		
	
	
			140 lines
		
	
	
		
			6.6 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
|   | //===----------------------------------------------------------------------===// | ||
|  | // Clang Static Analyzer | ||
|  | //===----------------------------------------------------------------------===// | ||
|  | 
 | ||
|  | = Library Structure = | ||
|  | 
 | ||
|  | The analyzer library has two layers: a (low-level) static analysis | ||
|  | engine (GRExprEngine.cpp and friends), and some static checkers | ||
|  | (*Checker.cpp).  The latter are built on top of the former via the | ||
|  | Checker and CheckerVisitor interfaces (Checker.h and | ||
|  | CheckerVisitor.h).  The Checker interface is designed to be minimal | ||
|  | and simple for checker writers, and attempts to isolate them from much | ||
|  | of the gore of the internal analysis engine. | ||
|  | 
 | ||
|  | = How It Works = | ||
|  | 
 | ||
|  | The analyzer is inspired by several foundational research papers ([1], | ||
|  | [2]).  (FIXME: kremenek to add more links) | ||
|  | 
 | ||
|  | In a nutshell, the analyzer is basically a source code simulator that | ||
|  | traces out possible paths of execution.  The state of the program | ||
|  | (values of variables and expressions) is encapsulated by the state | ||
|  | (ProgramState).  A location in the program is called a program point | ||
|  | (ProgramPoint), and the combination of state and program point is a | ||
|  | node in an exploded graph (ExplodedGraph).  The term "exploded" comes | ||
|  | from exploding the control-flow edges in the control-flow graph (CFG). | ||
|  | 
 | ||
|  | Conceptually the analyzer does a reachability analysis through the | ||
|  | ExplodedGraph.  We start at a root node, which has the entry program | ||
|  | point and initial state, and then simulate transitions by analyzing | ||
|  | individual expressions.  The analysis of an expression can cause the | ||
|  | state to change, resulting in a new node in the ExplodedGraph with an | ||
|  | updated program point and an updated state.  A bug is found by hitting | ||
|  | a node that satisfies some "bug condition" (basically a violation of a | ||
|  | checking invariant). | ||
|  | 
 | ||
|  | The analyzer traces out multiple paths by reasoning about branches and | ||
|  | then bifurcating the state: on the true branch the conditions of the | ||
|  | branch are assumed to be true and on the false branch the conditions | ||
|  | of the branch are assumed to be false.  Such "assumptions" create | ||
|  | constraints on the values of the program, and those constraints are | ||
|  | recorded in the ProgramState object (and are manipulated by the | ||
|  | ConstraintManager).  If assuming the conditions of a branch would | ||
|  | cause the constraints to be unsatisfiable, the branch is considered | ||
|  | infeasible and that path is not taken.  This is how we get | ||
|  | path-sensitivity.  We reduce exponential blow-up by caching nodes.  If | ||
|  | a new node with the same state and program point as an existing node | ||
|  | would get generated, the path "caches out" and we simply reuse the | ||
|  | existing node.  Thus the ExplodedGraph is not a DAG; it can contain | ||
|  | cycles as paths loop back onto each other and cache out. | ||
|  | 
 | ||
|  | ProgramState and ExplodedNodes are basically immutable once created.  Once | ||
|  | one creates a ProgramState, you need to create a new one to get a new | ||
|  | ProgramState.  This immutability is key since the ExplodedGraph represents | ||
|  | the behavior of the analyzed program from the entry point.  To | ||
|  | represent these efficiently, we use functional data structures (e.g., | ||
|  | ImmutableMaps) which share data between instances. | ||
|  | 
 | ||
|  | Finally, individual Checkers work by also manipulating the analysis | ||
|  | state.  The analyzer engine talks to them via a visitor interface. | ||
|  | For example, the PreVisitCallExpr() method is called by GRExprEngine | ||
|  | to tell the Checker that we are about to analyze a CallExpr, and the | ||
|  | checker is asked to check for any preconditions that might not be | ||
|  | satisfied.  The checker can do nothing, or it can generate a new | ||
|  | ProgramState and ExplodedNode which contains updated checker state.  If it | ||
|  | finds a bug, it can tell the BugReporter object about the bug, | ||
|  | providing it an ExplodedNode which is the last node in the path that | ||
|  | triggered the problem. | ||
|  | 
 | ||
|  | = Notes about C++ = | ||
|  | 
 | ||
|  | Since now constructors are seen before the variable that is constructed  | ||
|  | in the CFG, we create a temporary object as the destination region that  | ||
|  | is constructed into. See ExprEngine::VisitCXXConstructExpr(). | ||
|  | 
 | ||
|  | In ExprEngine::processCallExit(), we always bind the object region to the | ||
|  | evaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the | ||
|  | corresponding lazy compound value if the variable is not a reference, and | ||
|  | bind the variable region to the lazy compound value. If the variable | ||
|  | is a reference, just use the object region as the initilizer value. | ||
|  | 
 | ||
|  | Before entering a C++ method (or ctor/dtor), the 'this' region is bound | ||
|  | to the object region. In ctors, we synthesize 'this' region with   | ||
|  | CXXRecordDecl*, which means we do not use type qualifiers. In methods, we | ||
|  | synthesize 'this' region with CXXMethodDecl*, which has getThisType()  | ||
|  | taking type qualifiers into account. It does not matter we use qualified | ||
|  | 'this' region in one method and unqualified 'this' region in another | ||
|  | method, because we only need to ensure the 'this' region is consistent  | ||
|  | when we synthesize it and create it directly from CXXThisExpr in a single | ||
|  | method call. | ||
|  | 
 | ||
|  | = Working on the Analyzer = | ||
|  | 
 | ||
|  | If you are interested in bringing up support for C++ expressions, the | ||
|  | best place to look is the visitation logic in GRExprEngine, which | ||
|  | handles the simulation of individual expressions.  There are plenty of | ||
|  | examples there of how other expressions are handled. | ||
|  | 
 | ||
|  | If you are interested in writing checkers, look at the Checker and | ||
|  | CheckerVisitor interfaces (Checker.h and CheckerVisitor.h).  Also look | ||
|  | at the files named *Checker.cpp for examples on how you can implement | ||
|  | these interfaces. | ||
|  | 
 | ||
|  | = Debugging the Analyzer = | ||
|  | 
 | ||
|  | There are some useful command-line options for debugging.  For example: | ||
|  | 
 | ||
|  | $ clang -cc1 -help | grep analyze | ||
|  |  -analyze-function <value> | ||
|  |  -analyzer-display-progress | ||
|  |  -analyzer-viz-egraph-graphviz | ||
|  |  ... | ||
|  | 
 | ||
|  | The first allows you to specify only analyzing a specific function. | ||
|  | The second prints to the console what function is being analyzed.  The | ||
|  | third generates a graphviz dot file of the ExplodedGraph.  This is | ||
|  | extremely useful when debugging the analyzer and viewing the | ||
|  | simulation results. | ||
|  | 
 | ||
|  | Of course, viewing the CFG (Control-Flow Graph) is also useful: | ||
|  | 
 | ||
|  | $ clang -cc1 -help | grep cfg | ||
|  |  -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses | ||
|  |  -cfg-add-initializers   Add C++ initializers to CFGs for all analyses | ||
|  |  -cfg-dump               Display Control-Flow Graphs | ||
|  |  -cfg-view               View Control-Flow Graphs using GraphViz | ||
|  |  -unoptimized-cfg        Generate unoptimized CFGs for all analyses | ||
|  | 
 | ||
|  | -cfg-dump dumps a textual representation of the CFG to the console, | ||
|  | and -cfg-view creates a GraphViz representation. | ||
|  | 
 | ||
|  | = References = | ||
|  | 
 | ||
|  | [1] Precise interprocedural dataflow analysis via graph reachability, | ||
|  |     T Reps, S Horwitz, and M Sagiv, POPL '95, | ||
|  |     http://portal.acm.org/citation.cfm?id=199462 | ||
|  | 
 | ||
|  | [2] A memory model for static analysis of C programs, Z Xu, T | ||
|  |     Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf |