mirror of
https://github.com/capstone-engine/llvm-capstone.git
synced 2024-12-14 11:39:35 +00:00
docs: Nuke AnalyzerRegions.rst.
As per Ted's advice. It can be brought back from version control if needed. This also fixes a Sphinx warning. llvm-svn: 170401
This commit is contained in:
parent
1b5792e5ad
commit
62737b65b1
@ -1,259 +0,0 @@
|
||||
===============================================
|
||||
Static Analyzer Design Document: Memory Regions
|
||||
===============================================
|
||||
|
||||
Authors: Ted Kremenek, ``kremenek at apple``,
|
||||
Zhongxing Xu, ``xuzhongzhing at gmail``
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
The path-sensitive analysis engine in libAnalysis employs an extensible
|
||||
API for abstractly modeling the memory of an analyzed program. This API
|
||||
employs the concept of "memory regions" to abstractly model chunks of
|
||||
program memory such as program variables and dynamically allocated
|
||||
memory such as those returned from 'malloc' and 'alloca'. Regions are
|
||||
hierarchical, with subregions modeling subtyping relationships, field
|
||||
and array offsets into larger chunks of memory, and so on.
|
||||
|
||||
The region API consists of two components:
|
||||
|
||||
- A taxonomy and representation of regions themselves within the
|
||||
analyzer engine. The primary definitions and interfaces are described
|
||||
in ``MemRegion.h``. At the root of the region hierarchy is the class
|
||||
``MemRegion`` with specific subclasses refining the region concept
|
||||
for variables, heap allocated memory, and so forth.
|
||||
- The modeling of binding of values to regions. For example, modeling
|
||||
the value stored to a local variable ``x`` consists of recording the
|
||||
binding between the region for ``x`` (which represents the raw memory
|
||||
associated with ``x``) and the value stored to ``x``. This binding
|
||||
relationship is captured with the notion of "symbolic stores."
|
||||
|
||||
Symbolic stores, which can be thought of as representing the relation
|
||||
``regions -> values``, are implemented by subclasses of the
|
||||
``StoreManager`` class (``Store.h``). A particular StoreManager
|
||||
implementation has complete flexibility concerning the following:
|
||||
|
||||
- *How* to model the binding between regions and values
|
||||
- *What* bindings are recorded
|
||||
|
||||
Together, both points allow different StoreManagers to tradeoff between
|
||||
different levels of analysis precision and scalability concerning the
|
||||
reasoning of program memory. Meanwhile, the core path-sensitive engine
|
||||
makes no assumptions about either points, and queries a StoreManager
|
||||
about the bindings to a memory region through a generic interface that
|
||||
all StoreManagers share. If a particular StoreManager cannot reason
|
||||
about the potential bindings of a given memory region (e.g.,
|
||||
'``BasicStoreManager``' does not reason about fields of structures) then
|
||||
the StoreManager can simply return 'unknown' (represented by
|
||||
'``UnknownVal``') for a particular region-binding. This separation of
|
||||
concerns not only isolates the core analysis engine from the details of
|
||||
reasoning about program memory but also facilities the option of a
|
||||
client of the path-sensitive engine to easily swap in different
|
||||
StoreManager implementations that internally reason about program memory
|
||||
in very different ways.
|
||||
|
||||
The rest of this document is divided into two parts. We first discuss
|
||||
region taxonomy and the semantics of regions. We then discuss the
|
||||
StoreManager interface, and details of how the currently available
|
||||
StoreManager classes implement region bindings.
|
||||
|
||||
Memory Regions and Region Taxonomy
|
||||
==================================
|
||||
|
||||
Pointers
|
||||
--------
|
||||
|
||||
Before talking about the memory regions, we would talk about the
|
||||
pointers since memory regions are essentially used to represent pointer
|
||||
values.
|
||||
|
||||
The pointer is a type of values. Pointer values have two semantic
|
||||
aspects. One is its physical value, which is an address or location. The
|
||||
other is the type of the memory object residing in the address.
|
||||
|
||||
Memory regions are designed to abstract these two properties of the
|
||||
pointer. The physical value of a pointer is represented by MemRegion
|
||||
pointers. The rvalue type of the region corresponds to the type of the
|
||||
pointee object.
|
||||
|
||||
One complication is that we could have different view regions on the
|
||||
same memory chunk. They represent the same memory location, but have
|
||||
different abstract location, i.e., MemRegion pointers. Thus we need to
|
||||
canonicalize the abstract locations to get a unique abstract location
|
||||
for one physical location.
|
||||
|
||||
Furthermore, these different view regions may or may not represent
|
||||
memory objects of different types. Some different types are semantically
|
||||
the same, for example, 'struct s' and 'my\_type' are the same type.
|
||||
|
||||
::
|
||||
|
||||
struct s;
|
||||
typedef struct s my_type;
|
||||
|
||||
But ``char`` and ``int`` are not the same type in the code below:
|
||||
|
||||
::
|
||||
|
||||
void *p;
|
||||
int *q = (int*) p;
|
||||
char *r = (char*) p;
|
||||
|
||||
Thus we need to canonicalize the MemRegion which is used in binding and
|
||||
retrieving.
|
||||
|
||||
Regions
|
||||
-------
|
||||
|
||||
Region is the entity used to model pointer values. A Region has the
|
||||
following properties:
|
||||
|
||||
- Kind
|
||||
- ObjectType: the type of the object residing on the region.
|
||||
- LocationType: the type of the pointer value that the region
|
||||
corresponds to. Usually this is the pointer to the ObjectType. But
|
||||
sometimes we want to cache this type explicitly, for example, for a
|
||||
CodeTextRegion.
|
||||
- StartLocation
|
||||
- EndLocation
|
||||
|
||||
Symbolic Regions
|
||||
----------------
|
||||
|
||||
A symbolic region is a map of the concept of symbolic values into the
|
||||
domain of regions. It is the way that we represent symbolic pointers.
|
||||
Whenever a symbolic pointer value is needed, a symbolic region is
|
||||
created to represent it.
|
||||
|
||||
A symbolic region has no type. It wraps a SymbolData. But sometimes we
|
||||
have type information associated with a symbolic region. For this case,
|
||||
a TypedViewRegion is created to layer the type information on top of the
|
||||
symbolic region. The reason we do not carry type information with the
|
||||
symbolic region is that the symbolic regions can have no type. To be
|
||||
consistent, we don't let them to carry type information.
|
||||
|
||||
Like a symbolic pointer, a symbolic region may be NULL, has unknown
|
||||
extent, and represents a generic chunk of memory.
|
||||
|
||||
.. note::
|
||||
We plan not to use loc::SymbolVal in RegionStore and remove it
|
||||
gradually.
|
||||
|
||||
Symbolic regions get their rvalue types through the following ways:
|
||||
|
||||
- Through the parameter or global variable that points to it, e.g.:
|
||||
|
||||
::
|
||||
|
||||
void f(struct s* p) {
|
||||
...
|
||||
}
|
||||
|
||||
The symbolic region pointed to by ``p`` has type ``struct s``.
|
||||
|
||||
- Through explicit or implicit casts, e.g.:
|
||||
|
||||
::
|
||||
|
||||
void f(void* p) {
|
||||
struct s* q = (struct s*) p;
|
||||
...
|
||||
}
|
||||
|
||||
We attach the type information to the symbolic region lazily. For the
|
||||
first case above, we create the ``TypedViewRegion`` only when the
|
||||
pointer is actually used to access the pointee memory object, that is
|
||||
when the element or field region is created. For the cast case, the
|
||||
``TypedViewRegion`` is created when visiting the ``CastExpr``.
|
||||
|
||||
The reason for doing lazy typing is that symbolic regions are sometimes
|
||||
only used to do location comparison.
|
||||
|
||||
Pointer Casts
|
||||
-------------
|
||||
|
||||
Pointer casts allow people to impose different 'views' onto a chunk of
|
||||
memory.
|
||||
|
||||
Usually we have two kinds of casts. One kind of casts cast down with in
|
||||
the type hierarchy. It imposes more specific views onto more generic
|
||||
memory regions. The other kind of casts cast up with in the type
|
||||
hierarchy. It strips away more specific views on top of the more generic
|
||||
memory regions.
|
||||
|
||||
We simulate the down casts by layering another ``TypedViewRegion`` on
|
||||
top of the original region. We simulate the up casts by striping away
|
||||
the top ``TypedViewRegion``. Down casts is usually simple. For up casts,
|
||||
if the there is no ``TypedViewRegion`` to be stripped, we return the
|
||||
original region. If the underlying region is of the different type than
|
||||
the cast-to type, we flag an error state.
|
||||
|
||||
For toll-free bridging casts, we return the original region.
|
||||
|
||||
We can set up a partial order for pointer types, with the most general
|
||||
type ``void*`` at the top. The partial order forms a tree with ``void*``
|
||||
as its root node.
|
||||
|
||||
Every ``MemRegion`` has a root position in the type tree. For example,
|
||||
the pointee region of ``void *p`` has its root position at the root node
|
||||
of the tree. ``VarRegion`` of ``int x`` has its root position at the
|
||||
'int type' node.
|
||||
|
||||
``TypedViewRegion`` is used to move the region down or up in the tree.
|
||||
Moving down in the tree adds a ``TypedViewRegion``. Moving up in the
|
||||
tree removes a ``TypedViewRegion``.
|
||||
|
||||
Do we want to allow moving up beyond the root position? This happens
|
||||
when:
|
||||
|
||||
::
|
||||
|
||||
int x; void *p = &x;
|
||||
|
||||
The region of ``x`` has its root position at 'int\*' node. the cast to
|
||||
void\* moves that region up to the 'void\*' node. I propose to not allow
|
||||
such casts, and assign the region of ``x`` for ``p``.
|
||||
|
||||
Another non-ideal case is that people might cast to a non-generic
|
||||
pointer from another non-generic pointer instead of first casting it
|
||||
back to the generic pointer. Direct handling of this case would result
|
||||
in multiple layers of TypedViewRegions. This enforces an incorrect
|
||||
semantic view to the region, because we can only have one typed view on
|
||||
a region at a time. To avoid this inconsistency, before casting the
|
||||
region, we strip the TypedViewRegion, then do the cast. In summary, we
|
||||
only allow one layer of TypedViewRegion.
|
||||
|
||||
Region Bindings
|
||||
---------------
|
||||
|
||||
The following region kinds are boundable: VarRegion,
|
||||
CompoundLiteralRegion, StringRegion, ElementRegion, FieldRegion, and
|
||||
ObjCIvarRegion.
|
||||
|
||||
When binding regions, we perform canonicalization on element regions and
|
||||
field regions. This is because we can have different views on the same
|
||||
region, some of which are essentially the same view with different sugar
|
||||
type names.
|
||||
|
||||
To canonicalize a region, we get the canonical types for all
|
||||
TypedViewRegions along the way up to the root region, and make new
|
||||
TypedViewRegions with those canonical types.
|
||||
|
||||
For Objective-C and C++, perhaps another canonicalization rule should be
|
||||
added: for FieldRegion, the least derived class that has the field is
|
||||
used as the type of the super region of the FieldRegion.
|
||||
|
||||
All bindings and retrievings are done on the canonicalized regions.
|
||||
|
||||
Canonicalization is transparent outside the region store manager, and
|
||||
more specifically, unaware outside the Bind() and Retrieve() method. We
|
||||
don't need to consider region canonicalization when doing pointer cast.
|
||||
|
||||
Constraint Manager
|
||||
------------------
|
||||
|
||||
The constraint manager reasons about the abstract location of memory
|
||||
objects. We can have different views on a region, but none of these
|
||||
views changes the location of that object. Thus we should get the same
|
||||
abstract location for those regions.
|
Loading…
Reference in New Issue
Block a user