cvs commit: src/sys/kern init_main.c kern_malloc.c md5c.c subr_autoconf.c subr_mbuf.c subr_prf.c tty_subr.c vfs_cluster.c vfs_subr.c

Sun Jul 27 11:35:42 PDT 2003

Alexander Leidinger wrote:
> Perhaps you can convince dds (CCed) to extend devel/cscout in various
> helpful directions. Based upon his introduction to us at the time he got
> his commit bit and his homepage he may be interested in this kind of
> work.

I've been closely following the discussion from its beginning thinking
about such opportunities.  Up to now I've applied CScout on a couple of
userland programs; maybe I should concentrate on the kernel.  As a
minimum we can locate unused identifiers and thereby clean-up code
parts.  Any concrete ideas for extending CScout (keeping in mind that
its role is mainly 100% correct rename refactorings in the face of
preprocessor definitions, not model checking) more than welcome.

Poul-Henning Kamp wrote:
> If I knew what the tools were, I would have been busy writing them
> years ago, but unfortunately, I don't seem to be the one destined
> to invent the programmers spread-sheet.

Let me provide you with some details on the tools Microsoft develops and
uses internally on their Windows code.  Their code may be behind FreeBSD
in terms of cohesion and stability, but the tools appear impressive. 
The information is based on presentations given at the 2003 Faculty
Summit held in Cambridge England two weeks ago.

Two heuristic tools widely employed are called PREfix and PREfast. 
PREfix performs a path-by-path interprocedural analysis and is
computationally expensive taking 4 weeks to run on the Windows kernel. 
PREfast works by running user-supplied plug-ins when traversing the
compiler's abstract syntax tree.  It runs on desktops and is easily
customized.  The two tools are widely deployed within Microsoft; 1/6 of
the bugs fixed in the Windows Server 2003 were apparently found by those
two tools.  (Note that the above statement is completely different from:
"the two tools uncovered 1/6 of the bugs in the Windows Server 2003"
:-)  

PREfix locates errors by traversing the call graph through a limited set
of paths taking great care to be conservative and avoid false
positives.  Some of the errors it can detect include:

- NULL pointer use
- Memory allocation errors
- Use of unitialized values
- Library usage problems
- Invalid resource states

PREfast works at a local level examining the compiler's abstract syntax
tree for error idioms.  It is based on plugins, which individual
developers write.  If I remember correctly, there are hundreds of
plugins in use within Microsoft.  PREfast also uses an annotated version
of the Windows APIs to detect buffer overflow errors (most WIN32 APIs
receive a memory buffer and its size as separate arguments; I assume the
annotations bind the two).

PREfix and PREfast were termed first generation tools in contrast to the
second generation tools that are sound and based on declarative
specifications and models.

SLAM performs software model checking.  It receives as input the C
source code and the API specification rules mentioned above and can
reason through a systematic exploration of the program's state space
whether a feasible path can lead to a violation of the API
specification.  It works by converting the C source into a boolean
program (removing all non boolean elements) that can then be checked for
specific paths.  Typically the model checker is assisted by annotating
the program with additional predicates.  When applied on a set of 100
device drivers it uncovered more than 60 bugs.

ESP performs scalable error detection through a path-sensitive
analysis.  The goal is to locate program states that signify errors. The
problem of course is taming the explosion of the state space.  At each
branch the tracked program state can potentially double.   The
combinatorial explosion is managed as follows:  a) knowing the previous
state can determine the branch direction b) states can be merged at
control flow join points and c) little code affects the tracked state.

Finally, Fugue checks the behavior of programs related to the use of the
API by performing a static analysis on annotated versions of the
program.  It works on Microsoft's managed languages that support the
specification of attributes (C#, VB, Managed C++, VJ#).  Attributes are
arbitrary programmer-specified declarations that can be added on te
language's objects and are tracked throughout the compilation cycle. 
Custom attributes are added to the source to specify e.g. the state of a
resource or the "nullness" of a value.  Declarative API usage rules are
then applied to verify the implementation against the contract.

More details are available at:
http://research.microsoft.com/spt 
http://research.microsoft.com/pprc 

Diomidis - http://www.spinellis.gr