Why INVARIANTS option and sanity checking?
Robert Watson
rwatson at FreeBSD.org
Wed Nov 2 02:43:16 PST 2005
On Wed, 2 Nov 2005, nocool wrote:
> Hi, I need some explanation about the INVARIANTS compile option. This
> option is described as enabling calls to extra sanity checking. What
> does sanity mean here? Where and why do we need to use this option?
There are a number of debugging options available in the kernel,
including INVARIANTS, WITNESS, SOCKBUF_DEBUG, etc. They all exchange
performance for checking of the programmer's beliefs about the invariants
of the source code. The goal is to have the programmer document their beliefs
about the state of the system as a series of tests, which are then
validated at run time. These tests can be cheap (extra NULL checks), or
they can be very expensive (clearing and checking memory on free and
allocate, run-time lock order verification). As such, they are used
extensively during development, but get turned off in production for
performance reasons. For example, for some workloads, we've had reports
of 70%+ loss in performance due to running with WITNESS turned on.
Part of the goal of an invariants test is to fail-stop the system before
it goes from violating a low-level assumption to wide-spread data
corruption and hard-to-track bugs. This makes it much easier to analyze
the bug and fix it (or for that matter, fix the assertion). As such, most
invariant violations will result in a panic and optionally a core dump for
diagnostic purposes. However, some invariants testing, such as lock order
analysis, is configurable to either generate a warning with debugging
trace, or to panic, depending on desired usage.
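
As a rough userland illustration of the idea above (not the kernel's actual
macros), the sketch below compiles a check in or out with a build-time
switch and lets a violation either warn or fail-stop. Every name here
(SKETCH_INVARIANTS, SKETCH_ASSERT, sketch_panic_on_violation, sketch_push)
is hypothetical and exists only for this example; abort() stands in for a
kernel panic.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical build-time switch, standing in for a kernel option. */
#define SKETCH_INVARIANTS 1

/* Hypothetical run-time knob: 1 = fail-stop, 0 = warn and keep going. */
static int sketch_panic_on_violation = 0;
static int sketch_violations = 0;

static void
sketch_violation(const char *expr, const char *file, int line)
{
	fprintf(stderr, "invariant violated: %s (%s:%d)\n", expr, file, line);
	sketch_violations++;
	if (sketch_panic_on_violation)
		abort();	/* stand-in for a panic */
}

#if SKETCH_INVARIANTS
#define SKETCH_ASSERT(expr) \
	((expr) ? (void)0 : sketch_violation(#expr, __FILE__, __LINE__))
#else
#define SKETCH_ASSERT(expr) ((void)0)	/* checks cost nothing when off */
#endif

struct sketch_queue {
	int len;
	int cap;
};

static void
sketch_push(struct sketch_queue *q)
{
	/* The programmer's belief: callers never push onto a full queue. */
	SKETCH_ASSERT(q->len < q->cap);
	q->len++;
}
```

With the knob left at 0, pushing onto a full queue prints a warning and
bumps sketch_violations; with it set to 1, the same call stops the program
at the point where the assumption first breaks.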
> I find some codes in kern/kern_malloc.c in 5.4 kernel:
>
>     kmemzones[indx].kz_zone = uma_zcreate(name, size,
> #ifdef INVARIANTS
>         mtrash_ctor, mtrash_dtor, mtrash_init, mtrash_fini,
> #else
>         NULL, NULL, NULL, NULL,
> #endif
>         UMA_ALIGN_PTR, UMA_ZONE_MALLOC);
>
> In the case INVARIANTS is defined, kz_zone will be set up with the
> constructor function mtrash_ctor and destructor function mtrash_dtor.
> When kz_zone frees some items, the kernel will call mtrash_dtor(), and
> every item will be filled with the value of uma_junk. When some items are
> reallocated, the kernel calls mtrash_ctor() and makes sure the item being
> constructed hasn't been overwritten since it was freed, by comparing
> every int of the item with uma_junk. Why does kmemzones need this check,
> while other zones and memory areas don't? Where does the danger come from
> that a memory item will be overwritten after it is freed?
The UMA slab allocator implements an object life cycle, in which memory
moves between three states:
                --zone_init-->               --zone_ctor-->
[uninitialized]                [initialized]                [allocated]
                <--zone_fini--               <--zone_dtor--
This allows the reuse of memory for the same type of object repeatedly,
allowing some state to be reused across allocations. For example, threads
are always associated with thread stacks. Rather than reallocating the
stack separately from the thread, the zone caches the stack with the
thread in its initialized state, allowing less work to occur each time a
thread is allocated and free'd. As such, there will be data in the memory
object that can't be trashed on its destructor and tested on its
constructor -- if this were done, the persistent state would be lost. So
zones are individually configured to perform memory trashing and testing
based on whether or not they take advantage of persistent state between
allocations.
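
The life cycle above can be sketched in plain userland C. This is only an
illustration under assumed names (struct sketch_thread, sketch_init,
sketch_ctor, sketch_dtor, sketch_fini are all hypothetical, and malloc()
stands in for the real stack allocation); it is not the UMA interface. The
point is that the stack pointer set by the init step survives repeated
ctor/dtor cycles, so a destructor that trashed the whole object would
destroy it.

```c
#include <stdlib.h>

struct sketch_thread {
	void *stack;	/* persistent: set by init, released by fini */
	int   tid;	/* per-allocation: set by ctor, cleared by dtor */
};

static int sketch_stack_allocs;	/* counts the expensive allocations */

/* [uninitialized] -> [initialized]: do the expensive setup once. */
static int
sketch_init(struct sketch_thread *t)
{
	t->stack = malloc(4096);
	if (t->stack == NULL)
		return (-1);
	t->tid = -1;
	sketch_stack_allocs++;
	return (0);
}

/* [initialized] -> [allocated]: cheap per-allocation setup only. */
static void
sketch_ctor(struct sketch_thread *t, int tid)
{
	t->tid = tid;
}

/* [allocated] -> [initialized]: must leave the cached stack alone. */
static void
sketch_dtor(struct sketch_thread *t)
{
	t->tid = -1;
}

/* [initialized] -> [uninitialized]: give the cached state back. */
static void
sketch_fini(struct sketch_thread *t)
{
	free(t->stack);
	t->stack = NULL;
}
```

Allocating and freeing the same object twice goes through sketch_ctor and
sketch_dtor twice but through sketch_init only once, which is the saving
the cached state buys.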
With regard to why this is helpful -- since the C language is not type
safe, nothing in the language prevents touching memory after it has been
free'd. Therefore, it is a ripe opportunity for nasty bugs -- things like
the following:
    crfree(cred);
    cred->cr_uid = 0;
These bugs are notoriously hard to catch, as in a multi-threaded,
multi-processing kernel, it's possible that the memory may actually be
allocated to another thread as soon as it is free'd, resulting in the
above assignment occurring on valid, allocated memory. The ctor and dtor
tests are designed to help identify when an access has happened after
free, and if so, to memory owned by what zone. MEMGUARD is another
similar notion, only it uses the VM system to help detect references to
memory using page protections. The above bug example is one of the
simplest of its class -- often the bug occurs due to a stray uncleared
pointer in another data structure, which may persist for a long time
before it is used.
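
A minimal userland sketch of the trash-and-test idea follows. It is written
from the description in this thread rather than copied from the kernel
source, so the names (SKETCH_JUNK, sketch_trash_dtor, sketch_trash_ctor)
and the pattern value are assumptions made for the example. The destructor
side fills a freed item with a known pattern; the constructor side checks
that the pattern is still intact and reports the first word that changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary junk pattern chosen for this sketch. */
#define SKETCH_JUNK 0xdeadc0deU

/* Free side: overwrite the whole item with the junk pattern. */
static void
sketch_trash_dtor(void *mem, size_t size)
{
	uint32_t *p = mem;
	size_t i, cnt;

	assert(size % sizeof(*p) == 0);	/* sketch handles whole words only */
	cnt = size / sizeof(*p);
	for (i = 0; i < cnt; i++)
		p[i] = SKETCH_JUNK;
}

/*
 * Allocate side: return -1 if the item is untouched since it was freed,
 * otherwise the index of the first word that was modified after free.
 */
static long
sketch_trash_ctor(const void *mem, size_t size)
{
	const uint32_t *p = mem;
	size_t i, cnt;

	cnt = size / sizeof(*p);
	for (i = 0; i < cnt; i++)
		if (p[i] != SKETCH_JUNK)
			return ((long)i);
	return (-1);
}
```

A stray store such as the cred->cr_uid assignment above would change one
word of the freed item, and the next allocation of that item would report
it. The check only catches writes, and only if the item has not already
been handed out again, which is why it helps rather than guarantees.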
Basically, it all comes down to this: invariants and sanity checking allow
programmers to test their assumptions about the source code they (or
someone else) have implemented. This helps find bugs faster, and in a way
that makes them much easier to debug.
Robert N M Watson