FAIL: kernel fault injection

George Neville-Neil gnn at neville-neil.com
Tue May 12 16:12:35 UTC 2009


On May 11, 2009, at 12:29 , Zachary Loafman wrote:

> Arch -
>
> I'd like to contribute the kernel fault injection system that Isilon
> uses. Before contributing it, I'd like to get approval for the APIs
> involved.
>
> Testing errors is hard. Let's say you have:
>
> int foo(void) {
>  [...]
>  error = bar();
>  if (error) {
>    /* do stuff */
>  }
> }
>
> .. but bar() can't reliably be made to fail. How do you test it?
> We added error injection macros that look like this:
>
> int foo(void) {
>  [...]
>  error = bar();
>  KFAIL_POINT_CODE(FP_KERN, bar_fails_foo, error = RETURN_VALUE);
>  if (error) {
>    /* do stuff */
>  }
> }
>
> The KFAIL_POINT_CODE macro adds a sysctl MIB that allows
> you to inject errors into the above code. For example:
>
> # sysctl fail_point.kern.bar_fails_foo=".1%return(5)"
>
> This says, ".1% of the time, evaluate the fail point code with 5 as
> the RETURN_VALUE". If this were a standard errno, you could read the
> above setting as "1/1000th of the time, pretend bar() returned EIO".
>
> We also have a few wrappers around KFAIL_POINT_CODE that essentially
> wrap common uses. For example, the above use can be shorthanded to:
>  KFAIL_POINT_ERROR(FP_KERN, bar_fails_foo, error)
>
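
To make the macro shape above concrete, here is a rough userspace sketch
of the idea. It is not the Isilon implementation: FAKE_FAIL_POINT_CODE,
struct fake_fail_point and its fields are hypothetical names invented for
illustration, and the sysctl machinery is replaced by a plain struct that
a test sets directly.

```c
#include <assert.h>   /* used only by the checks in the usage note */
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state a fail point sysctl would control. */
struct fake_fail_point {
	bool armed;        /* true when an injection is configured */
	int  return_value; /* value exposed to the injected code */
};

/*
 * Hypothetical userspace analogue of KFAIL_POINT_CODE: when the fail
 * point is armed, run `code` with RETURN_VALUE in scope.
 */
#define FAKE_FAIL_POINT_CODE(fp, code) do {			\
	if ((fp)->armed) {					\
		int RETURN_VALUE = (fp)->return_value;		\
		code;						\
	}							\
} while (0)

static struct fake_fail_point bar_fails_foo;

/* bar() always succeeds, which is why its error path is hard to test. */
static int
bar(void)
{
	return (0);
}

static int
foo(void)
{
	int error;

	error = bar();
	FAKE_FAIL_POINT_CODE(&bar_fails_foo, error = RETURN_VALUE);
	if (error) {
		/* do stuff: the error path now runs on demand */
		return (error);
	}
	return (0);
}
```

With the fail point left unarmed, foo() behaves as before. Setting
bar_fails_foo.armed to true and bar_fails_foo.return_value to EIO makes
foo() take its error path as if bar() had failed.
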
> Currently, the sysctl parser accepts the following variants:
>  return(x) - triggers the code with RETURN_VALUE set to x
>  sleep(t) - sleep t milliseconds
>  panic/break - panic or break into the debugger
>  print - print that the fail point was hit
>
> In addition to the commands, we have a syntax to express
> when to evaluate those commands:
>  p%<command> - evaluate command p% of the time (example above)
>  5*<command> - evaluate command 5 times, then disable the expression
>
> And you can compound with expr1->expr2, so, e.g.:
>  5%return(5)->1%return(22):
>    5% of the time, return 5, 1% of the remaining time, return 22
>  5*return(0)->10*return(5)->1%return(19)
>    return 0 for 5 times, then 5 for 10 times, and after those,
>    return 19 1% of the time.
>  1%5*return(22):
>    1/100th of the time, return 22, but only do it 5 times total.
>
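
One way to read the chaining rules above is as a list of terms tried in
order, where a term that misses its probability or has used up its count
falls through to the next. The sketch below models only return(x) terms
under that reading. struct fp_term, fp_eval and fp_rand_pct are
hypothetical names, and the actual parser and evaluator may well differ.

```c
#include <assert.h>   /* used only by the checks in the usage note */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one term such as "1%5*return(22)". */
struct fp_term {
	double prob_pct; /* p% prefix; 100.0 when the prefix is omitted */
	int    count;    /* n* prefix; -1 means no limit */
	int    value;    /* the x in return(x) */
};

/* Uniform value in [0, 100), built on the C library's rand(). */
static double
fp_rand_pct(void)
{
	return ((double)rand() / ((double)RAND_MAX + 1.0) * 100.0);
}

/*
 * Walk the chain expr1->expr2->... and fire the first eligible term.
 * Returns true and stores the value in *out if a term fired.
 */
static bool
fp_eval(struct fp_term *terms, size_t nterms, double (*rnd)(void), int *out)
{
	for (size_t i = 0; i < nterms; i++) {
		struct fp_term *t = &terms[i];

		if (t->count == 0)
			continue; /* count used up: try the next term */
		if (t->prob_pct < 100.0 && rnd() >= t->prob_pct)
			continue; /* missed the p% roll: try the next term */
		if (t->count > 0)
			t->count--; /* only hits consume the count */
		*out = t->value;
		return (true);
	}
	return (false);
}
```

Under this model, "5*return(0)->10*return(5)" is the array
{ {100.0, 5, 0}, {100.0, 10, 5} }: the first five calls to fp_eval()
yield 0, the next ten yield 5, and later calls report that nothing
fired. Passing the random source as a function pointer is a choice made
here so the behaviour can be tested deterministically.
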
> I've also attached an ascii rendering of a (rough draft) man page that
> goes into more detail.
>
> Comments?
>

Hi Zach,

I've taken a brief look at the email and the man page you have sent.
I don't see any glaring problems that would prevent us from using this
code.  Hopefully others will also see its usefulness.  Any idea how soon
you'd like to commit this?  It would be great to get it in before the
8.0 branch so that the APIs are available throughout the duration of
that branch, and then moving forwards.

Best,
George



More information about the freebsd-arch mailing list