[RFC] kern/kern_timeout.c rewrite in progress
Hans Petter Selasky
hps at selasky.org
Mon Dec 29 20:02:45 UTC 2014
Hi,
I recently came across a class of errors which lead me into
investigating the "kern/kern_timeout.c" and its subsystem. From what I
can see new features like the SMP awareness has been "added" instead of
fully "integrated". When going into the cornercases I've uncovered that
the internal callout statemachine can sometimes report wrong values via
its callout_active() and callout_pending() bits to its clients, which in
turn can make the clients behave badly. I further did an investigation
on how the safety of callout migration between CPU's is maintained. When
I looked into the code and found stuff like "volatile" and "while()"
loops to figure which CPU a callout belongs I understood that such logic
completely undermines the cleverness found in the turnstiles of mutexes
and decided to go through all of the logic inside "kern_timeout.c". Also
static code analysis is harder when we don't use the basic mutexes and
condition variables available in the kernel.
First of all we need to make some driving rules for everyone:
1) A new feature called direct callbacks which execute the timer
callbacks from the fast interrupt handler was added. All these callbacks
_must_ be associated with a regular spinlocks, to maintain a safe
callout_drain(). Else they should only be executed on CPU0.
2) All Giant locked callbacks should only execute on CPU0 to avoid
congestion.
3) Callbacks using read-only locks for its callback should also only
execute on CPU0 to avoid multiple instances pending for completion on
multiple CPU's, because read-only locks can be entered multiple times.
From what I can see, there are currently no consumers of this feature
in the kernel.
Secondly, we need a way to drain callbacks asynchronously, for example
like in the TCP stack. Currently the TCP shutdown sequence for a
connection looks like this:
...
void
pcb_teardown(pcb)
{
lock(pcb);
callout_stop(&pcb->c1);
callout_stop(&pcb->c2);
callout_stop(&pcb->c3);
unlock(pcb);
free(pcb);
}
I guess some of you see what is wrong: Use after free.
...
With a new function I've added, scenarios like this can be solved more
elegantly and without having to fear use after free situations!
static void
pcb_callback(void *pcb)
{
lock(pcb);
do_free = (--(pcb->refcount) == 0);
unlock(pcb);
if (do_free == 0)
return;
destroy_mtx(pcb);
free(pcb);
}
void
pcb_teardown(pcb)
{
lock(pcb);
pcb->refcount = 4;
if (callout_drain_async(&pcb->c1, pcb_callback, pcb) == 0)
pcb->refcount--;
if (callout_drain_async(&pcb->c2, pcb_callback, pcb) == 0)
pcb->refcount--;
if (callout_drain_async(&pcb->c3, pcb_callback, pcb) == 0)
pcb->refcount--;
unlock(pcb);
pcb_callback(pcb);
}
Please find attached a patch for 11-current as of today. Comments and
feedback is appreciated.
BTW: Happy New Year everyone!
--HPS
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kern_timeout_wip.diff
Type: text/x-patch
Size: 35820 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-current/attachments/20141229/6155fc05/attachment.bin>
More information about the freebsd-current
mailing list