Deadlock in state 'sysctl lock'
Guy Helmer
ghelmer at palisadesys.com
Thu Feb 22 22:21:21 UTC 2007
Rink Springer wrote:
> Hi people,
>
> At work, one of our SpamAssassin/ClamAV filtering machines just entered
> a deadlock state:
>
> FreeBSD/i386 (xxx.qsp.nl) (cuad0)
>
> login: root
> load: 0.00 cmd: login 683 [sysctl lock] 0.00u 0.00s 0% 148k
> load: 0.00 cmd: login 683 [sysctl lock] 0.00u 0.00s 0% 148k
> load: 0.00 cmd: login 683 [sysctl lock] 0.00u 0.00s 0% 148k
> load: 0.00 cmd: login 683 [sysctl lock] 0.00u 0.00s 0% 148k
> load: 0.00 cmd: login 683 [sysctl lock] 0.00u 0.00s 0% 148k
>
> After inspection, I believe the following code in
> kern/kern_sysctl.c:userland_sysctl() is the culprit:
>
>     SYSCTL_LOCK();
>
>     do {
>         req.oldidx = 0;
>         req.newidx = 0;
>         error = sysctl_root(0, name, namelen, &req);
>     } while (error == EAGAIN);
>
>     if (req.lock == REQ_WIRED && req.validlen > 0)
>         vsunlock(req.oldptr, req.validlen);
>
>     SYSCTL_UNLOCK();
>
> Clearly, should sysctl_root() always return EAGAIN, this will cause a
> serious deadlock condition. It appears this is possible.
>
> The only plausible source of a sysctl returning EAGAIN seems to be
> kern/kern_proc.c:sysctl_out_proc(). However, that code returns ESRCH
> if the process could not be found in the first place. Since
> sysctl_root() calls the complete handler function on every iteration,
> each retry does a pfind() again and returns ESRCH if that fails,
> rather than the EAGAIN it would return later in the code path.
>
> The machine is a 6.0-STABLE SMP machine of 30-Mar-2006. No debugging
> options are in the kernel as the machine has quite some load. The only
> console messages were a lot of 'calcru' messages.
>
> Any help is very much appreciated. For now, I'd like to propose a change
> to kern/kern_sysctl.c:userland_sysctl(), to ensure this will never keep
> looping on EAGAIN states (preferably, it should trigger a panic or at
> least a KASSERT should such a condition occur). I know this is a
> bandaid for a problem we don't really quite understand yet, but this may
> ease debugging later on (especially as it will help us understand where
> exactly it is going wrong).
>
> Any comments? It looks to me like this deadlock is quite rare (in fact, I've
> never seen it before), but I believe it is serious enough to be
> addressed, even with such a bandaid until the real solution is presented
> by someone who knows the sysctl internals better than I do.
>
>
Interesting. Twice I have had a 6.2 system stuck where sendmail was
holding the sysctl lock while another process was holding the proctree
and/or allproc lock, if I remember correctly.
Guy
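
The bounded retry proposed in the quoted message can be sketched in user space roughly as follows. This is only an illustration of the idea, not kernel code and not a tested patch: MAX_EAGAIN_RETRIES, handler_fn, call_with_bounded_retry() and the two sample handlers are hypothetical names, the handler stands in for sysctl_root(), and the cap of 16 is an arbitrary assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical retry cap; the value is arbitrary, not taken from FreeBSD. */
#define MAX_EAGAIN_RETRIES 16

/* Hypothetical stand-in for sysctl_root(): returns 0 or an errno value. */
typedef int (*handler_fn)(void *arg);

/*
 * Call the handler until it returns something other than EAGAIN, but
 * give up after MAX_EAGAIN_RETRIES attempts instead of spinning forever.
 * If the cap is hit, the last EAGAIN is returned to the caller, who can
 * then log it or assert on it.
 */
static int
call_with_bounded_retry(handler_fn handler, void *arg, int *attempts_out)
{
    int error;
    int attempts = 0;

    assert(handler != NULL);

    do {
        error = handler(arg);
        attempts++;
    } while (error == EAGAIN && attempts < MAX_EAGAIN_RETRIES);

    if (attempts_out != NULL)
        *attempts_out = attempts;
    return error;
}

/* Sample handler that never makes progress: models the stuck case. */
static int
always_eagain(void *arg)
{
    (void)arg;
    return EAGAIN;
}

/* Sample handler that returns EAGAIN *arg times, then succeeds. */
static int
eagain_then_ok(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return EAGAIN;
    }
    return 0;
}
```

With always_eagain() the loop stops after MAX_EAGAIN_RETRIES calls and hands EAGAIN back to the caller. With eagain_then_ok() and a counter of 3 it succeeds on the fourth call. In the kernel, the point where the cap is reached is where a KASSERT or panic, as suggested above, would go.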