Non-responsive 8.0-RC1 (now 8.0-STABLE)

Arnaud Houdelette arnaud.houdelette at tzim.net
Sun Dec 6 09:54:22 UTC 2009


Peter Jeremy wrote:
> On 2009-Nov-30 19:13:30 +1100, Peter Jeremy <peter at server.vk2pj.dyndns.org> wrote:
>   
>> On 2009-Nov-29 08:56:55 +0100, Thomas Backman <serenity at exscape.org> wrote:
>>     
>>> On Nov 28, 2009, at 10:22 PM, Peter Jeremy wrote:
>>>
>>>       
>>>> My main server is running 8.0/amd64 from between RC1 and RC2 and I've
>>>> recently had a couple of long-duration hangs on it during which time
>>>> processes doing I/O will stop responding.
>>>>         
> ...
>   
>> It actually "hung" again just after I sent the original mail.  This
>> time I managed to get console access and could check the kernel state.
>> This showed that a number of processes were blocked on ZFS locks.
>> The most commonly reported state was 'tx->tx_quiesce_done_cv)'.
>>     
>
> I've upgraded to 8-STABLE from 30-Nov and the problem is still present,
> even after disabling the boinc processes.
>
> This seems to leave race conditions inside ZFS as the only option.
>
> Has anyone else seen anything like this?
>
>   
I have had the same issue since I upgraded to 8.0-RELEASE. It happens during 
high I/O operations such as a buildworld. Since I run top in an ssh session, 
I can say that before the hang the [zfskern] process shows high CPU usage and 
overall system usage is 99%. Sometimes I can get back to normal by interrupting 
the build with Ctrl-C; sometimes I can't. If enabled, the watchdog kicks 
in and the machine reboots (otherwise, I just lose ssh control over it).
The machine is low on memory (512MB), with the same tuning I used on 7.2 
(ARC reduced to 60M, device cache to 5M), which gave me a stable machine there.
I have enabled crash dumps. I can investigate if somebody gives me pointers on 
where to look.
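
For reference, tuning of this kind is normally set in /boot/loader.conf.
A minimal sketch, assuming that "ARC" and "device cache" above refer to the
vfs.zfs.arc_max and vfs.zfs.vdev.cache.size loader tunables (the message
gives only the values, not the tunable names):

```
# /boot/loader.conf (sketch; tunable names are an assumption, values are from the message)
vfs.zfs.arc_max="60M"            # cap the ZFS ARC at 60 MB
vfs.zfs.vdev.cache.size="5M"     # per-device (vdev) read cache of 5 MB
```

Loader tunables are read at boot, so a change here only takes effect after
a reboot.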

Arnaud


More information about the freebsd-stable mailing list