Giant deadlock related to twe

Thu Aug 26 11:38:35 PDT 2004

Doug White wrote:
> On Mon, 23 Aug 2004, Vinod Kashyap wrote:
>>> Just got this on my amd64 box. A disk flaked out in my machine, which
>>> has a 3ware 8006-2LP with 2 80GB drives in a RAID0.  My X session locked
>>> up and I was able to break to ddb.  Some ddb twiddling follows.  It looks
>>> like, at first glance, some sort of deadlock against softupdates.
>>>
>>> <snip>
>>
>> The messages indicate timeouts due to the drive continuously returning
>> BUSY to the firmware on the controller.  This could be caused by the
>> drive going bad, or even a one-time disturbance like tugging of
>> cables, etc.
>
> Right, and a failing drive it was, but it shouldn't lock up the entire
> system when it happens.

Why not?  If the drive is continuously returning BUSY, wouldn't the
requests just keep getting retried while the process that issued them
waits for them to complete?  To the user, this would look like a
lockup because the process is blocked.  X and company do a lot of
reading/writing of temporary files, so what you are seeing makes sense to
me.  I see a similar lockup when the NFS server hosting my home directory
goes down (SMP -CURRENT so it's been a bit exciting lately...).  As soon
as the NFS server comes back up X jumps to life again.
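
To make the reasoning concrete, here is a rough Python sketch of the behaviour I am describing.  It is not the twe driver's code; FlakyDrive, blocking_io and max_retries are names I made up for illustration, and the retry policy is only my assumption about what is going on.  The point is that with no retry limit the caller simply waits for as long as the drive keeps answering BUSY.

```python
BUSY, OK = "BUSY", "OK"


class FlakyDrive:
    """Hypothetical drive that answers BUSY for the first `busy_count` requests."""

    def __init__(self, busy_count):
        self.remaining_busy = busy_count

    def submit(self, request):
        # Pretend the drive is busy until the counter runs out.
        if self.remaining_busy > 0:
            self.remaining_busy -= 1
            return BUSY
        return OK


def blocking_io(drive, request, max_retries=None):
    """Resubmit `request` until the drive accepts it; return the attempt count.

    With max_retries=None the caller waits for as long as the drive stays
    BUSY, which is what looks like a lockup from the user's side.
    With a limit, the caller gets an error back instead of waiting forever.
    """
    attempts = 0
    while True:
        attempts += 1
        if drive.submit(request) == OK:
            return attempts
        if max_retries is not None and attempts >= max_retries:
            raise TimeoutError(f"drive still BUSY after {attempts} attempts")
```

A drive that recovers after three BUSY replies lets blocking_io return on the fourth attempt, much like X coming back once the NFS server is up again.  A drive that never recovers keeps an unbounded caller waiting, and only a bounded retry count turns that wait into an error the caller can handle.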

Jon