vinum and GEOM deadlock situation

Poul-Henning Kamp phk at phk.freebsd.dk
Tue Feb 3 11:12:40 PST 2004


In message <20040203190839.Y616 at korben.in.tern>, Lukas Ertl writes:
>On Tue, 3 Feb 2004, Pawel Jakub Dawidek wrote:
>
>> On Tue, Feb 03, 2004 at 04:56:23PM +0100, Lukas Ertl wrote:
>> +> I'm running into a deadlock situation with the following scenario:
>> +>
>> +> Have a vinum RAID5 with several disks mounted, pull out one of the disks,
>> +> shortly thereafter all I/O hangs.
>> +>
>> +> I managed to identify the deadlock, but couldn't come up with a fix yet.
>> +>
>> +> Let's see.  Here's the backtrace of the vinum process:
>> [...]
>>
>> Yes, the deadlock is obvious.
>> [...]
>> The problem here is that dp->d_close() is called with the topology lock
>> held, and d_close() calls disk_destroy(), where the topology lock should
>> not be held.
>
>I also think that the only place where we can drop and re-grab the
>topology lock is around the dp->d_close() call, but I'm not sure if there
>are any side effects.

This is the kind of trouble I feared we would see if vinum were moved
onto the disk_*() API.  The trouble is not only the g_topology()
lock, but also Giant.  To make matters worse, the WITNESS order
of those two is the "Giant is going away" order rather than the more
widespread "Giant is everywhere" order.

I have no good suggestions for fixing it.  In most of the places where
I have had to deal with this (notably in the disk_* API), I have used
the geom_event mechanism, but in this case you probably need an event
mechanism that runs "on the other side", where it does not hold the
topology lock.  Consider a task-queue.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

