g_mirror_access() dropping geom topology_lock [Was: Kernel crash trying to import a ZFS pool with log device]

Pawel Jakub Dawidek pjd at FreeBSD.org
Fri Mar 21 13:58:07 UTC 2014


On Fri, Mar 21, 2014 at 12:43:41PM +0200, Alexander Motin wrote:
> On 21.03.2014 11:37, Andriy Gapon wrote:
> > Boom!
> >
> > I see two issues here.
> > First, the ZFS tasting code could be made more robust.  If it never tried to
> > re-use the consumer and always created a new one, then most likely this crash
> > could be avoided.  But there is no bug in the code.  The code is correct and
> > it uses the GEOM topology lock to avoid any concurrency issues.
> >
> > But GEOM mirror code breaks a contract on which the ZFS code relies.
> > g_access() must be called with the topology lock held.
> > I extend this requirement: the access method of any GEOM
> > provider must operate under the topology lock and must never drop it.
> > In other words, if a caller must acquire g_topology_lock before calling
> > g_access, then in return it must have a guarantee that the GEOM topology stays
> > unchanged across the call to g_access().
> > g_mirror_access() breaks the above contract.
> >
> > So, the code in vdev_geom_attach() obtains g_topology_lock, then it finds an
> > existing valid consumer and calls g_access() on it.  It reasonably expects that
> > the consumer remains valid, but because g_mirror_access() drops and reacquires the
> > topology lock, there is a chance that the topology can change and the consumer
> > may become invalid.
> >
> > I am not very familiar with gmirror code, so I am not sure how to fix the
> > problem from that end.
> 
> I can confirm this. I have known about this problem for some time already. The
> same issue as shown in GMIRROR is also present in GRAID. AFAIR the
> problem is keeping the lock order between the GEOM topology lock and the
> class' own lock.
> 
> The only "excuse" is that it is not very reasonable to have ZFS on top 
> of GMIRROR or GRAID.

In my opinion we should stop pretending that we can do without dropping
the topology lock in the access method, accept that fact and act
accordingly in other GEOM classes (like ZFS::VDEV).
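
A minimal sketch of what "acting accordingly" could look like on the caller side, again a hypothetical userland model rather than the GEOM or ZFS API. The caller records a topology generation counter before calling an access method that may drop the topology lock; if the counter moved, it undoes the access, looks the consumer up again (or creates a fresh one, as suggested above for the ZFS tasting code) and retries instead of using the old pointer. `struct consumer`, `attach_read()`, `provider_access()` and the generation counter are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userland model, not the GEOM or ZFS API. Consumers live in a
 * fixed table so a destroyed slot can be reused, as freed memory can be.
 */
struct consumer {
	bool in_use;
	const char *provider;
	int acr;                              /* read access held through this consumer */
};

#define NCONSUMERS 4
static struct consumer table[NCONSUMERS];
static bool topology_held;                /* models the GEOM topology lock */
static unsigned topology_gen;             /* bumped on every create/destroy */
static void (*unlocked_window)(void);     /* models another thread */

static struct consumer *find_consumer(const char *provider)
{
	assert(topology_held);
	for (size_t i = 0; i < NCONSUMERS; i++)
		if (table[i].in_use && strcmp(table[i].provider, provider) == 0)
			return &table[i];
	return NULL;
}

static struct consumer *consumer_create(const char *provider)
{
	assert(topology_held);
	for (size_t i = 0; i < NCONSUMERS; i++) {
		if (!table[i].in_use) {
			table[i] = (struct consumer){ true, provider, 0 };
			topology_gen++;
			return &table[i];
		}
	}
	return NULL;
}

static void consumer_destroy(struct consumer *cp)
{
	assert(topology_held);
	memset(cp, 0, sizeof(*cp));
	topology_gen++;
}

/* Provider access method that drops and reacquires the topology lock. */
static int provider_access(const char *provider, int dr)
{
	(void)provider;
	(void)dr;
	assert(topology_held);
	topology_held = false;
	if (unlocked_window != NULL)
		unlocked_window();
	topology_held = true;
	return 0;
}

/* Caller that does not trust a consumer pointer across the access call. */
static struct consumer *attach_read(const char *provider)
{
	assert(topology_held);
	for (int attempt = 0; attempt < 8; attempt++) {
		struct consumer *cp = find_consumer(provider);
		if (cp == NULL)
			cp = consumer_create(provider);
		if (cp == NULL)
			return NULL;
		unsigned gen = topology_gen;
		if (provider_access(provider, 1) != 0)
			return NULL;
		if (topology_gen == gen) {
			cp->acr++;                        /* nothing changed while unlocked */
			return cp;
		}
		(void)provider_access(provider, -1);  /* topology changed: undo, retry */
	}
	return NULL;
}

/* One-shot "other thread" that destroys the consumer during the window. */
static void destroy_during_window(void)
{
	unlocked_window = NULL;
	topology_held = true;
	struct consumer *old = find_consumer("disk0");
	if (old != NULL)
		consumer_destroy(old);
	topology_held = false;
}

static struct consumer *demo(void)
{
	topology_held = true;
	(void)consumer_create("disk0");
	unlocked_window = destroy_during_window;
	return attach_read("disk0");
}
```

In `demo()` the consumer is destroyed during the unlocked window; `attach_read()` notices the generation change and returns a freshly created consumer holding one read access. The check uses a generation counter rather than a pointer comparison because the new consumer can land in the old slot, just as freed memory can be reused.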

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://mobter.com


More information about the freebsd-geom mailing list