svn commit: r240822 - head/sys/geom
Kenneth D. Merry
ken at FreeBSD.org
Wed Sep 26 19:58:21 UTC 2012
On Wed, Sep 26, 2012 at 21:45:41 +0200, Pawel Jakub Dawidek wrote:
> On Wed, Sep 26, 2012 at 01:21:17PM -0600, Kenneth D. Merry wrote:
> > On Wed, Sep 26, 2012 at 20:53:39 +0200, Pawel Jakub Dawidek wrote:
> > > On Wed, Sep 26, 2012 at 11:29:17AM -0600, Kenneth D. Merry wrote:
> > > > Here is what CAM needs at each step:
> > > >
> > > > 1. When a device goes away, we need a method to call from daoninvalidate()
> > > > (or any other peripheral driver invalidate routine) with these
> > > > properties:
> > > > - It tells GEOM that the device has gone away, and starts the process
> > > > of shutting down the device. (i.e. withers/orphans the provider)
> > > > - It is callable from an interrupt context, with the SIM (MTX_DEF) lock
> > > > held, so it can't sleep.
> > >
> > > Neither g_wither_provider() nor g_orphan_provider() requires the topology
> > > lock. They only acquire the event lock, but it is a regular mutex, so this
> > > is fine. Traversing geom's providers list looks like something that does
> > > need the topology lock, but maybe traversing is not needed at all.
> > > The reason for this change was a panic in iSCSI initiator where
> > > disk_gone() was called and provider was destroyed before g_wither_geom()
> > > returned.
> >
> > Ahh. How about using LIST_FOREACH_SAFE? Would that address the problem at
> > hand? Are there any other races in there?
>
> It depends. If one geom can hold more than one provider then it might be
> racy, but from what I see there is always only one provider - there has
> to be only one, because disk_destroy() destroys it and struct disk
> always represents only one disk. If that's true then I see no reason to
> have a loop in there. I'd change it to:
>
> void
> disk_gone(struct disk *dp)
> {
> 	struct g_geom *gp;
> 	struct g_provider *pp;
>
> 	gp = dp->d_geom;
> 	if (gp != NULL) {
> 		pp = LIST_FIRST(&gp->provider);
> 		if (pp != NULL)
> 			g_wither_provider(pp, ENXIO);
> 	}
> }
I would suggest using LIST_FOREACH_SAFE() (with a comment explaining why)
instead. That way, if someone adds another provider down the road, it
will be handled properly.
Otherwise we need a comment or KASSERT somewhere to explain that we depend
on there only being one provider, and things will break if there is more
than one.
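
To make the two options concrete, here is a rough userland sketch, not
kernel code. The mock_* structures and functions are hypothetical
stand-ins for struct g_geom, struct g_provider, g_wither_provider() and
disk_gone(); assert() stands in for KASSERT; and the assumption that the
provider is unlinked while it is being withered is only there to mimic
the race described above. LIST_FOREACH_SAFE is defined locally in case
the system's <sys/queue.h> does not provide it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/queue.h>

/* Some <sys/queue.h> versions lack the _SAFE variants; supply one if needed. */
#ifndef LIST_FOREACH_SAFE
#define LIST_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = LIST_FIRST((head));				\
	    (var) != NULL && ((tvar) = LIST_NEXT((var), field), 1);	\
	    (var) = (tvar))
#endif

/* Hypothetical stand-ins for struct g_provider and struct g_geom. */
struct mock_provider {
	int error;
	LIST_ENTRY(mock_provider) link;
};
LIST_HEAD(mock_provider_list, mock_provider);

struct mock_geom {
	struct mock_provider_list provider;
};

/*
 * Stand-in for g_wither_provider(): record the error and unlink the
 * provider, on the assumption that it may go away during the call.
 */
static void
mock_wither_provider(struct mock_provider *pp, int error)
{
	pp->error = error;
	LIST_REMOVE(pp, link);
}

/*
 * Option 1: walk the list with LIST_FOREACH_SAFE so the next pointer is
 * fetched before the current provider can disappear.  Returns the number
 * of providers withered.
 */
static int
mock_disk_gone_foreach(struct mock_geom *gp, int error)
{
	struct mock_provider *pp, *pp_tmp;
	int n = 0;

	if (gp == NULL)
		return (0);
	LIST_FOREACH_SAFE(pp, &gp->provider, link, pp_tmp) {
		mock_wither_provider(pp, error);
		n++;
	}
	return (n);
}

/*
 * Option 2: no loop, but assert the single-provider assumption so that
 * adding a second provider later fails loudly.
 */
static int
mock_disk_gone_single(struct mock_geom *gp, int error)
{
	struct mock_provider *pp;

	if (gp == NULL)
		return (0);
	pp = LIST_FIRST(&gp->provider);
	if (pp == NULL)
		return (0);
	assert(LIST_NEXT(pp, link) == NULL);	/* exactly one provider */
	mock_wither_provider(pp, error);
	return (1);
}
```

Either form avoids touching a provider after it has been withered; the
difference is only whether a second provider is handled or rejected.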
> > > So maybe disk_destroy() should first orphan provider, which in turn will
> > > set its error. If provider's error is set, all I/O requests will be
> > > denied by GEOM by returning provider's error, so strategy method within
> > > a driver won't be called.
> >
> > The current semantics of disk_destroy() are that the da(4) driver won't use
> > the disk structure after it is called. We can guarantee that if it is
> > called from dacleanup(), but not if it is called from daoninvalidate().
> >
> > And if we combined the functionality of the current disk_gone() (which
> > orphans the provider) and disk_destroy() routines, we would have to call it
> > from daoninvalidate(). And that won't work, because the da(4) driver may
> > well access elements of the disk structure after daoninvalidate() is
> > called.
>
> And I assume this is not something that can be fixed/changed?
No, not really. It would probably take quite a bit of work to go to a two
step process, and I'm not sure that it would even work in the end.
Ken
--
Kenneth Merry
ken at FreeBSD.ORG