g_slice_orphan is unsafe? orphanization in general?
Andriy Gapon
avg at FreeBSD.org
Fri Oct 27 13:30:09 UTC 2017
On 09/10/2017 10:17, Andriy Gapon wrote:
>
> void
> g_slice_orphan(struct g_consumer *cp)
> {
> struct g_slicer *gsp;
>
> g_trace(G_T_TOPOLOGY, "g_slice_orphan(%p/%s)", cp, cp->provider->name);
> g_topology_assert();
>
> /* XXX: Not good enough we leak the softc and its suballocations */
> gsp = cp->geom->softc;
> cp->geom->softc = NULL; <--- I think that doing this is unsafe
> g_slice_free(gsp); <--- and this
> g_wither_geom(cp->geom, ENXIO);
> }
So, g_slice_orphan() is obviously unsafe as it destroys its softc even before it
errors its geom and its providers.
I have created a review request for my attempt to fix the problem:
https://reviews.freebsd.org/D12809
I am still somewhat concerned about the lack of a lock-step mechanism between
setting an error on a provider and checking nstart/nend (described below).
But given the amount of inter-thread communication required before a provider is
actually "finished", I do not think that any problem are likely because of that.
> I do not see how g_slice_orphan() - or, in fact, the whole orphanization process
> - is synchronized with g_io_request() and g_io_schedule_down(). I think that it
> is quite possible for a geom to become orphaned after g_io_check() succeeds and
> before geom->start is executed (or while it's being executed).
> I see that there is an attempt to prevent orphanization while a geom is in use.
> The code in one_event() does not call geom->orphan until pp->nstart == pp->nend.
> But there is no common lock to actually synchronize the check and updates of
> the variables. Atomic operations are not used either. So, in my opinion, the
> check is not really safe.
>
> It looks like r162200 attempted to fix the general problem but either it was not
> a complete solution or the I/O code was later changed:
> https://svnweb.freebsd.org/base?view=revision&revision=162200
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=102766
>
> Back to g_slice, I see that g_slice_start() makes use of the softc. So, this is
> why I believe that destroying it right in g_slice_orphan is not correct.
>
> It seems that other GEOM classes go to greater lengths to ensure that there is
> no race between orphanization and I/O processing. Typically an internal lock
> and a flag seems to be used.
>
> I wonder if g_slice should follow that technique.
> Or maybe it's better to use a lock or atomic operations at GEOM infrastructure
> level to ensure that changes to pp->error, pp->flags, pp->nstart and pp->nend
> are really seen by other threads and in the correct order.
>
--
Andriy Gapon
More information about the freebsd-geom
mailing list