LOR on nfs: vfs_vnops.c:301 kern_descrip.c:1580

Kostik Belousov kostikbel at gmail.com
Fri Aug 20 19:19:58 UTC 2010


On Fri, Aug 20, 2010 at 08:55:08PM +0400, pluknet wrote:
> On 19 August 2010 17:34, John Baldwin <jhb at freebsd.org> wrote:
> > On Thursday, August 19, 2010 5:29:25 am pluknet wrote:
> >> On 19 August 2010 00:07, John Baldwin <jhb at freebsd.org> wrote:
> >> > On Wednesday, August 18, 2010 3:17:56 pm pluknet wrote:
> >> >> On 18 August 2010 23:11, pluknet <pluknet at gmail.com> wrote:
> >> >> > On 18 August 2010 17:46, Kostik Belousov <kostikbel at gmail.com> wrote:
> >> >> >> On Wed, Aug 18, 2010 at 02:43:19PM +0400, pluknet wrote:
> >> >> >>> On 18 August 2010 12:07, pluknet <pluknet at gmail.com> wrote:
> >> >> >>> > On 17 August 2010 20:04, Kostik Belousov <kostikbel at gmail.com> wrote:
> >> >> >>> >
> >> >> >>> >>
> >> >> >>> >> Also please take a note of the John' suggestion to use the taskqueue.
> >> >> >>> >
> >> >> >>> > I decided to go this road. Thank you both.
> >> >> >>> > Now I do nfs buildkernel survive and prepare some benchmark results.
> >> >> >>> >
> >> >> >>>
> >> >> >>> So, I modified the patch to defer proc_create() with taskqueue(9).
> >> >> >>> Below is `time make -j5 buildkernel WITHOUT_MODULES=yes` perf.
> >> > evaluation.
> >> >> >>> Done on 4-way CPU on clean /usr/obj with /usr/src & /usr/obj both
> >> >> >>> nfs-mounted over 1Gbit LAN.
> >> >> >>>
> >> >> >>> clean old
> >> >> >>> 1137.985u 239.411s 7:42.15 298.0%       6538+2133k 87+43388io 226pf+0w
> >> >> >>>
> >> >> >>> clean new
> >> >> >>> 1134.755u 240.032s 7:41.25 298.0%       6553+2133k 87+43367io 224pf+0w
> >> >> >>>
> >> >> >>> Patch needs polishing, though it generally works.
> >> >> >>> Not sure if shep_chan (or whatever name it will get) needs locking.
> >> >> >> As I said yesterday, if several requests to create nfsiod coming one
> >> >> >> after another, you would loose all but the last.
> >> >> >>
> >> >> >> You should put the requests into the list, probably protected by
> >> >> >> nfs_iod_mtx.
> >> >> >>
> >> >> >
> >> >> > How about this patch? Still several things to ask.
> >> >> > 1) I used malloc instance w/ M_NOWAIT, since it's called with nfs_iod_mtx
> >> > held.
> >> >> > 2) Probably busy/done gymnastics is a wrong mess. Your help is
> >> > appreciated.
> >> >> > 3) if (1) is fine, is it right to use fail: logic (i.e. set
> >> >> > NFSIOD_NOT_AVAILABLE)
> >> >> > on memory shortage? Not tested.
> >> >> >
> >> >> > There are debug printf() left intentionally to see how 3 contexts run
> >> > under load
> >> >> > to each other. I attached these messages as well if that makes sense.
> >> >> >
> >> >>
> >> >> Ah, yes. Sorry, forgot about that.
> >> >
> >> > Your task handler needs to run in a loop until the list of requests is empty.
> >> > If multiple threads call taskqueue_enqueue() before your task is scheduled,
> >> > they will be consolidated down to a single call of your task handler.
> >>
> >> Thanks for your suggestions.
> >>
> >> So I converted nfs_nfsiodnew_tq() into loop, and as such
> >> I changed a cleanup SLIST_FOREACH() as well to free a bulk of
> >> (potentially consolidated) completed requests in one pass.
> >> kproc_create() is still out of the SLIST_FOREACH() to avoid calling it
> >> with nfs_iod_mtx held.
> >>
> >> >
> >> > -       int error, i;
> >> > -       int newiod;
> >> > +       int i, newiod, error;
> >> >
> >> > You should sort these alphabetically if you are going to change this.  I would
> >> > probably just leave it as-is.
> >>
> >> Err.. that's resulted after a set of changes. Thanks for noting that.
> >>
> >> >
> >> > Also, I'm not sure how this works as you don't actually wait for the task to
> >> > complete.  If the taskqueue_enqueue() doesn't preempt, then you may read
> >> > ni_error as zero before the kproc has actually been created and return success
> >> > even though an nfsiod hasn't been created.
> >> >
> >>
> >> I put taskqueue_drain() right after taskqueue_enqueue() to be in sync with
> >> task handler. That was part to think about, as I didn't find such a use pattern.
> >> It seems though that tasks are launched now strictly one-by-one, without
> >> visible parallelism (as far as debug printf()s don't overlap anymore).
> >
> > Ah, if it is safe to sleep then I have a couple of suggestions:
> >
> 
> Cool, credits go to John :).
> I just adopted all of your changes (attached).
> 
> > - Use M_WAITOK to malloc() so you don't have to worry about the failure case
> >  (I see Rick already suggested this).
> 
> After all that reduces to the following replacement in nfs_nfsiodnew().
> So, no regression should be there in theory.
> 
>  mtx_unlock(&nfs_iod_mtx);
> - kproc_create(...)
> + malloc(...)
> mtx_lock(&nfs_iod_mtx);
> 
> It survived after this simple test running for an hour, which
> forces creation of many iods with r/w from 300 threads:
> 
> nfsserv:/home/svn/freebsd/obj_exp on /usr/obj (nfs)
> 
> ./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
> ./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
> ./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
> while [ true ]; do sysctl vfs.nfs.iodmax=2; sysctl vfs.nfs.iodmax=60;
> sleep 20; done
> 
> randomio is from ports/149838.
> 
> 
> P.S.
> it's funny to see in top
>     0 root       10  44    0     0K   144K deadlk  2  23:16 28.86% kernel
> Someone might think the kernel is in deadlock.

It seems nobody replied to the mdf@ objection against wait of the
new proc startup being equivalent to the LOR. I think that the wait
is safe, because the task is executed in the context of
the different process then the enqueue request.
This might be worth noting in the comment or commit message.

You should drain the taskq and prevent creation of new nfsiods before
waiting for existing nfsiods to exit on module unload.

I find it somewhat weird that nip is removed from the list in the
task handler, but freed in the enqueue thread. You could provide
pointers to the local variables to exchange done/exit codes.
I do not insist on this part.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-current/attachments/20100820/07f34529/attachment.pgp


More information about the freebsd-current mailing list