still mbuf leak in 9.0 / 9.1?

Fri May 17 17:31:04 UTC 2013

On Fri, May 17, 2013 at 11:37:23AM +0200, dennis berger wrote:
> Hi List,
> I can confirm that it is the bug you mentioned, Steven.
> Here is how I found it.
> 
> I recorded hourly zfskern and nfsd stats like this:
> 
> echo "PROCSTAT" >> $reportname
> pgrep -S "(zfskern|nfsd)" | xargs procstat -kk >> $reportname
> 
> Luckily it crashed last night and logged this:
> 
>  1910 101508 nfsd             nfsd: service    mi_switch+0x186 sleepq_wait+0x42 _sleep+0x376 arc_lowmem+0x77 kmem_malloc+0xc1 uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5 arc_read_nolock+0x1ec arc_read+0x93 dbuf_prefetch+0x12c dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 dbuf_read+0x4a7 dmu_buf_hold_array_by_dnode+0x16b dmu_buf_hold_array+0x67 dmu_read_uio+0x3f zfs_freebsd_read+0x3e3 
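Spotting this pattern in an hourly report can be automated with a small filter that looks for the same shape as the nfsd trace above. The sketch below is a hypothetical helper, not part of FreeBSD or of this thread. It assumes that procstat -kk lists frames innermost first, as in the trace above. It flags a line when a _sleep frame sits inside arc_lowmem and another arc_* frame appears further out. That combination means the thread entered the low-memory handler from ARC itself and is now waiting there, which is the situation the patch comment describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a FreeBSD tool): does one line of `procstat -kk`
 * output look like the nfsd trace in this thread?  procstat lists frames
 * innermost first, so we require:
 *   1. a " _sleep+" frame before " arc_lowmem+" (thread sleeps inside arc_lowmem)
 *   2. another " arc_" frame after arc_lowmem   (it got there from an ARC allocation)
 * The heuristic is an assumption for illustration only.
 */
bool
looks_like_arc_lowmem_deadlock(const char *line)
{
	assert(line != NULL);

	const char *lowmem = strstr(line, " arc_lowmem+");
	if (lowmem == NULL)
		return false;

	const char *sleep_frame = strstr(line, " _sleep+");
	bool sleeping_in_lowmem = sleep_frame != NULL && sleep_frame < lowmem;

	/* Skip the leading space of " arc_lowmem+" so it cannot match itself. */
	const char *outer_arc_frame = strstr(lowmem + 1, " arc_");

	return sleeping_in_lowmem && outer_arc_frame != NULL;
}

/* Print every suspicious thread from a saved report (e.g. the hourly file). */
void
scan_report(FILE *fp)
{
	/* procstat -kk lines are long; a line longer than this would be split. */
	char buf[8192];

	while (fgets(buf, sizeof(buf), fp) != NULL) {
		if (looks_like_arc_lowmem_deadlock(buf))
			fputs(buf, stdout);
	}
}
```

You would call scan_report() on the report file opened with fopen(). A trace where arc_lowmem is not reached from another arc_* frame is deliberately not flagged. The pagedaemon's own call is one example: the patch still allows that case to block.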
> 
> Maybe it would be good to merge this fix into RELENG_9_1 and distribute it via freebsd-update. What do you think?
> 
> best,
> -dennis
> 
> 
> On 16.05.2013 at 11:42, dennis berger wrote:
> 
> > This is indeed a ZFS+NFS system, and I can see that istgt and nfs are stuck in some ZIO state. Maybe it's this.
> > Thanks for pointing that out.
> > 
> > Is it this ZFS+NFS deadlock?
> > 
> > --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> > +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> > @@ -3720,8 +3720,16 @@ arc_lowmem(void *arg __unused, int howto __unused)
> >  	mutex_enter(&arc_reclaim_thr_lock);
> >  	needfree = 1;
> >  	cv_signal(&arc_reclaim_thr_cv);
> > -	while (needfree)
> > -		msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
> > +
> > +	/*
> > +	 * It is unsafe to block here in arbitrary threads, because we can come
> > +	 * here from ARC itself and may hold ARC locks and thus risk a deadlock
> > +	 * with ARC reclaim thread.
> > +	 */
> > +	if (curproc == pageproc) {
> > +		while (needfree)
> > +			msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
> > +	}
> >  	mutex_exit(&arc_reclaim_thr_lock);
> >  	mutex_exit(&arc_lowmem_lock);
> >  }
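The control flow that the patch changes can be modelled in userland with POSIX threads. This is a sketch under stated assumptions. The names model_arc_lowmem, model_reclaim_thread and pageout_thread are hypothetical stand-ins for arc_lowmem(), the ARC reclaim thread and pageproc. The two condition variables are hypothetical stand-ins for arc_reclaim_thr_cv and the msleep(&needfree, ...) channel. The model does not reproduce real ARC locking; it only shows that after the patch, every caller still requests a reclaim, but only the pagedaemon waits for it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for arc_reclaim_thr_lock, arc_reclaim_thr_cv and needfree. */
static pthread_mutex_t reclaim_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reclaim_cv = PTHREAD_COND_INITIALIZER;  /* wakes the reclaim thread */
static pthread_cond_t needfree_cv = PTHREAD_COND_INITIALIZER; /* models msleep(&needfree, ...) */
static bool needfree = false;

/* Hypothetical stand-in for `pageproc`: the only thread allowed to wait. */
static pthread_t pageout_thread;
static bool pageout_thread_set = false;

static bool
curthread_is_pageout(void)
{
	return pageout_thread_set && pthread_equal(pthread_self(), pageout_thread);
}

/* Model of the patched arc_lowmem(): always request reclaim, wait only in the pagedaemon. */
void
model_arc_lowmem(void)
{
	pthread_mutex_lock(&reclaim_lock);
	needfree = true;
	pthread_cond_signal(&reclaim_cv);
	if (curthread_is_pageout()) {
		while (needfree)
			pthread_cond_wait(&needfree_cv, &reclaim_lock);
		assert(!needfree);
	}
	/* Any other caller (e.g. an nfsd thread already inside ARC) returns at once. */
	pthread_mutex_unlock(&reclaim_lock);
}

/* One-shot model of the reclaim thread: wait for a request, "free memory", clear needfree. */
void *
model_reclaim_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&reclaim_lock);
	while (!needfree)
		pthread_cond_wait(&reclaim_cv, &reclaim_lock);
	/* The real reclaim thread would evict ARC buffers here, which needs ARC locks. */
	needfree = false;
	pthread_cond_broadcast(&needfree_cv);
	pthread_mutex_unlock(&reclaim_lock);
	return NULL;
}
```

In the unpatched version, every caller ran the wait loop. A thread that entered through arc_get_data_buf could therefore sleep while holding ARC locks, and those are the locks the reclaim thread needs before it can clear needfree. Restricting the wait to the pagedaemon removes that cycle for ordinary callers.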
> > 
> > I'll try to crash our test system. I assume that heavily stressing NFS backed by ZFS might trigger this bug?
> > 
> > -dennis
> > 
> > 
> > On 16.05.2013 at 00:03, Steven Hartland wrote:
> > 
> >> ----- Original Message ----- From: "dennis berger" <db at nipsi.de>
> >>> FreeBSD  9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec  4 09:23:10 UTC 2012
> >>> 
> >>>> 3. Regarding this:
> >>>>>> A clean shutdown isn't possible, though. It hangs after vnode
> >>>>>> cleaning; normally you would see USB devices (or maybe other
> >>>>>> devices) detaching here.
> >>>> Please don't conflate this with your above issue.  This is almost
> >>>> certainly unrelated.  Please start a new thread about that if desired.
> >>> 
> >>> Maybe this is a misunderstanding: normally this system shuts down cleanly, of course.
> >>> This hang only appears after the network problem above.
> >> 
> >> If this is a ZFS system, it's a known issue which is fixed in current,
> >> stable-9, stable-8 and the upcoming 8.4 release.
> >> 
> >> If not, and you have USB devices, see if the following sysctl helps:
> >> hw.usb.no_shutdown_wait=1

I'm sorry to say it won't happen.  The only updates that the -RELEASE
branches get are for security.  If you want fixes for other things, you
need to follow/run a stable branch (i.e. stable/9); otherwise you will
need to wait until 9.2-RELEASE comes out.

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |