NFSD hang

Jeremy Chadwick freebsd at jdc.parodius.com
Tue Sep 27 12:59:56 UTC 2011


On Tue, Sep 27, 2011 at 05:41:57AM -0700, Kirill Yelizarov wrote:
> From: Jeremy Chadwick <freebsd at jdc.parodius.com>
> To: Kirill Yelizarov <ykirill at yahoo.com>
> Cc: rmacklem at uoguelph.ca; freebsd-stable at freebsd.org
> Sent: Tuesday, September 27, 2011 3:59 PM
> Subject: Re: NFSD hang
> 
> On Tue, Sep 27, 2011 at 04:04:10AM -0700, Kirill Yelizarov wrote:
> > I found I had sync enabled on my server, so I set zfs set sync=disabled data
> > and will look for failures. Are there any other settings for NFS over ZFS I can check or set?
> > 
> > ________________________________
> > 
> > # uname -a
> > FreeBSD brat.faberlic.com 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Jun  9 11:22:38 MSD 2011     root@**:/usr/obj/usr/src/sys/BRAT  amd64
> > 
> > Sources were taken at that time.
> > 
> > There are a lot of these. Should I paste them all here, or is part enough?
> > 
> > brat# procstat -k -k 1666
> > ? PID??? TID COMM???????????? TDNAME?????????? KSTACK?????????????????????? 
> > ?1666 100323 nfsd???????????? nfsd: master???? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_run+0x8b nfssvc_nfsd+0x97 nfssvc_nfsserver+0x53 nfssvc+0x44 syscallenter+0x186 syscall+0x40 Xfast_syscall+0xe2 
> > ?1666 100391 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100392 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100393 nfsd???????????? nfsd: service??? <running>??????????????????? 
> > ?1666 100394 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100395 nfsd???????????? nfsd: service??? <running>??????????????????? 
> > ?1666 100396 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100397 nfsd???????????? nfsd: service??? <running>??????????????????? 
> > ?1666 100398 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100399 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100400 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100401 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100402 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100403 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100404 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100405 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100406 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100407 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100408 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100409 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100410 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100411 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100412 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100413 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100414 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100415 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100416 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100417 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100418 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100419 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100420 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100421 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100422 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100423 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100424 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100425 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100426 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100427 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100428 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100429 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100430 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100431 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100432 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100433 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100434 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100435 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100436 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100437 nfsd???????????? nfsd: service??? <running>??????????????????? 
> > ?1666 100438 nfsd???????????? nfsd: service??? <running>??????????????????? 
> > ?1666 100439 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100440 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100441 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100442 nfsd???????????? nfsd: service??? <running>??????????????????? 
> > ?1666 100443 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100444 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100445 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100446 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe 
> > ?1666 100447 nfsd???????????? nfsd: service??? mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
> > 
> > 
> > 
> > ________________________________
> > From: Jeremy Chadwick <freebsd at jdc.parodius.com>
> > To: Kirill Yelizarov <ykirill at yahoo.com>
> > Cc: "freebsd-stable at freebsd.org" <freebsd-stable at freebsd.org>
> > Sent: Monday, September 26, 2011 10:32 AM
> > Subject: Re: NFSD hang
> > 
> > On Sun, Sep 25, 2011 at 11:14:30PM -0700, Kirill Yelizarov wrote:
> > > Good Day!
> > > I've got a problem with an NFS share on a ZFS volume. Everything worked fine for a few months and now it hangs. This share stores logs from 9 servers at night, about 1-2 GB from each server. ZFS is filled to 26% and it is v28.
> > > 
> > > last pid: 46573;  load averages: 195.82, 199.86, 200.12    up 108+21:56:50 10:05:06
> > > 432 processes: 208 running, 224 sleeping
> > > CPU:  0.0% user,  0.0% nice,  100% system,  0.0% interrupt,  0.0% idle
> > > Mem: 280M Active, 1469M Inact, 9584M Wired, 161M Cache, 1232M Buf, 311M Free
> > > Swap: 16G Total, 16G Free
> > > 
> > >   PID USERNAME      THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> > >  1666 root          256  76    0  5788K  5120K RUN    14 476.8H 1508.64% nfsd
> > > 
> > > # zpool list
> > > NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> > > data  3.62T   954G  2.69T    25%  1.00x  ONLINE  -
> > > 
> > > # zfs list
> > > NAME   USED  AVAIL  REFER  MOUNTPOINT
> > > data   954G  2.64T   954G  /data
> > > 
> > > # zfs mount
> > > data                            /data
> > > 
> > > What should I look for to resolve it?
> > 
> > What version of FreeBSD exactly, and what build date?
> > 
> > Please provide output from "procstat -k -k 1666" (yes, two -k's).
> 
> Can you explain the correlation between the "sync" parameter (which I
> have to assume was set to "standard" -- the default -- on all of your
> filesystems) and your nfsd issue?  I do not see the correlation.
> 
> My intention of asking for procstat -k -k output (which you did provide;
> thank you) was for Rick Macklem (who's currently working on NFS on
> FreeBSD) to chime in with some insights.  He may be busy, but I've CC'd
> him here.
> 
> I found it in the wiki (http://wiki.freebsd.org/ZFSTuningGuide), so I gave it a try. I thought it was somehow related to ZFS because I couldn't even run ls on the ZFS volume. I had to reset this server because it didn't respond to init commands.

I still don't see any indication in the procstat output that your
problem is ZFS-related.  To me it looks like nfsd is spinning hard; on
what I do not know, but I don't see any ZFS functions in the stack list.
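
For a dump this long, the same check can be done mechanically.  What
follows is only a rough sketch in Python, not part of any FreeBSD tool:
the function name summarize_kstacks and the ZFS_PREFIXES list are
illustrative assumptions, and the prefix list is certainly not
exhaustive.  It counts threads per state and reports any thread whose
kernel stack contains a frame that looks like a ZFS function.

```python
import re
from collections import Counter

# Assumption: a partial list of symbol prefixes used by ZFS kernel code.
ZFS_PREFIXES = ("zfs_", "zio_", "arc_", "dmu_", "dbuf_", "spa_", "txg_", "zil_")


def summarize_kstacks(text):
    """Return (state counts, TIDs with a ZFS-looking frame) for procstat -k -k text."""
    states = Counter()
    zfs_tids = []
    for line in text.splitlines():
        # Drop mail quoting ("> > ") and surrounding whitespace.
        line = re.sub(r"^(?:>\s*)+", "", line).strip()
        m = re.match(r"(\d+)\s+(\d+)\s+(.*)$", line)
        if not m:
            continue  # header or unrelated line
        tid, rest = m.group(2), m.group(3)
        if "<running>" in rest:
            # No stack is shown for a thread that is on a CPU.
            states["running"] += 1
            continue
        # Frames look like "name+0xoffset"; keep only the symbol names.
        frames = re.findall(r"([A-Za-z_]\w*)\+0x[0-9a-f]+", rest)
        if not frames:
            continue
        sleeping = any(f.startswith("sleepq_") for f in frames)
        states["sleeping" if sleeping else "other"] += 1
        if any(f.startswith(ZFS_PREFIXES) for f in frames):
            zfs_tids.append(tid)
    return states, zfs_tids


sample = """\
> >  1666 100391 nfsd nfsd: service mi_switch+0x176 sleepq_catch_signals+0x309 svc_run_internal+0x939
> >  1666 100393 nfsd nfsd: service <running>
"""
states, zfs_tids = summarize_kstacks(sample)
print(dict(states), zfs_tids)
```

An empty list of ZFS thread IDs only says that no sleeping thread is
parked inside a function with one of the assumed prefixes.  It says
nothing about the <running> threads, since procstat shows no stack for
those.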

I would strongly recommend you reconsider tinkering with the "sync"
parameter, and instead wait for Rick to chime in with some information
or requests for further details.

Furthermore, your reply removed Rick from the thread.  I've put him back
in the CC list.  Please follow mailing list etiquette.  Thank you.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


