High traffic NFS performance and availability problems
Robert Watson
rwatson at FreeBSD.org
Mon Feb 21 13:27:52 PST 2005
On Mon, 21 Feb 2005, David Rice wrote:
> Here are the snapshots of the output you requested. These are from the
> NFS server. We have just upgraded them to 5.3-RELEASE as so many have
> recommended. Hope that makes them more stable. The performance still
> needs some attention.
In the top output below, it looks like there's a lot of contention on
Giant. In 5.3-RELEASE and before, the amr driver is not MPSAFE, but my
understanding is that in 5-STABLE, it has been made MPSAFE, which may make
quite a difference in performance. I pinged Scott Long, who did the work
on the driver, and he indicated that backporting the patch to run on
-RELEASE would be quite difficult, so an upgrade to 5-STABLE is the best
way to get the changes. I believe that you can build a 5-STABLE kernel
and run it with a 5.3-RELEASE user space, so you can see whether that
helps without committing to a full upgrade.
Two other observations:
- It looks like the amr storage array is pretty busy, which may be part of
the issue.
- It looks like you have four processors, suggesting a two-processor Xeon
with hyper-threading turned on. For many workloads, hyper-threading does
not improve performance, so you may want to try turning that off in the
BIOS to see if that helps.
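If a trip to the BIOS is inconvenient, the 5.x i386 kernel also has a sysctl
that parks the hyper-threading logical CPUs in the idle loop, which gets you
most of the same effect. A sketch, assuming your kernel exposes it (check
with "sysctl machdep" before relying on it):

```
# /etc/sysctl.conf -- halt the HTT logical CPUs without a BIOS change
machdep.hlt_logical_cpus=1
```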
Robert N M Watson
>
> Thank You
>
> --------------------------------------------------------------------------------------------------
> 4 users Load 5.28 19.37 28.00 Feb 21 12:18
>
> Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
> Tot Share Tot Share Free in out in out
> Act 19404 2056 90696 3344 45216 count
> All 1020204 4280 4015204 7424 pages
> zfod Interrupts
> Proc:r p d s w Csw Trp Sys Int Sof Flt cow 7226 total
> 5128 5 60861 3 14021584 9 152732 wire 4: sio0
> 23228 act 6: fdc0
> 30.2%Sys 11.8%Intr 0.0%User 0.0%Nice 58.0%Idl 803616 inact 128 8: rtc
> | | | | | | | | | | 43556 cache 13: npx
> ===============++++++ 1660 free 15: ata
> daefr 6358 16: bge
> Namei Name-cache Dir-cache prcfr 1 17: bge
> Calls hits % hits % react 18: mpt
> 1704 971 57 11 1 pdwak 19: mpt
> 5342 pdpgs 639 24: amr
> Disks amrd0 da0 pass0 pass1 pass2 intrn 100 0: clk
> KB/t 22.41 0.00 0.00 0.00 0.00 114288 buf
> tps 602 0 0 0 0 510 dirtybuf
> MB/s 13.16 0.00 0.00 0.00 0.00 70235 desiredvnodes
> % busy 100 0 0 0 0 20543 numvnodes
> 7883 freevnodes
> -----------------------------------------------------------------------------------------
> last pid: 10330;  load averages: 14.69, 11.81, 18.62    up 0+09:01:13  12:32:57
> 226 processes: 5 running, 153 sleeping, 57 waiting, 11 lock
> CPU states: 0.1% user, 0.0% nice, 66.0% system, 24.3% interrupt, 9.6% idle
> Mem: 23M Active, 774M Inact, 150M Wired, 52M Cache, 112M Buf, 1660K Free
> Swap: 1024M Total, 124K Used, 1024M Free
>
> PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND
> 63 root -44 -163 0K 12K WAIT 0 147:05 45.07% 45.07% swi1: net
> 30 root -68 -187 0K 12K WAIT 0 101:39 32.32% 32.32% irq16: bge0
> 12 root 117 0 0K 12K CPU2 2 329:09 19.58% 19.58% idle: cpu2
> 11 root 116 0 0K 12K CPU3 3 327:29 19.24% 19.24% idle: cpu3
> 13 root 114 0 0K 12K RUN 1 263:39 16.89% 16.89% idle: cpu1
> 14 root 109 0 0K 12K CPU0 0 228:50 12.06% 12.06% idle: cpu0
> 368 root 4 0 1220K 740K *Giant 3 45:27 7.52% 7.52% nfsd
> 366 root 4 0 1220K 740K *Giant 0 48:52 7.28% 7.28% nfsd
> 364 root 4 0 1220K 740K *Giant 3 53:01 7.13% 7.13% nfsd
> 367 root -8 0 1220K 740K biord 3 41:22 7.08% 7.08% nfsd
> 372 root 4 0 1220K 740K *Giant 0 28:54 7.08% 7.08% nfsd
> 365 root -1 0 1220K 740K *Giant 3 51:53 6.93% 6.93% nfsd
> 370 root -1 0 1220K 740K nfsslp 0 32:49 6.84% 6.84% nfsd
> 369 root -8 0 1220K 740K biord 1 36:40 6.49% 6.49% nfsd
> 371 root 4 0 1220K 740K *Giant 0 25:14 6.45% 6.45% nfsd
> 374 root -1 0 1220K 740K nfsslp 2 22:31 6.45% 6.45% nfsd
> 377 root 4 0 1220K 740K *Giant 2 17:21 5.52% 5.52% nfsd
> 376 root -4 0 1220K 740K *Giant 2 15:45 5.37% 5.37% nfsd
> 373 root -4 0 1220K 740K ufs 3 19:38 5.18% 5.18% nfsd
> 378 root 4 0 1220K 740K *Giant 2 13:55 4.54% 4.54% nfsd
> 379 root -8 0 1220K 740K biord 3 12:41 4.49% 4.49% nfsd
> 380 root 4 0 1220K 740K - 2 11:26 4.20% 4.20% nfsd
> 3 root -8 0 0K 12K - 1 21:21 4.05% 4.05% g_up
> 4 root -8 0 0K 12K - 0 20:05 3.96% 3.96% g_down
> 381 root 4 0 1220K 740K - 3 9:28 3.66% 3.66% nfsd
> 382 root 4 0 1220K 740K - 1 10:13 3.47% 3.47% nfsd
> 385 root -1 0 1220K 740K nfsslp 3 7:21 3.17% 3.17% nfsd
> 38 root -64 -183 0K 12K *Giant 0 14:45 3.12% 3.12% irq24: amr0
> 384 root 4 0 1220K 740K - 3 8:40 3.12% 3.12% nfsd
> 72 root -24 -143 0K 12K WAIT 2 16:50 2.98% 2.98% swi6:+
> 383 root -8 0 1220K 740K biord 2 7:57 2.93% 2.93% nfsd
> 389 root 4 0 1220K 740K - 2 5:31 2.64% 2.64% nfsd
> 390 root -8 0 1220K 740K biord 3 5:54 2.59% 2.59% nfsd
> 387 root -8 0 1220K 740K biord 0 6:40 2.54% 2.54% nfsd
> 386 root -8 0 1220K 740K biord 1 6:22 2.44% 2.44% nfsd
> 392 root 4 0 1220K 740K - 3 4:27 2.10% 2.10% nfsd
> 388 root -4 0 1220K 740K *Giant 2 4:45 2.05% 2.05% nfsd
> 395 root 4 0 1220K 740K - 0 3:59 2.05% 2.05% nfsd
> 391 root 4 0 1220K 740K - 2 5:10 1.95% 1.95% nfsd
> 393 root 4 0 1220K 740K sbwait 1 4:13 1.56% 1.56% nfsd
> 398 root 4 0 1220K 740K - 2 3:31 1.56% 1.56% nfsd
> 399 root 4 0 1220K 740K - 3 3:12 1.56% 1.56% nfsd
> 401 root 4 0 1220K 740K - 1 2:57 1.51% 1.51% nfsd
> 403 root 4 0 1220K 740K - 0 3:04 1.42% 1.42% nfsd
> 406 root 4 0 1220K 740K - 1 2:27 1.37% 1.37% nfsd
> 397 root 4 0 1220K 740K - 3 3:16 1.27% 1.27% nfsd
> 396 root 4 0 1220K 740K - 2 3:42 1.22% 1.22% nfsd
>
> On Saturday 19 February 2005 04:23 am, Robert Watson wrote:
> > On Thu, 17 Feb 2005, David Rice wrote:
> > > Typically we have 7 client boxes mounting storage from a single file
> > > server. Each client box serves 1000 web sites and the associated email.
> > > We have done the basic NFS tuning (i.e., read/write size optimization
> > > and kernel tuning).
> >
> > How many nfsd's are you running with?
> >
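(For context, the nfsd count is normally set in rc.conf; a sketch, assuming
the stock rc system, with 20 as an illustrative number rather than a
recommendation:

```
# /etc/rc.conf -- run more nfsd servers for a busy file server
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 20"
```

See nfsd(8) for the flag meanings.)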
> > If you run systat -vmstat 1 on your server under high load, could you send
> > us the output? In particular, I'm interested in how the system is
> > spending its time, the paging level, and I/O throughput on devices; the
> > systat -vmstat summary screen provides a good overview of this and more. A
> > few snapshots of "gstat" output would also be very helpful, as would a
> > snapshot or two of "top -S" output.
> >
> > > 2. Client boxes have high load averages and sometimes crashes due to
> > > slow NFS performance.
> >
> > Could you be more specific about the crash failure mode?
> >
> > > 3. File servers that randomly crash with "Fatal trap 12: page fault
> > > while in kernel mode"
> >
> > Could you make sure you're running with at least the latest 5.3 patch
> > level on the server, which includes some NFS server stability fixes, and
> > also look at sliding to the head of 5-STABLE? There are a number of
> > performance and stability improvements that may be relevant there.
> >
> > Could you compile the kernel with KDB+DDB and provide serial console
> > output of the full panic message, the trap details, and a full stack
> > trace? I'm happy to try to help debug these problems.
> >
> > > 4. With soft updates enabled during FSCK the fileserver will freeze with
> > > all NFS processs in the "snaplck" state. We disabled soft updates
> > > because of this.
> >
> > If it's possible to get some more information, it would be quite
> > helpful. In particular, could you compile the server kernel with
> > DDB+KDB+BREAK_TO_DEBUGGER, break into the serial debugger when it appears
> > wedged, and capture the output of "show lockedvnods", "ps", and "trace
> > <pid>" for any processes listed by "show lockedvnods"? A crash dump would
> > also be very helpful. For some hints on the information that is needed
> > here, take a look at the Handbook chapter on kernel debugging and
> > reporting kernel bugs, and my recent post to current@ diagnosing a
> > similar bug.
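(The debugger options mentioned above, as a kernel-config sketch using the
5.3 option names; add them to your kernel configuration file and rebuild:

```
# kernel config additions for debugging wedges and panics
options KDB                  # kernel debugger framework
options DDB                  # interactive debugger backend
options BREAK_TO_DEBUGGER    # a serial BREAK drops into the debugger
makeoptions DEBUG=-g         # build kernel.debug with symbols
```
)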
> >
> > If you re-enable soft updates but leave bgfsck disabled, does that correct
> > this stability problem?
> >
> > In any case, I'm happy to help try to figure out what's going on -- some
> > of the above information for stability and performance problems would be
> > quite helpful in tracking it down.
> >
> > Robert N M Watson
>
>
More information about the freebsd-performance mailing list