High traffic NFS performance and availability problem

David Rice drice at globat.com
Wed Feb 23 14:06:44 PST 2005


We are willing to be a test site for the new amr driver. We have several
NFS servers running 5.3-RELEASE with 1.3TB of disk under high load.
We are also looking for a management utility for the PERC4 under FreeBSD.
Thanks so much to all the people who have responded to this thread, and
special thanks to Scott for his work on this driver.




On Wednesday 23 February 2005 10:51 am, Scott Long wrote:
> David,
>
> Sorry for the mis-information about the AMR status earlier in the
> thread.  I forgot that I was holding off on merging the MPSAFE work to
> 5-STABLE for a bit.  LSI is getting involved in active maintainership
> again, and I'm working with them to review all of the changes so far and
> fix some of the bugs that I accidentally introduced.  Hopefully we'll
> have a resolution by the end of the week, after which I'll prepare the
> updated driver for inclusion in 5.4.
>
> Scott
>
> David Rice wrote:
> > > Where can I find the MPSAFE version of the amr PERC driver?
> > > I checked the release notes for 5.3-STABLE and they make no reference to
> > > the amr driver being MPSAFE.
> >
> > On Monday 21 February 2005 01:26 pm, Robert Watson wrote:
> >>On Mon, 21 Feb 2005, David Rice wrote:
> >>>Here are the snapshots of the output you requested. These are from the
> >>>NFS server. We have just upgraded them to 5.3-RELEASE, as so many have
> >>>recommended; hopefully that makes them more stable. The performance still
> >>>needs some attention.
> >>
> >>In the top output below, it looks like there's a lot of contention on
> >>Giant.  In 5.3-RELEASE and before, the amr driver is not MPSAFE, but my
> >>understanding is that in 5-STABLE, it has been made MPSAFE, which may
> >> make quite a difference in performance.  I pinged Scott Long, who did
> >> the work on the driver, and he indicated that backporting the patch to
> >> run on -RELEASE would be quite difficult, so an upgrade to 5-STABLE is
> >> the best way to get the changes.  I believe that you can build a
> >> 5-STABLE kernel and run with a 5.3-RELEASE user space to avoid having to
> >> commit to a full upgrade to see if that helps or not.
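> >>
> >> For example, a rough sketch (this assumes the stock stable-supfile,
> >> which should already point at tag=RELENG_5, and the GENERIC kernel
> >> config; pick a nearby cvsup mirror and adjust KERNCONF for your setup):
> >>
> >>   # update /usr/src to 5-STABLE (RELENG_5)
> >>   cvsup -g -L 2 -h cvsup<N>.FreeBSD.org /usr/share/examples/cvsup/stable-supfile
> >>   # build and install only the kernel; the 5.3-RELEASE userland stays put
> >>   cd /usr/src
> >>   make buildkernel KERNCONF=GENERIC
> >>   make installkernel KERNCONF=GENERIC
> >>   shutdown -r now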
> >>
> >>Two other observations:
> >>
> >>- It looks like the amr storage array is pretty busy, which may be part
> >> of the issue.
> >>
> >>- It looks like you have four processors, suggesting a two-processor Xeon
> >>  with hyper-threading turned on. For many workloads, hyper-threading
> >> does not improve performance, so you may want to try turning that off in
> >> the BIOS to see if that helps.
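> >>
> >>   As a quick sanity check (illustrative only), hw.ncpu should report 4
> >>   and the boot messages should mention logical CPUs when HTT is active:
> >>
> >>     sysctl hw.ncpu
> >>     dmesg | grep -i 'logical cpu'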
> >>
> >>Robert N M Watson
> >>
> >>>Thank You
> >>>
> >>>------------------------------------------------------------------------
> >>>systat -vmstat snapshot, Feb 21 12:18 (4 users):
> >>>
> >>>  load averages: 5.28 19.37 28.00
> >>>  CPU:  30.2% Sys  11.8% Intr  0.0% User  0.0% Nice  58.0% Idle
> >>>  interrupts: 7226 total, mostly 16: bge (6358) and 24: amr
> >>>  disks: amrd0 100% busy, 602 tps, 22.41 KB/t, 13.16 MB/s;
> >>>         da0, pass0, pass1, pass2 idle
> >>>  memory (KB): 23228 act, 803616 inact, 152732 wire, 43556 cache,
> >>>               1660 free, 114288 buf (510 dirty)
> >>>  name cache: 1704 calls, 57% hits
> >>>  vnodes: 20543 in use, 7883 free, 70235 desired
> >>>------------------------------------------------------------------------
> >>>last pid: 10330;  load averages: 14.69, 11.81, 18.62   up 0+09:01:13  12:32:57
> >>>226 processes: 5 running, 153 sleeping, 57 waiting, 11 lock
> >>>CPU states:  0.1% user,  0.0% nice, 66.0% system, 24.3% interrupt,  9.6% idle
> >>>Mem: 23M Active, 774M Inact, 150M Wired, 52M Cache, 112M Buf, 1660K Free
> >>>Swap: 1024M Total, 124K Used, 1024M Free
> >>>
> >>>  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> >>>   63 root     -44 -163     0K    12K WAIT   0 147:05 45.07% 45.07% swi1: net
> >>>   30 root     -68 -187     0K    12K WAIT   0 101:39 32.32% 32.32% irq16: bge0
> >>>   12 root     117    0     0K    12K CPU2   2 329:09 19.58% 19.58% idle: cpu2
> >>>   11 root     116    0     0K    12K CPU3   3 327:29 19.24% 19.24% idle: cpu3
> >>>   13 root     114    0     0K    12K RUN    1 263:39 16.89% 16.89% idle: cpu1
> >>>   14 root     109    0     0K    12K CPU0   0 228:50 12.06% 12.06% idle: cpu0
> >>>  368 root       4    0  1220K   740K *Giant 3  45:27  7.52%  7.52% nfsd
> >>>  366 root       4    0  1220K   740K *Giant 0  48:52  7.28%  7.28% nfsd
> >>>  364 root       4    0  1220K   740K *Giant 3  53:01  7.13%  7.13% nfsd
> >>>  367 root      -8    0  1220K   740K biord  3  41:22  7.08%  7.08% nfsd
> >>>  372 root       4    0  1220K   740K *Giant 0  28:54  7.08%  7.08% nfsd
> >>>  365 root      -1    0  1220K   740K *Giant 3  51:53  6.93%  6.93% nfsd
> >>>  370 root      -1    0  1220K   740K nfsslp 0  32:49  6.84%  6.84% nfsd
> >>>  369 root      -8    0  1220K   740K biord  1  36:40  6.49%  6.49% nfsd
> >>>  371 root       4    0  1220K   740K *Giant 0  25:14  6.45%  6.45% nfsd
> >>>  374 root      -1    0  1220K   740K nfsslp 2  22:31  6.45%  6.45% nfsd
> >>>  377 root       4    0  1220K   740K *Giant 2  17:21  5.52%  5.52% nfsd
> >>>  376 root      -4    0  1220K   740K *Giant 2  15:45  5.37%  5.37% nfsd
> >>>  373 root      -4    0  1220K   740K ufs    3  19:38  5.18%  5.18% nfsd
> >>>  378 root       4    0  1220K   740K *Giant 2  13:55  4.54%  4.54% nfsd
> >>>  379 root      -8    0  1220K   740K biord  3  12:41  4.49%  4.49% nfsd
> >>>  380 root       4    0  1220K   740K -      2  11:26  4.20%  4.20% nfsd
> >>>    3 root      -8    0     0K    12K -      1  21:21  4.05%  4.05% g_up
> >>>    4 root      -8    0     0K    12K -      0  20:05  3.96%  3.96% g_down
> >>>  381 root       4    0  1220K   740K -      3   9:28  3.66%  3.66% nfsd
> >>>  382 root       4    0  1220K   740K -      1  10:13  3.47%  3.47% nfsd
> >>>  385 root      -1    0  1220K   740K nfsslp 3   7:21  3.17%  3.17% nfsd
> >>>   38 root     -64 -183     0K    12K *Giant 0  14:45  3.12%  3.12% irq24: amr0
> >>>  384 root       4    0  1220K   740K -      3   8:40  3.12%  3.12% nfsd
> >>>   72 root     -24 -143     0K    12K WAIT   2  16:50  2.98%  2.98% swi6:+
> >>>  383 root      -8    0  1220K   740K biord  2   7:57  2.93%  2.93% nfsd
> >>>  389 root       4    0  1220K   740K -      2   5:31  2.64%  2.64% nfsd
> >>>  390 root      -8    0  1220K   740K biord  3   5:54  2.59%  2.59% nfsd
> >>>  387 root      -8    0  1220K   740K biord  0   6:40  2.54%  2.54% nfsd
> >>>  386 root      -8    0  1220K   740K biord  1   6:22  2.44%  2.44% nfsd
> >>>  392 root       4    0  1220K   740K -      3   4:27  2.10%  2.10% nfsd
> >>>  388 root      -4    0  1220K   740K *Giant 2   4:45  2.05%  2.05% nfsd
> >>>  395 root       4    0  1220K   740K -      0   3:59  2.05%  2.05% nfsd
> >>>  391 root       4    0  1220K   740K -      2   5:10  1.95%  1.95% nfsd
> >>>  393 root       4    0  1220K   740K sbwait 1   4:13  1.56%  1.56% nfsd
> >>>  398 root       4    0  1220K   740K -      2   3:31  1.56%  1.56% nfsd
> >>>  399 root       4    0  1220K   740K -      3   3:12  1.56%  1.56% nfsd
> >>>  401 root       4    0  1220K   740K -      1   2:57  1.51%  1.51% nfsd
> >>>  403 root       4    0  1220K   740K -      0   3:04  1.42%  1.42% nfsd
> >>>  406 root       4    0  1220K   740K -      1   2:27  1.37%  1.37% nfsd
> >>>  397 root       4    0  1220K   740K -      3   3:16  1.27%  1.27% nfsd
> >>>  396 root       4    0  1220K   740K -      2   3:42  1.22%  1.22% nfsd
> >>>
> >>>On Saturday 19 February 2005 04:23 am, Robert Watson wrote:
> >>>>On Thu, 17 Feb 2005, David Rice wrote:
> >>>>>Typically we have 7 client boxes mounting storage from a single file
> >>>>>server.  Each client box serves 1000 web sites and the associated email.
> >>>>>We have done the basic NFS tuning (i.e., read/write size optimization
> >>>>>and kernel tuning).
> >>>>
> >>>>How many nfsd's are you running with?
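> >>>>For reference, the count is the -n argument given to nfsd, typically
> >>>>set via nfs_server_flags in /etc/rc.conf -- e.g. (the 16 below is
> >>>>purely illustrative):
> >>>>
> >>>>  nfs_server_enable="YES"
> >>>>  nfs_server_flags="-u -t -n 16"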
> >>>>
> >>>>If you run systat -vmstat 1 on your server under high load, could you
> >>>>send us the output?  In particular, I'm interested in how the system is
> >>>>spending its time, the paging level, and I/O throughput on devices; the
> >>>>systat -vmstat summary screen gives a good overview of this and more.
> >>>>A few snapshots of "gstat" output would also be very helpful, as would
> >>>>a snapshot or two of "top -S" output.  Together these will give us a
> >>>>picture of how the system is spending its time.
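> >>>>
> >>>>Something along these lines would capture all three (just a sketch;
> >>>>sample for a minute or so while the load is high):
> >>>>
> >>>>  systat -vmstat 1        # note the summary screen once it settles
> >>>>  gstat                   # watch %busy per provider for a few refreshes
> >>>>  top -Sb 60 > top-S.out  # one batch-mode snapshot of the top 60 processes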
> >>>>
> >>>>>2. Client boxes have high load averages and sometimes crashes due to
> >>>>>slow NFS performance.
> >>>>
> >>>>Could you be more specific about the crash failure mode?
> >>>>
> >>>>>3. File servers that randomly crash with "Fatal trap 12: page fault
> >>>>>while in kernel mode"
> >>>>
> >>>>Could you make sure you're running with at least the latest 5.3 patch
> >>>>level on the server, which includes some NFS server stability fixes,
> >>>>and also look at sliding to the head of 5-STABLE?  There are a number
> >>>>of performance and stability improvements that may be relevant there.
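> >>>>
> >>>>A quick way to see where you stand (illustrative):
> >>>>
> >>>>  uname -r    # e.g. 5.3-RELEASE-p<N> once the errata patches are built in
> >>>>
> >>>>Roughly, the errata/security branch corresponds to the RELENG_5_3 cvsup
> >>>>tag, and the head of 5-STABLE to RELENG_5.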
> >>>>
> >>>>Could you compile the kernel with KDB+DDB and provide serial console
> >>>>output of the full panic message, the trap details, and a full stack
> >>>>trace?  I'm happy to try to help debug these problems.
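> >>>>
> >>>>A minimal sketch of the kernel config additions (check them against
> >>>>the handbook's kernel debugging chapter before relying on this):
> >>>>
> >>>>  options KDB           # kernel debugger framework
> >>>>  options DDB           # interactive in-kernel debugger
> >>>>  options KDB_TRACE     # print a stack trace automatically on panic
> >>>>  makeoptions DEBUG=-g  # build debug symbols for use with kgdb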
> >>>>
> >>>>>4. With soft updates enabled, during fsck the fileserver will freeze
> >>>>>with all NFS processes in the "snaplck" state. We disabled soft
> >>>>>updates because of this.
> >>>>
> >>>>If it's possible to get some more information, it would be quite
> >>>>helpful.  In particular, if you could compile the server kernel with
> >>>>DDB+KDB+BREAK_TO_DEBUGGER, break into the serial debugger when it
> >>>>appears wedged, and capture the output of "show lockedvnods", "ps", and
> >>>>"trace <pid>" for any processes listed in the "show lockedvnods" output,
> >>>>that would be great.  A crash dump would also be very helpful.  For
> >>>>some hints on the information that is necessary here, take a look at
> >>>>the handbook chapter on kernel debugging and reporting kernel bugs, and
> >>>>my recent post to current@ diagnosing a similar bug.
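> >>>>
> >>>>Roughly, that means adding (on top of the KDB/DDB options above):
> >>>>
> >>>>  options BREAK_TO_DEBUGGER   # a BREAK on the serial console drops into DDB
> >>>>
> >>>>and then, at the db> prompt once the machine wedges:
> >>>>
> >>>>  db> show lockedvnods
> >>>>  db> ps
> >>>>  db> trace <pid>             # repeat for each pid named by show lockedvnods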
> >>>>
> >>>>If you re-enable soft updates but leave bgfsck disabled, does that
> >>>>correct this stability problem?
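> >>>>
> >>>>For example (the device name is only a placeholder -- tunefs wants the
> >>>>filesystem unmounted or mounted read-only):
> >>>>
> >>>>  tunefs -n enable /dev/amrd0s1e    # turn soft updates back on
> >>>>
> >>>>and in /etc/rc.conf, to force a foreground fsck at boot:
> >>>>
> >>>>  background_fsck="NO"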
> >>>>
> >>>>In any case, I'm happy to help try to figure out what's going on --
> >>>>some of the above information for stability and performance problems
> >>>>would be quite helpful in tracking it down.
> >>>>
> >>>>Robert N M Watson
> >>
> >>_______________________________________________
> >>freebsd-performance at freebsd.org mailing list
> >>http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> >>To unsubscribe, send any mail to
> >>"freebsd-performance-unsubscribe at freebsd.org"


