High traffic NFS performance and availability problem

David Rice drice at globat.com
Wed Feb 23 10:44:57 PST 2005


Where can I find the MPSAFE version of the amr PERC driver?
I checked the release notes for 5.3-STABLE and they make no reference to
the amr driver being MPSAFE.
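
For what it's worth, the only check I could think of is to read the driver
source itself, on the assumption that an MPSAFE driver registers its
interrupt handler with the INTR_MPSAFE flag (otherwise the handler still
runs under Giant). Roughly, against a RELENG_5 checkout:

    # rough check only; path assumes RELENG_5 sources in /usr/src
    grep -n INTR_MPSAFE /usr/src/sys/dev/amr/*.c

If that's the wrong way to tell, please correct me.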


On Monday 21 February 2005 01:26 pm, Robert Watson wrote:
> On Mon, 21 Feb 2005, David Rice wrote:
> > Here are the snapshots of the output you requested. These are from the
> > NFS server. We have just upgraded them to 5.3-RELEASE as so many have
recommended.  Hope that makes them more stable. The performance still
> > needs some attention.
>
> In the top output below, it looks like there's a lot of contention on
> Giant.  In 5.3-RELEASE and before, the amr driver is not MPSAFE, but my
> understanding is that in 5-STABLE, it has been made MPSAFE, which may make
> quite a difference in performance.  I pinged Scott Long, who did the work
> on the driver, and he indicated that backporting the patch to run on
> -RELEASE would be quite difficult, so an upgrade to 5-STABLE is the best
> way to get the changes.  I believe that you can build a 5-STABLE kernel
> and run with a 5.3-RELEASE user space to avoid having to commit to a full
> upgrade to see if that helps or not.
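
If I'm reading that right, the kernel-only route would be the usual handbook
procedure: sync /usr/src to the RELENG_5 branch and rebuild just the kernel,
leaving the installed 5.3-RELEASE world alone. Something like the following
(the supfile still needs a real cvsup host filled in, and GENERIC stands in
for our own kernel config):

    # copy /usr/share/examples/cvsup/stable-supfile, set host= and tag=RELENG_5
    cvsup -g -L 2 /root/stable-supfile
    cd /usr/src
    make buildkernel KERNCONF=GENERIC
    make installkernel KERNCONF=GENERIC
    shutdown -r now

Is that the recommended way to pick up the MPSAFE amr changes without a full
upgrade?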
>
> Two other observations:
>
> - It looks like the amr storage array is pretty busy, which may be part of
>   the issue.
>
> - It looks like you have four processors, suggesting a two-processor Xeon
>   with hyper-threading turned on. For many workloads, hyper-threading does
>   not improve performance, so you may want to try turning that off in the
>   BIOS to see if that helps.
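
(If getting into the BIOS on these boxes is awkward, my understanding is that
5.x can also idle the logical CPUs from the OS side, though I would want to
confirm the sysctl exists on this kernel:

    sysctl machdep.hlt_logical_cpus=1    # assumed knob; halts the HTT logical CPUs if present

but turning HTT off in the BIOS is probably the cleaner test.)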
>
> Robert N M Watson
>
> > Thank You
> >
> > -----------------------------------------------------------------------------
> > systat -vmstat snapshot, Feb 21 12:18 (4 users, load averages 5.28 19.37 28.00)
> >
> > CPU:         30.2% Sys  11.8% Intr   0.0% User   0.0% Nice  58.0% Idle
> > Memory (KB): 152732 wire  23228 act  803616 inact  43556 cache  1660 free
> > Interrupts:  7226 total
> > Name-cache:  1704 calls, 971 hits (57%); dir-cache 11 hits (1%)
> > Vnodes:      70235 desiredvnodes, 20543 numvnodes, 7883 freevnodes
> > Buffers:     114288 buf, 510 dirtybuf
> >
> > Disks    amrd0   da0 pass0 pass1 pass2
> > KB/t     22.41  0.00  0.00  0.00  0.00
> > tps        602     0     0     0     0
> > MB/s     13.16  0.00  0.00  0.00  0.00
> > % busy     100     0     0     0     0
> > -----------------------------------------------------------------------------
> > last pid: 10330;  load averages: 14.69, 11.81, 18.62   up 0+09:01:13  12:32:57
> > 226 processes: 5 running, 153 sleeping, 57 waiting, 11 lock
> > CPU states:  0.1% user,  0.0% nice, 66.0% system, 24.3% interrupt,  9.6% idle
> > Mem: 23M Active, 774M Inact, 150M Wired, 52M Cache, 112M Buf, 1660K Free
> > Swap: 1024M Total, 124K Used, 1024M Free
> >
> >   PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> >    63 root     -44 -163     0K    12K WAIT   0 147:05 45.07% 45.07% swi1: net
> >    30 root     -68 -187     0K    12K WAIT   0 101:39 32.32% 32.32% irq16: bge0
> >    12 root     117    0     0K    12K CPU2   2 329:09 19.58% 19.58% idle: cpu2
> >    11 root     116    0     0K    12K CPU3   3 327:29 19.24% 19.24% idle: cpu3
> >    13 root     114    0     0K    12K RUN    1 263:39 16.89% 16.89% idle: cpu1
> >    14 root     109    0     0K    12K CPU0   0 228:50 12.06% 12.06% idle: cpu0
> >   368 root       4    0  1220K   740K *Giant 3  45:27  7.52%  7.52% nfsd
> >   366 root       4    0  1220K   740K *Giant 0  48:52  7.28%  7.28% nfsd
> >   364 root       4    0  1220K   740K *Giant 3  53:01  7.13%  7.13% nfsd
> >   367 root      -8    0  1220K   740K biord  3  41:22  7.08%  7.08% nfsd
> >   372 root       4    0  1220K   740K *Giant 0  28:54  7.08%  7.08% nfsd
> >   365 root      -1    0  1220K   740K *Giant 3  51:53  6.93%  6.93% nfsd
> >   370 root      -1    0  1220K   740K nfsslp 0  32:49  6.84%  6.84% nfsd
> >   369 root      -8    0  1220K   740K biord  1  36:40  6.49%  6.49% nfsd
> >   371 root       4    0  1220K   740K *Giant 0  25:14  6.45%  6.45% nfsd
> >   374 root      -1    0  1220K   740K nfsslp 2  22:31  6.45%  6.45% nfsd
> >   377 root       4    0  1220K   740K *Giant 2  17:21  5.52%  5.52% nfsd
> >   376 root      -4    0  1220K   740K *Giant 2  15:45  5.37%  5.37% nfsd
> >   373 root      -4    0  1220K   740K ufs    3  19:38  5.18%  5.18% nfsd
> >   378 root       4    0  1220K   740K *Giant 2  13:55  4.54%  4.54% nfsd
> >   379 root      -8    0  1220K   740K biord  3  12:41  4.49%  4.49% nfsd
> >   380 root       4    0  1220K   740K -      2  11:26  4.20%  4.20% nfsd
> >     3 root      -8    0     0K    12K -      1  21:21  4.05%  4.05% g_up
> >     4 root      -8    0     0K    12K -      0  20:05  3.96%  3.96% g_down
> >   381 root       4    0  1220K   740K -      3   9:28  3.66%  3.66% nfsd
> >   382 root       4    0  1220K   740K -      1  10:13  3.47%  3.47% nfsd
> >   385 root      -1    0  1220K   740K nfsslp 3   7:21  3.17%  3.17% nfsd
> >    38 root     -64 -183     0K    12K *Giant 0  14:45  3.12%  3.12% irq24: amr0
> >   384 root       4    0  1220K   740K -      3   8:40  3.12%  3.12% nfsd
> >    72 root     -24 -143     0K    12K WAIT   2  16:50  2.98%  2.98% swi6:+
> >   383 root      -8    0  1220K   740K biord  2   7:57  2.93%  2.93% nfsd
> >   389 root       4    0  1220K   740K -      2   5:31  2.64%  2.64% nfsd
> >   390 root      -8    0  1220K   740K biord  3   5:54  2.59%  2.59% nfsd
> >   387 root      -8    0  1220K   740K biord  0   6:40  2.54%  2.54% nfsd
> >   386 root      -8    0  1220K   740K biord  1   6:22  2.44%  2.44% nfsd
> >   392 root       4    0  1220K   740K -      3   4:27  2.10%  2.10% nfsd
> >   388 root      -4    0  1220K   740K *Giant 2   4:45  2.05%  2.05% nfsd
> >   395 root       4    0  1220K   740K -      0   3:59  2.05%  2.05% nfsd
> >   391 root       4    0  1220K   740K -      2   5:10  1.95%  1.95% nfsd
> >   393 root       4    0  1220K   740K sbwait 1   4:13  1.56%  1.56% nfsd
> >   398 root       4    0  1220K   740K -      2   3:31  1.56%  1.56% nfsd
> >   399 root       4    0  1220K   740K -      3   3:12  1.56%  1.56% nfsd
> >   401 root       4    0  1220K   740K -      1   2:57  1.51%  1.51% nfsd
> >   403 root       4    0  1220K   740K -      0   3:04  1.42%  1.42% nfsd
> >   406 root       4    0  1220K   740K -      1   2:27  1.37%  1.37% nfsd
> >   397 root       4    0  1220K   740K -      3   3:16  1.27%  1.27% nfsd
> >   396 root       4    0  1220K   740K -      2   3:42  1.22%  1.22% nfsd
> >
> > On Saturday 19 February 2005 04:23 am, Robert Watson wrote:
> > > On Thu, 17 Feb 2005, David Rice wrote:
> > > > Typically we have 7 client boxes mounting storage from a single file
> > > > server.  Each client box serves 1000 web sites and the associated email.
> > > > We have done the basic NFS tuning (i.e. read/write size optimization
> > > > and kernel tuning).
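
(By read/write size optimization I mean the client-side mount sizes; the
values below are only illustrative, the exact numbers vary per box:

    # illustrative client mount; server path and sizes are examples only
    mount_nfs -3 -T -r 32768 -w 32768 fileserver:/export/web /web
)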
> > >
> > > How many nfsd's are you running with?
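
(For reference, the nfsd count is set through nfs_server_flags in /etc/rc.conf
on the server, e.g.:

    nfs_server_enable="YES"
    nfs_server_flags="-u -t -n 16"    # -n sets how many nfsd servers are started

The -n value above is only an illustration, not what we actually run.)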
> > >
> > > If you run systat -vmstat 1 on your server under high load, could you
> > > send us the output?  In particular, I'm interested in how the system is
> > > spending its time, the paging level, and the I/O throughput on devices;
> > > the systat -vmstat summary screen gives a good overview of this and more.
> > > A few snapshots of "gstat" output would also be very
> > > helpful.  As would a snapshot or two of "top -S" output.  This will
> > > give us a picture of how the system is spending its time.
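
(In case it is useful: the curses screens can be captured to a file by running
them under script(1), e.g.

    script /tmp/systat.txt systat -vmstat 1    # ^C once enough samples are captured
    script /tmp/gstat.txt gstat
    script /tmp/top.txt top -S

which avoids retyping the numbers by hand.)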
> > >
> > > > 2. Client boxes have high load averages and sometimes crash due to
> > > > slow NFS performance.
> > >
> > > Could you be more specific about the crash failure mode?
> > >
> > > > 3. File servers that randomly crash with "Fatal trap 12: page fault
> > > > while in kernel mode"
> > >
> > > Could you make sure you're running with at least the latest 5.3 patch
> > > level on the server, which includes some NFS server stability fixes,
> > > and also look at sliding to the head of 5-STABLE?  There are a number
> > > of performance and stability improvements that may be relevant there.
> > >
> > > Could you compile the kernel with KDB+DDB, capture the serial console
> > > output of the full panic message and trap details, and include a full
> > > stack trace?  I'm happy to try to help debug these problems.
> > >
> > > > 4. With soft updates enabled, during fsck the file server will freeze
> > > > with all NFS processes in the "snaplck" state. We disabled soft
> > > > updates because of this.
> > >
> > > If it's possible to get some more information, it would be quite
> > > helpful.  In particular, if you could compile the server box with
> > > DDB+KDB+BREAK_TO_DEBUGGER, break into the serial debugger when it
> > > appears wedged, and capture the contents of "show lockedvnods", "ps", and
> > > "trace <pid>" for any processes listed in the "show lockedvnods" output,
> > > that would be great.  A crash dump would also be very helpful.  For
> > > some hints on the information that is necessary here, take a look at
> > > the handbook chapter on kernel debugging and reporting kernel bugs, and
> > > my recent post to current@ diagnosing a similar bug.
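
(My reading of the handbook chapter is that the pieces needed are roughly the
following; the dump device below is just a placeholder for ours:

    # kernel configuration additions, then rebuild/install the kernel
    makeoptions DEBUG=-g
    options     KDB
    options     DDB
    options     BREAK_TO_DEBUGGER

    # /etc/rc.conf, so savecore(8) can recover a crash dump at boot
    dumpdev="/dev/amrd0s1b"

Please correct me if 5.3 wants different options.)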
> > >
> > > If you re-enable soft updates but leave bgfsck disabled, does that
> > > correct this stability problem?
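
(I take that to mean background_fsck="NO" in /etc/rc.conf plus turning soft
updates back on per filesystem with tunefs, roughly:

    background_fsck="NO"                 # in /etc/rc.conf
    tunefs -n enable /dev/amrd0s1e       # device is just an example; run on an unmounted fs

If that matches what you meant, we can try it on one of the servers.)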
> > >
> > > In any case, I'm happy to help try to figure out what's going on --
> > > some of the above information for stability and performance problems
> > > would be quite helpful in tracking it down.
> > >
> > > Robert N M Watson
>
> _______________________________________________
> freebsd-performance at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to
> "freebsd-performance-unsubscribe at freebsd.org"


