8.1 amd64 lockup (maybe zfs or disk related)

Greg Bonett greg at bonett.org
Tue Feb 8 05:34:41 UTC 2011


Thank you for the help.  I've implemented your suggested
/boot/loader.conf and /etc/sysctl.conf tunings.  Unfortunately, after
implementing these settings, I experienced another lockup.  By "lockup"
I mean nothing responds (sshd, keyboard, Num Lock LED) - I had to hit
reset.

I'm trying to isolate the cause of these lockups.  I rebooted the system
and tried to simulate a high-load condition WITHOUT mounting my zfs pool.
First I ran many instances of "dd if=/dev/random of=/dev/null bs=4m" to
generate high CPU load.  The machine ran for many hours under this
condition without locking up.  Then I added a few "dd if=/dev/adX
of=/dev/null bs=4m" processes to generate some I/O load.  After doing
this it locked up immediately.  Thinking I had found the trigger, I
rebooted and tried to reproduce the lockup, but could not.  So far the
machine has been running for two hours with six "dd if=/dev/adX"
commands (one per disk) and about a dozen "dd if=/dev/random" commands
(to keep the CPU near 100%).  I'll let it keep running and see whether
it locks up again without zfs ever being mounted.
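
For what it's worth, the test above amounts to something like the
following sketch.  The helper and the device names in the comments are
placeholders, not the exact commands used; substitute your own disks,
and note that running it at full tilt will peg the machine:

```shell
#!/bin/sh
# Sketch of the load test described above (hypothetical helper, not
# the literal commands from the test run).

# Spawn n parallel sequential readers of a device and wait for them.
# The block count keeps a run finite; use a huge value for a long soak.
spawn_readers() {
    dev=$1 n=$2 blocks=$3
    i=0
    while [ "$i" -lt "$n" ]; do
        dd if="$dev" of=/dev/null bs=1048576 count="$blocks" 2>/dev/null &
        i=$((i + 1))
    done
    wait
}

# CPU load: many readers of /dev/random (urandom is an alias on FreeBSD).
# I/O load: one reader per physical disk, e.g.:
#   spawn_readers /dev/random 12 1000000 &
#   for d in ad4 ad6 ad8 ad10 ad12 ad14; do
#       spawn_readers /dev/$d 1 1000000 &
#   done
#   wait
```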

any ideas?



On Mon, 2011-02-07 at 00:55 -0800, Jeremy Chadwick wrote:
> On Sun, Feb 06, 2011 at 11:50:41PM -0800, Greg Bonett wrote:
> > Thanks for the response.
> > I have no tunings in /boot/loader.conf
> > according to http://wiki.freebsd.org/ZFSTuningGuide for amd64
> > "FreeBSD 7.2+ has improved kernel memory allocation strategy and no
> > tuning may be necessary on systems with more than 2 GB of RAM. "
> > I have 8GB of ram.
> > do you think this is wrong?
> >   
> > The Handbook recommends these (but notes its test system has 1GB of RAM):
> > vm.kmem_size="330M"
> > vm.kmem_size_max="330M"
> > vfs.zfs.arc_max="40M"
> > vfs.zfs.vdev.cache.size="5M"
> > 
> > what do you recommend?
> 
> The Wiki is outdated, I'm sorry to say.  Given that you have 8GB of RAM,
> I would recommend the settings below.  Please note that some of these
> have become the defaults in 8.1 (depending on when your kernel was built
> and from what source date), and in what will soon be 8.2:
> 
> /boot/loader.conf :
> 
> #
> # ZFS tuning parameters
> # NOTE: Be sure to see /etc/sysctl.conf for additional tunings
> #
> 
> # Increase vm.kmem_size to allow for ZFS ARC to utilise more memory.
> vm.kmem_size="8192M"
> vfs.zfs.arc_max="6144M"
> 
> # Disable ZFS prefetching
> # http://southbrain.com/south/2008/04/the-nightmare-comes-slowly-zfs.html
> # Increases overall speed of ZFS, but when disk flushing/writes occur,
> # system is less responsive (due to extreme disk I/O).
> # NOTE: Systems with 8GB of RAM or more have prefetch enabled by default.
> vfs.zfs.prefetch_disable="1"
> 
> # Disable UMA (uma(9)) for ZFS; amd64 was moved to exclusively use UMA
> # on 2010/05/24.
> # http://lists.freebsd.org/pipermail/freebsd-stable/2010-June/057162.html
> vfs.zfs.zio.use_uma="0"
> 
> # Decrease ZFS txg timeout value from 30 (default) to 5 seconds.  This
> # should increase throughput and decrease the "bursty" stalls that
> # happen during immense I/O with ZFS.
> # http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007343.html
> # http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007355.html
> vfs.zfs.txg.timeout="5"
> 
> 
> 
> /etc/sysctl.conf :
> 
> #
> # ZFS tuning parameters
> # NOTE: Be sure to see /boot/loader.conf for additional tunings
> #
> 
> # Increase number of vnodes; we've seen vfs.numvnodes reach 115,000
> # at times.  Default max is a little over 200,000.  Playing it safe...
> kern.maxvnodes=250000
> 
> # Set TXG write limit to a lower threshold.  This helps "level out"
> # the throughput rate (see "zpool iostat").  A value of 256MB works well
> # for systems with 4GB of RAM, while 1GB works well for us w/ 8GB on
> # disks which have 64MB cache.
> vfs.zfs.txg.write_limit_override=1073741824
> 
> 
> Be aware that the vfs.zfs.txg.write_limit_override tuning you see above
> may need to be adjusted for your system.  It's up to you to figure out
> what works best in your environment.
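>

As a side note on the long number above: it is simply 1024 MB expressed
in bytes.  A tiny helper (hypothetical, not part of the original advice)
avoids arithmetic slips when experimenting with other thresholds:

```shell
#!/bin/sh
# Convert megabytes to the byte count expected by
# vfs.zfs.txg.write_limit_override (e.g. 1024 MB -> 1073741824).
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Example: set a 256 MB limit, as suggested for a 4GB machine:
# sysctl vfs.zfs.txg.write_limit_override=$(mb_to_bytes 256)
```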
> 
> > I think the ad0: FAILURE - READ_DMA4 errors may be from a bad sata cable
> > (or rather, a 12in sata cable connecting a drive that is one inch away)
> > I'm ordering a new drive bay to improve this, but should a bad cable
> > cause lockups?  
> 
> Semantic point: it's READ_DMA48, not READ_DMA4.  The "48" indicates
> 48-bit LBA addressing.  There is no 4-bit LBA addressing mode.
> 
> The term "lock up" is also too vague.  If by "lock up" you mean "the
> system seems alive, hitting NumLock on the console keyboard toggles the
> LED", then the kernel is very likely spending too much of its time
> spinning in something (such as waiting for commands to return from the
> SATA controller, which could also indirectly be the controller waiting
> for the disk to respond to commands).  If by "lock up" you mean "the
> system is literally hard locked, nothing responds, I have to hit
> physical Reset or power-cycle the box", then no, a bad cable should not
> be able to cause that.
> 
> 
> > #smartctl -a /dev/ad0 
> > 
> > === START OF INFORMATION SECTION ===
> > Model Family:     Western Digital Caviar Green (Adv. Format) family
> > Device Model:     WDC WD10EARS-00Y5B1
> 
> First thing to note is that this is one of those new 4KB-sector drives.
> I have no personal experience with them, but they have been discussed
> on the FreeBSD lists for quite some time, especially with regard to
> ZFS; the discussions mostly concern performance.  Just an FYI.
> 
> > SMART Attributes Data Structure revision number: 16
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> >   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always -       0
> >   3 Spin_Up_Time            0x0027   121   121   021    Pre-fail  Always -       6933
> >   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always -       30
> >   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always -       0
> >   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always -       0
> >   9 Power_On_Hours          0x0032   097   097   000    Old_age   Always -       2664
> >  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always -       0
> >  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always -       0
> >  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always -       28
> > 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always -       27
> > 193 Load_Cycle_Count        0x0032   135   135   000    Old_age   Always -       196151
> > 194 Temperature_Celsius     0x0022   125   114   000    Old_age   Always -       22
> > 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always -       0
> > 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always -       0
> > 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline -      0
> > 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always -       0
> > 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline -      0
> > 
> > SMART Error Log Version: 1
> > No Errors Logged
> > 
> > SMART Self-test log structure revision number 1
> > Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Short offline       Completed without error       00%      1536
> 
> Your disk looks "almost" fine.  There are no indicators of bad blocks or
> CRC errors (which indicate bad SATA cables or physical PCB problems on
> the disk) -- that's the good part.
> 
> The bad part: Attribute 193.  Your disk is literally "load cycling"
> (which is somewhat equivalent to a power cycle; I'd rather not get into
> explaining what it is, but it's not good) on a regular basis.  This
> problem with certain models of Western Digital disks has been discussed
> on the FreeBSD lists before.  There have been statements made by users
> that Western Digital has indirectly acknowledged this problem, and fixed
> it in a later drive firmware revision.  Please note that in some cases
> WD did not increment/change the firmware revision string in their fix,
> so you can't rely on that to determine anything.
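>

One way to confirm the drive is still accumulating load cycles is to
sample attribute 193 periodically and watch the raw value climb.  A
small filter over `smartctl -A` output could look like this (a sketch;
it assumes the attribute table layout shown in the quoted report):

```shell
#!/bin/sh
# Print the raw value of SMART attribute 193 (Load_Cycle_Count) from
# `smartctl -A` output supplied on stdin, e.g.:
#   smartctl -A /dev/ad0 | load_cycle_count
load_cycle_count() {
    awk '$1 == 193 { print $NF }'
}

# Sampling every few minutes shows whether head parking is ongoing:
#   while :; do smartctl -A /dev/ad0 | load_cycle_count; sleep 300; done
```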
> 
> Would this behaviour cause READ_DMAxx and WRITE_DMAxx errors?
> Absolutely, no doubt about it.
> 
> My recommendations: talk to Western Digital Technical Support and explain
> the problem, point them to this thread, and get a fixed/upgraded
> firmware from them.  If they do not acknowledge the problem or you get
> stonewalled, I recommend replacing the drive entirely with a different
> model (I highly recommend the Caviar Black drives, which do not have
> this problem).
> 
> If they give you a replacement firmware, you'll probably need a DOS boot
> disk to accomplish this, and need to make sure your BIOS does not have
> AHCI mode enabled (DOS won't find the disk).  You can always re-enable
> AHCI after the upgrade.  If you don't have a DOS boot disk, you'll need
> to explain to Western Digital that you need them to give you a bootable
> ISO that can allow you to perform the upgrade.
> 
> If you need me to dig up mailing list posts about this problem I can do
> so, but it will take me some time.  The discussions might have been
> about a non-4K-sector Green drive as well, but it doesn't matter; the
> problem is known at this point.
> 
> -- 
> | Jeremy Chadwick                                   jdc at parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.               PGP 4BD6C0CB |




More information about the freebsd-stable mailing list