8.1 amd64 lockup (maybe zfs or disk related)

Mon Feb 7 08:55:40 UTC 2011

On Sun, Feb 06, 2011 at 11:50:41PM -0800, Greg Bonett wrote:
> Thanks for the response.
> I have no tunings in /boot/loader.conf
> According to http://wiki.freebsd.org/ZFSTuningGuide, for amd64:
> "FreeBSD 7.2+ has improved kernel memory allocation strategy and no
> tuning may be necessary on systems with more than 2 GB of RAM."
> I have 8GB of RAM.
> Do you think this is wrong?
>   
> The Handbook recommends these (but says its test system has 1GB of RAM):
> vm.kmem_size="330M"
> vm.kmem_size_max="330M"
> vfs.zfs.arc_max="40M"
> vfs.zfs.vdev.cache.size="5M"
> 
> What do you recommend?

The Wiki is outdated, I'm sorry to say.  Given that you have 8GB of RAM,
I would recommend the settings below.  Please note that some of these
have become the defaults in 8.1 (depending on when your kernel was built
and from which source date), and in what will soon be 8.2:

/boot/loader.conf :

#
# ZFS tuning parameters
# NOTE: Be sure to see /etc/sysctl.conf for additional tunings
#

# Increase vm.kmem_size to allow the ZFS ARC to use more memory.
vm.kmem_size="8192M"
vfs.zfs.arc_max="6144M"

# Disable ZFS prefetching
# http://southbrain.com/south/2008/04/the-nightmare-comes-slowly-zfs.html
# Increases overall speed of ZFS, but when disk flushing/writes occur,
# the system is less responsive (due to extreme disk I/O).
# NOTE: Systems with 8GB of RAM or more have prefetch enabled by default.
vfs.zfs.prefetch_disable="1"

# Disable UMA (uma(9)) for ZFS; amd64 was moved to exclusively use UMA
# on 2010/05/24.
# http://lists.freebsd.org/pipermail/freebsd-stable/2010-June/057162.html
vfs.zfs.zio.use_uma="0"

# Decrease ZFS txg timeout value from 30 (default) to 5 seconds.  This
# should increase throughput and decrease the "bursty" stalls that
# happen during heavy I/O with ZFS.
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007343.html
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007355.html
vfs.zfs.txg.timeout="5"
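If you want to sanity-check the size values before rebooting, here is a
minimal sketch that parses name="value" lines the way they appear above
and converts the "M"-suffixed sizes into bytes.  The helper names
(size_to_bytes, parse_loader_conf) are hypothetical, only the K/M/G
suffixes are handled, and this is not how the FreeBSD loader itself reads
the file.  It simply confirms that arc_max (6144M) is 75% of kmem_size
(8192M), leaving room in kmem for the rest of the kernel.

```python
import re

# Hypothetical helper: only K/M/G suffixes (binary multiples) are handled.
SUFFIXES = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

def size_to_bytes(value: str) -> int:
    """Convert a loader.conf size string such as "8192M" into bytes."""
    m = re.fullmatch(r"(\d+)([KMG]?)", value.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(m.group(1)) * SUFFIXES[m.group(2)]

def parse_loader_conf(text: str) -> dict:
    """Parse name="value" lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip().strip('"')
    return settings

# The tunables from the /boot/loader.conf fragment above.
LOADER_CONF = '''
vm.kmem_size="8192M"
vfs.zfs.arc_max="6144M"
vfs.zfs.prefetch_disable="1"
vfs.zfs.zio.use_uma="0"
vfs.zfs.txg.timeout="5"
'''

conf = parse_loader_conf(LOADER_CONF)
kmem = size_to_bytes(conf["vm.kmem_size"])
arc = size_to_bytes(conf["vfs.zfs.arc_max"])
print(f"kmem_size = {kmem} bytes, arc_max = {arc} bytes")
print(f"arc_max is {arc / kmem:.0%} of kmem_size")
# The ARC lives inside kmem, so it must stay below kmem_size.
assert arc < kmem, "arc_max should leave kmem headroom for the rest of the kernel"
```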

/etc/sysctl.conf :

#
# ZFS tuning parameters
# NOTE: Be sure to see /boot/loader.conf for additional tunings
#

# Increase number of vnodes; we've seen vfs.numvnodes reach 115,000
# at times.  Default max is a little over 200,000.  Playing it safe...
kern.maxvnodes=250000

# Set TXG write limit to a lower threshold.  This helps "level out"
# the throughput rate (see "zpool iostat").  A value of 256MB works well
# for systems with 4GB of RAM, while 1GB works well for us with 8GB on
# disks that have 64MB of cache.
vfs.zfs.txg.write_limit_override=1073741824

Be aware that the vfs.zfs.txg.write_limit_override tuning you see above
may need to be adjusted for your system.  It's up to you to figure out
what works best in your environment.
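The sysctl takes a plain byte count, so the 1073741824 above is just 1GB
(1024 x 1024 x 1024).  If you end up experimenting, a tiny sketch like
this one (write_limit_line is a hypothetical helper) turns a size in MB
into the line to paste into /etc/sysctl.conf.  The 256 and 1024 values
are only the starting points mentioned in the comment above, not
recommendations for your hardware.

```python
MIB = 1024 * 1024

def write_limit_line(limit_mib: int) -> str:
    """Render a sysctl.conf line for a TXG write limit given in MB (MiB)."""
    return f"vfs.zfs.txg.write_limit_override={limit_mib * MIB}"

# Starting points from the comment above; tune for your own system.
print(write_limit_line(256))   # suggested for 4GB of RAM
print(write_limit_line(1024))  # what we use with 8GB of RAM
```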

> I think the ad0: FAILURE - READ_DMA4 errors may be from a bad SATA cable
> (or rather, a 12in SATA cable connecting a drive that is one inch away).
> I'm ordering a new drive bay to improve this, but should a bad cable
> cause lockups?

Semantic point: it's READ_DMA48, not READ_DMA4.  The "48" indicates
48-bit LBA addressing.  There is no 4-bit LBA addressing mode.
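To put the "48" in context: the older 28-bit LBA commands can only
address 2^28 sectors of 512 bytes, roughly 137GB, so a 1TB drive like
yours needs the 48-bit variants (READ_DMA48/WRITE_DMA48) for most of its
surface.  The arithmetic, as a quick sketch (the 10^12-byte drive size is
an assumed nominal figure, not the exact WD10EARS capacity):

```python
SECTOR_BYTES = 512  # ATA LBA addresses 512-byte logical sectors

def lba_limit_bytes(address_bits: int) -> int:
    """Bytes addressable with an LBA field of the given width."""
    return (2 ** address_bits) * SECTOR_BYTES

LBA28 = lba_limit_bytes(28)   # 137,438,953,472 bytes (~137 GB)
LBA48 = lba_limit_bytes(48)   # 144,115,188,075,855,872 bytes (~144 PB)
DRIVE_BYTES = 10**12          # nominal 1 TB drive (assumed round figure)

print(f"28-bit LBA limit: {LBA28:,} bytes")
print(f"48-bit LBA limit: {LBA48:,} bytes")
print("needs 48-bit commands:", DRIVE_BYTES > LBA28)
```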

The term "lock up" is also too vague.  If by "lock up" you mean "the
system seems alive, hitting NumLock on the console keyboard toggles the
LED", then the kernel is very likely spending too much of its time
spinning in something (such as waiting for commands to return from the
SATA controller, which could also indirectly be the controller waiting
for the disk to respond to commands).  If by "lock up" you mean "the
system is literally hard locked, nothing responds, I have to hit
physical Reset or power-cycle the box", then no, a bad cable should not
be able to cause that.

> #smartctl -a /dev/ad0 
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Caviar Green (Adv. Format) family
> Device Model:     WDC WD10EARS-00Y5B1

The first thing to note is that this is one of the new 4KB-sector
("Advanced Format") drives.  I have no personal experience with them,
but they have been discussed on the FreeBSD lists for quite some time,
especially with regard to ZFS.  The discussions involve performance.
Just an FYI.
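One performance issue usually raised with these drives is partition
alignment: they present 512-byte logical sectors to the OS but write
4096-byte physical sectors internally, so a partition that does not
start on a 4KB boundary forces the drive into read-modify-write cycles.
A minimal sketch of the check, assuming the partition start is given in
512-byte sectors; the offsets tested (63, the traditional MBR start,
plus 64 and 2048) are illustrative and not taken from your system.

```python
LOGICAL_SECTOR = 512    # sector size the drive reports (512-byte emulation)
PHYSICAL_SECTOR = 4096  # sector size the drive actually writes internally

def is_4k_aligned(start_lba: int) -> bool:
    """True if a partition starting at this 512-byte LBA begins on a 4KB boundary."""
    return (start_lba * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0

for start in (63, 64, 2048):
    print(f"start sector {start}: aligned={is_4k_aligned(start)}")
```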

> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always -       0
>   3 Spin_Up_Time            0x0027   121   121   021    Pre-fail  Always -       6933
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always -       30
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always -       0
>   9 Power_On_Hours          0x0032   097   097   000    Old_age   Always -       2664
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always -       28
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always -       27
> 193 Load_Cycle_Count        0x0032   135   135   000    Old_age   Always -       196151
> 194 Temperature_Celsius     0x0022   125   114   000    Old_age   Always -       22
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always -       0
> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline -      0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always -       0
> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline -      0
> 
> SMART Error Log Version: 1
> No Errors Logged
> 
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed without error       00%      1536

Your disk looks "almost" fine.  There are no indicators of bad blocks or
CRC errors (which indicate bad SATA cables or physical PCB problems on
the disk) -- that's the good part.
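If you want to keep an eye on those indicators over time, here is a
minimal sketch that pulls the raw values out of the attribute rows you
posted and flags anything non-zero among the reallocation/pending/
uncorrectable counters (5, 196, 197, 198) and the CRC counter (199).
raw_values and WATCHED are hypothetical names, and the parser assumes
the raw value is a single integer in the last column, which holds for
these rows but not for every attribute on every drive.

```python
# Rows copied from the smartctl -a output quoted above.
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline -      0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always -       0
"""

def raw_values(table: str) -> dict:
    """Map attribute ID -> (name, raw value); the raw value is the last column."""
    values = {}
    for line in table.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            values[int(fields[0])] = (fields[1], int(fields[-1]))
    return values

# Bad-block indicators (5, 196-198) and the cable/PCB indicator (199).
WATCHED = (5, 196, 197, 198, 199)

attrs = raw_values(SMART_ROWS)
for attr_id in WATCHED:
    name, raw = attrs[attr_id]
    status = "ok" if raw == 0 else "CHECK"
    print(f"{attr_id:3d} {name:<24} raw={raw:<6} {status}")
```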

The bad part: Attribute 193.  Your disk is literally "load cycling" on
a regular basis, i.e. parking its heads off the platters and then
loading them back again.  I'd rather not get into the details, but each
cycle adds mechanical wear, and it's not good.  This problem with
certain models of Western Digital disks has been discussed on the
FreeBSD lists before.  Some users have stated that Western Digital has
indirectly acknowledged this problem and fixed it in a later drive
firmware revision.  Please note that in some cases WD did not
increment/change the firmware revision string in their fix, so you
can't rely on that to determine anything.
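To put a number on "regular basis", here is a quick calculation using
only the raw values from your own SMART output: attribute 193
(Load_Cycle_Count), attribute 9 (Power_On_Hours) and attribute 12
(Power_Cycle_Count).  Nothing here is assumed beyond those three values.

```python
# Raw values from the quoted SMART output.
LOAD_CYCLE_COUNT = 196151  # attribute 193
POWER_ON_HOURS = 2664      # attribute 9
POWER_CYCLE_COUNT = 28     # attribute 12

per_hour = LOAD_CYCLE_COUNT / POWER_ON_HOURS
seconds_per_cycle = POWER_ON_HOURS * 3600 / LOAD_CYCLE_COUNT
per_power_cycle = LOAD_CYCLE_COUNT / POWER_CYCLE_COUNT

print(f"{per_hour:.1f} load cycles per powered-on hour")
print(f"about one load cycle every {seconds_per_cycle:.0f} seconds")
print(f"about {per_power_cycle:.0f} load cycles per power cycle")
```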

Would this behaviour cause READ_DMAxx and WRITE_DMAxx errors?
Absolutely, no doubt about it.

My recommendations: talk to Western Digital Technical Support and explain
the problem, point them to this thread, and get a fixed/upgraded
firmware from them.  If they do not acknowledge the problem or you get
stonewalled, I recommend replacing the drive entirely with a different
model (I highly recommend the Caviar Black drives, which do not have
this problem).

If they give you replacement firmware, you'll probably need a DOS boot
disk to flash it, and you'll need to make sure your BIOS does not have
AHCI mode enabled (otherwise DOS won't find the disk).  You can always
re-enable AHCI after the upgrade.  If you don't have a DOS boot disk,
you'll need to ask Western Digital for a bootable ISO that lets you
perform the upgrade.

If you need me to dig up mailing list posts about this problem I can do
so, but it will take me some time.  The discussions might have been
about a non-4K-sector Green drive as well, but it doesn't matter; the
problem is known at this point.

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |