8.1 amd64 lockup (maybe zfs or disk related)
Greg Bonett
greg at bonett.org
Tue Feb 8 05:34:41 UTC 2011
Thank you for the help. I've implemented your
suggested /boot/loader.conf and /etc/sysctl.conf tunings.
Unfortunately, after applying these settings, I experienced another
lockup. By "lockup" I mean nothing responds (sshd, keyboard, Num
Lock LED) - I had to hit reset.
I'm trying to isolate the cause of these lockups. I rebooted the system
and tried to simulate a high-load condition WITHOUT mounting my zfs pool.
First I ran many instances of "dd if=/dev/random of=/dev/null bs=4m" to
get high CPU load. The machine ran for many hours under this condition
without locking up. Then I added a few "dd if=/dev/adX of=/dev/null bs=4m"
to add some I/O load, and it locked up immediately.
Thinking I had found the source of the problem, I rebooted and
tried to reproduce it, but was not able to. So far the machine has
been running for two hours with six "dd if=/dev/adX" commands (one per
disk) and about a dozen "dd if=/dev/urandom" commands (to keep the CPU
near 100%). I'll let it keep running and see if it locks up again without
ever mounting zfs.
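The test sequence above can be sketched as a small script. This is only an illustration of the workload described (the disk names, worker count, and the emit_stress_cmds helper are my own assumptions); it prints the dd commands rather than running them, so you can review the plan before piping it to sh:

```shell
#!/bin/sh
# Sketch of the stress test described above (hypothetical helper).
# emit_stress_cmds prints one sequential-read dd per disk plus a number
# of CPU-load dd workers reading from /dev/urandom.

emit_stress_cmds() {
    disks=$1      # space-separated disk list, e.g. "ad4 ad6 ad8"
    nworkers=$2   # number of CPU-load workers

    # One sequential reader per disk to generate I/O load.
    for d in $disks; do
        echo "dd if=/dev/$d of=/dev/null bs=4m &"
    done

    # CPU-load workers reading from urandom.
    i=0
    while [ "$i" -lt "$nworkers" ]; do
        echo "dd if=/dev/urandom of=/dev/null bs=4m &"
        i=$((i + 1))
    done
}

emit_stress_cmds "ad4 ad6 ad8" 12
```

Piping the output to sh would launch the readers in the background; something like "killall dd" stops them when the test is done.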
Any ideas?
On Mon, 2011-02-07 at 00:55 -0800, Jeremy Chadwick wrote:
> On Sun, Feb 06, 2011 at 11:50:41PM -0800, Greg Bonett wrote:
> > Thanks for the response.
> > I have no tunings in /boot/loader.conf
> > according to http://wiki.freebsd.org/ZFSTuningGuide for amd64
> > "FreeBSD 7.2+ has improved kernel memory allocation strategy and no
> > tuning may be necessary on systems with more than 2 GB of RAM. "
> > I have 8GB of ram.
> > do you think this is wrong?
> >
> > Handbook recommends these (but says their test system has 1gb ram):
> > vm.kmem_size="330M"
> > vm.kmem_size_max="330M"
> > vfs.zfs.arc_max="40M"
> > vfs.zfs.vdev.cache.size="5M"
> >
> > what do you recommend?
>
> The Wiki is outdated, I'm sorry to say. Given that you have 8GB RAM, I
> would recommend these settings. Please note that some of these have
> become the defaults in 8.1 (depending on when your kernel was built and
> from what source date), and in what will soon be 8.2:
>
> /boot/loader.conf :
>
> #
> # ZFS tuning parameters
> # NOTE: Be sure to see /etc/sysctl.conf for additional tunings
> #
>
> # Increase vm.kmem_size to allow for ZFS ARC to utilise more memory.
> vm.kmem_size="8192M"
> vfs.zfs.arc_max="6144M"
>
> # Disable ZFS prefetching
> # http://southbrain.com/south/2008/04/the-nightmare-comes-slowly-zfs.html
> # Increases overall speed of ZFS, but when disk flushing/writes occur,
> # system is less responsive (due to extreme disk I/O).
> # NOTE: Systems with 8GB of RAM or more have prefetch enabled by default.
> vfs.zfs.prefetch_disable="1"
>
> # Disable UMA (uma(9)) for ZFS; amd64 was moved to exclusively use UMA
> # on 2010/05/24.
> # http://lists.freebsd.org/pipermail/freebsd-stable/2010-June/057162.html
> vfs.zfs.zio.use_uma="0"
>
> # Decrease ZFS txg timeout value from 30 (default) to 5 seconds. This
> # should increase throughput and decrease the "bursty" stalls that
> # happen during immense I/O with ZFS.
> # http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007343.html
> # http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007355.html
> vfs.zfs.txg.timeout="5"
>
>
>
> /etc/sysctl.conf :
>
> #
> # ZFS tuning parameters
> # NOTE: Be sure to see /boot/loader.conf for additional tunings
> #
>
> # Increase number of vnodes; we've seen vfs.numvnodes reach 115,000
> # at times. Default max is a little over 200,000. Playing it safe...
> kern.maxvnodes=250000
>
> # Set TXG write limit to a lower threshold. This helps "level out"
> # the throughput rate (see "zpool iostat"). A value of 256MB works well
> # for systems with 4GB of RAM, while 1GB works well for us w/ 8GB on
> # disks which have 64MB cache.
> vfs.zfs.txg.write_limit_override=1073741824
>
>
> Be aware that the vfs.zfs.txg.write_limit_override tuning you see above
> may need to be adjusted for your system. It's up to you to figure out
> what works best in your environment.
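The write-limit values discussed above are byte counts, so it is worth double-checking the arithmetic when picking a threshold for your own system. A quick sketch (the mb_to_bytes helper name is mine, purely illustrative):

```shell
# mb_to_bytes converts a megabyte count to the byte value that
# vfs.zfs.txg.write_limit_override expects (helper name is illustrative).
mb_to_bytes() {
    echo $(($1 * 1024 * 1024))
}

mb_to_bytes 1024   # 1 GB  -> 1073741824, the value used above for 8GB RAM
mb_to_bytes 256    # 256 MB -> 268435456, the suggestion for 4GB RAM
```

Since this is a sysctl rather than a loader tunable, you could experiment at runtime ("sysctl vfs.zfs.txg.write_limit_override=268435456") before committing a value to /etc/sysctl.conf.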
>
> > I think the ad0: FAILURE - READ_DMA4 errors may be from a bad sata cable
> > (or rather, a 12in sata cable connecting a drive that is one inch away)
> > I'm ordering a new drive bay to improve this, but should a bad cable
> > cause lockups?
>
> Semantic point: it's READ_DMA48, not READ_DMA4. The "48" indicates
> 48-bit LBA addressing. There is no 4-bit LBA addressing mode.
>
> The term "lock up" is also too vague. If by "lock up" you mean "the
> system seems alive, hitting NumLock on the console keyboard toggles the
> LED", then the kernel is very likely spending too much of its time
> spinning in something (such as waiting for commands to return from the
> SATA controller, which could also indirectly be the controller waiting
> for the disk to respond to commands). If by "lock up" you mean "the
> system is literally hard locked, nothing responds, I have to hit
> physical Reset or power-cycle the box", then no, a bad cable should not
> be able to cause that.
>
>
> > #smartctl -a /dev/ad0
> >
> > === START OF INFORMATION SECTION ===
> > Model Family: Western Digital Caviar Green (Adv. Format) family
> > Device Model: WDC WD10EARS-00Y5B1
>
> First thing to note is that this is one of those new 4KB-sector drives.
> I have no personal experience with them, but they have been discussed
> on the FreeBSD lists for quite some time, especially with regard to
> ZFS. The discussions mostly involve performance. Just an FYI.
>
> > SMART Attributes Data Structure revision number: 16
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
> > UPDATED WHEN_FAILED RAW_VALUE
> > 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
> > 3 Spin_Up_Time 0x0027 121 121 021 Pre-fail Always - 6933
> > 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 30
> > 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
> > 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
> > 9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 2664
> > 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
> > 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
> > 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 28
> > 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 27
> > 193 Load_Cycle_Count 0x0032 135 135 000 Old_age Always - 196151
> > 194 Temperature_Celsius 0x0022 125 114 000 Old_age Always - 22
> > 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
> > 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
> > 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
> > 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
> > 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
> >
> > SMART Error Log Version: 1
> > No Errors Logged
> >
> > SMART Self-test log structure revision number 1
> > Num Test_Description Status Remaining
> > LifeTime(hours) LBA_of_first_error
> > # 1 Short offline Completed without error 00% 1536
>
> Your disk looks "almost" fine. There are no indicators of bad blocks or
> CRC errors (which indicate bad SATA cables or physical PCB problems on
> the disk) -- that's the good part.
>
> The bad part: Attribute 193. Your disk is literally "load cycling"
> (which is somewhat equivalent to a power cycle; I'd rather not get into
> explaining what it is, but it's not good) on a regular basis. This
> problem with certain models of Western Digital disks has been discussed
> on the FreeBSD lists before. There have been statements made by users
> that Western Digital has indirectly acknowledged this problem, and fixed
> it in a later drive firmware revision. Please note that in some cases
> WD did not increment/change the firmware revision string in their fix,
> so you can't rely on that to determine anything.
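One way to confirm the drive really is load-cycling on a regular basis is to sample attribute 193 periodically and watch the raw value climb. A small sketch of the parsing (the load_cycles helper is my own; it reads standard "smartctl -a" output on stdin):

```shell
# Extract the raw value of SMART attribute 193 (Load_Cycle_Count) from
# smartctl attribute-table output. Sample it twice, some minutes apart;
# if the count grows steadily while the machine is idle, the drive is
# parking its heads aggressively.
load_cycles() {
    # expects "smartctl -a /dev/adX" output on stdin
    awk '$1 == 193 && $2 == "Load_Cycle_Count" { print $NF }'
}

# e.g.: smartctl -a /dev/ad0 | load_cycles
```

Run it twice an hour apart on an otherwise idle system; a flat count would suggest the problem is elsewhere, while steady growth matches the firmware issue described above.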
>
> Would this behaviour cause READ_DMAxx and WRITE_DMAxx errors?
> Absolutely, no doubt about it.
>
> My recommendations: talk to Western Digital Technical Support and explain
> the problem, point them to this thread, and get a fixed/upgraded
> firmware from them. If they do not acknowledge the problem or you get
> stonewalled, I recommend replacing the drive entirely with a different
> model (I highly recommend the Caviar Black drives, which do not have
> this problem).
>
> If they give you a replacement firmware, you'll probably need a DOS boot
> disk to accomplish this, and need to make sure your BIOS does not have
> AHCI mode enabled (DOS won't find the disk). You can always re-enable
> AHCI after the upgrade. If you don't have a DOS boot disk, you'll need
> to explain to Western Digital that you need them to give you a bootable
> ISO that can allow you to perform the upgrade.
>
> If you need me to dig up mailing list posts about this problem I can do
> so, but it will take me some time. The discussions might have been about
> a non-4K-sector Green drive as well, but it doesn't matter; the problem
> is known at this point.
>
> --
> | Jeremy Chadwick jdc at parodius.com |
> | Parodius Networking http://www.parodius.com/ |
> | UNIX Systems Administrator Mountain View, CA, USA |
> | Making life hard for others since 1977. PGP 4BD6C0CB |