ZFS lockup in "zfs" state
Andrew Hill
lists at thefrog.net
Sun May 18 07:11:47 UTC 2008
> The following patch, published some time ago by pjd helped me:
> http://mbsd.msk.ru/dist/zfs_lockup.diff
>
> 100+ days of uptime of heavily loaded machines and no problems so far.
>
> Hope it would help.
I applied this patch with some modifications to fix up the file names,
as they seem to have moved
from
- src/sys/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
- src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c
- src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
to
- src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
- src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c
- src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
(I also pointed the kernel configuration file, MASSHOSTING_7_64, at my
own kernel config.)
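For completeness, the rebuild went roughly like this (a sketch from memory; the sed rewrite and the patch -p level may need adjusting to match how the diff was generated):

# fetch the diff, rewrite the old contrib paths to the new cddl/contrib
# ones, then apply it to the tree and rebuild
fetch http://mbsd.msk.ru/dist/zfs_lockup.diff
sed -e 's|src/sys/contrib/opensolaris|src/sys/cddl/contrib/opensolaris|g' \
    zfs_lockup.diff > zfs_lockup-cddl.diff
cd /usr/src
patch -p1 < zfs_lockup-cddl.diff
make buildworld && make buildkernel KERNCONF=GUTTER
make installkernel KERNCONF=GUTTER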
buildworld and buildkernel succeeded without error, but when I installed
the new kernel and rebooted I got the following output (the important
point being the failure to load zfs.ko on the 8th line):
May 17 17:02:06 <0.2> gutter kernel: Copyright (c) 1992-2008 The FreeBSD Project.
May 17 17:02:06 <0.2> gutter kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
May 17 17:02:06 <0.2> gutter kernel: The Regents of the University of California. All rights reserved.
May 17 17:02:06 <0.2> gutter kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
May 17 17:02:06 <0.2> gutter kernel: FreeBSD 7.0-STABLE #6: Sat May 17 16:39:32 EST 2008
May 17 17:02:06 <0.2> gutter kernel: root at gutter.thefrog.net:/usr/obj/usr/src/sys/GUTTER
May 17 17:02:06 <0.2> gutter kernel: link_elf_obj: symbol kproc_exit undefined
May 17 17:02:06 <0.2> gutter kernel: KLD file zfs.ko - could not finalize loading
May 17 17:02:06 <0.2> gutter kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
May 17 17:02:06 <0.2> gutter kernel: CPU: AMD Athlon(tm) 64 Processor 3200+ (2010.31-MHz K8-class CPU)
May 17 17:02:06 <0.2> gutter kernel: Origin = "AuthenticAMD" Id = 0x10ff0 Stepping = 0
May 17 17:02:06 <0.2> gutter kernel: Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
May 17 17:02:06 <0.2> gutter kernel: AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
May 17 17:02:06 <0.2> gutter kernel: AMD Features2=0x1<LAHF>
May 17 17:02:06 <0.2> gutter kernel: usable memory = 2137882624 (2038 MB)
May 17 17:02:06 <0.2> gutter kernel: avail memory = 2060988416 (1965 MB)
May 17 17:02:06 <0.2> gutter kernel: ACPI APIC Table: <Nvidia AWRDACPI>
May 17 17:02:06 <0.2> gutter kernel: ioapic0 <Version 1.1> irqs 0-23 on motherboard
<snip>
May 17 17:02:06 <0.2> gutter kernel: ad0: 238475MB <Hitachi HDS722525VLAT80 V36OA60A> at ata0-master UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad2: 238475MB <WDC WD2500PB-98FBA0 15.05R15> at ata1-master UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad3: 152627MB <Seagate ST3160812A 3.AAE> at ata1-slave UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad4: 476940MB <Seagate ST3500320AS SD15> at ata2-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad6: 715404MB <Seagate ST3750330AS SD15> at ata3-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad8: 305245MB <Seagate ST3320620AS 3.AAK> at ata4-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad10: 305245MB <Seagate ST3320620AS 3.AAE> at ata5-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad12: 305245MB <Seagate ST3320620AS 3.AAE> at ata6-master SATA150
May 17 17:02:06 <0.2> gutter kernel: Trying to mount root from zfs:tank/root
May 17 17:02:06 <0.2> gutter kernel:
May 17 17:02:06 <0.2> gutter kernel: Manual root filesystem specification:
May 17 17:02:06 <0.2> gutter kernel: <fstype>:<device> Mount <device> using filesystem <fstype>
May 17 17:02:06 <0.2> gutter kernel: eg. ufs:da0s1a
May 17 17:02:06 <0.2> gutter kernel: ? List valid disk boot devices
May 17 17:02:06 <0.2> gutter kernel: <empty line> Abort manual input
May 17 17:02:06 <0.2> gutter kernel:
May 17 17:02:06 <0.2> gutter kernel: mountroot>
At this point, since zfs.ko had not been loaded, I obviously could not
get it to mount root from zfs:tank/root, so I fell back to a backup UFS
root to put my old kernel back in place.
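(Getting back in was just a matter of giving the mountroot> prompt the backup UFS root by hand, something like the following, where the slice name is whatever your backup root actually lives on:)

mountroot> ufs:ad0s1a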
I'm not sure whether there is more output available than just the "could
not finalize loading" message; if so, please let me know where to look,
and I'd love to re-test this patch if it will provide more information.
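One thing I could try, assuming the module from the patched build is still sitting in the object tree, is loading it by hand while running the old kernel and then checking dmesg for anything beyond the one-line failure (the path below is just where my build drops it; adjust as needed):

kldload /usr/obj/usr/src/sys/GUTTER/modules/usr/src/sys/modules/zfs/zfs.ko
dmesg | tail -n 20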
Right now I'm getting uptimes on the order of days before everything
locks up. I assume it's related to this bug, though I'm also getting the
following output when it locks up:
ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=234920650
ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=443427007
ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350174938
ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=234920650
ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=443427007
ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350174938
ad2: FAILURE - WRITE_DMA48 timed out LBA=350494631
ad0: FAILURE - WRITE_DMA timed out LBA=234920650
ad2: FAILURE - WRITE_DMA48 timed out LBA=443427007
ad0: FAILURE - WRITE_DMA48 timed out LBA=350174938
This is typically repeated for a number of different LBA values before
the system panics. I don't know whether this is more likely to be
related to the cause of the lockups (e.g. faulty hardware or a driver
bug) or an effect of the lockup (e.g. waiting on a deadlocked thread).
From what I've found searching the mailing lists, this kind of error
seems to turn up with faulty hardware/drivers, so I guess it could just
be that ZFS exposes the faults because it uses the hardware differently
from my previous UFS setup.
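To rule the disks in or out, I'll have a look at their SMART data; something like this (with sysutils/smartmontools installed from ports) should show reallocated/pending sectors or interface CRC errors if the hardware really is at fault:

smartctl -a /dev/ad0 | egrep -i 'reallocated|pending|crc|overall-health'
smartctl -a /dev/ad2 | egrep -i 'reallocated|pending|crc|overall-health'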
In terms of my specific setup: I have 2 GB of RAM, I'm running from
up-to-date -STABLE source (apart from my attempt to apply the
aforementioned patch), I'm running an amd64 kernel, and my
/boot/loader.conf looks like this:
vm.kmem_size_max="1610612736"
vm.kmem_size="1610612736"
zfs_load="YES"
vfs.root.mountfrom="zfs:tank/root"
vfs.zfs.prefetch_disable="1"
vfs.zfs.arc_max="838860800"
The last line was an attempt to reduce the amount of ARC cache in the
kernel, in case it was having trouble locating memory blocks for other
things (the default value had it at 1.2 GB), but adding that parameter
doesn't seem to have had any effect.
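As a sanity check that the tunables are actually being applied at boot (rather than silently ignored), the corresponding sysctls can be read back afterwards; roughly:

sysctl vm.kmem_size vm.kmem_size_max vfs.zfs.arc_max vfs.zfs.prefetch_disable
# current ARC size, to see whether arc_max is being respected
# (assuming the arcstats sysctls are present on 7-STABLE)
sysctl kstat.zfs.misc.arcstats.size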
Anyway, any info toward resolving this would be greatly appreciated;
otherwise, let me know what further info I can provide to help track
down the problem.
Andrew