Many processes stuck in zfs

Stefan Bethke stb at lassitu.de
Tue Mar 9 12:44:00 UTC 2010


On 09.03.2010 at 13:29, Pawel Jakub Dawidek wrote:

> On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
>> Over the past couple of months, I've more or less regularly observed machines having more and more processes stuck in the zfs wchan.  The processes never recover from that, and trying to reboot only gets the entire system stuck, without any console messages.  I can enter the debugger, and I have saved a couple of dumps.
>> 
>> The situation seems to be triggered by receiving snapshots from the sister machine via zfs receive (both machines synchronize their active ZFS filesystems to each other, using zfs send and zfs receive).  It appears to be the receiving side that causes the trouble.
>> 
>> Both machines run 8-stable from mid-February, with a single-disk ZFS pool, with ARC limited to 512M, prefetch and ZIL disabled via loader.conf.
>> 
>> What should I be looking at to further diagnose?
> 
> What kind of hardware do you have there? There is a 3-way deadlock I have a
> fix for, which would be hard to trigger on single- or dual-core machines.

FreeBSD lokschuppen.zs64.net 8.0-STABLE FreeBSD 8.0-STABLE #24: Sat Feb 13 11:20:03 UTC 2010     root@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT  amd64
Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-STABLE #24: Sat Feb 13 11:20:03 UTC 2010
    root@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU     E7300  @ 2.66GHz (2666.65-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x8e39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 4081422336 (3892 MB)
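
The loader.conf settings described in the original report would look roughly like this. This is a sketch, not the actual file from these machines; the tunable names are assumed to be the ones used by ZFS on FreeBSD 8.

```
# /boot/loader.conf (sketch, not the verbatim file from either machine)
vfs.zfs.arc_max="512M"          # limit the ARC to 512M
vfs.zfs.prefetch_disable="1"    # disable prefetch
vfs.zfs.zil_disable="1"         # disable the ZIL
```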


> Feel free to try the fix:
> 
> 	http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch

I'll give it a shot on one of the two boxes.


Stefan

-- 
Stefan Bethke <stb at lassitu.de>   Fon +49 151 14070811
