Many processes stuck in zfs

Borja Marcos borjam at sarenet.es
Thu Mar 11 08:54:49 UTC 2010


On Mar 11, 2010, at 8:45 AM, Alexander Leidinger wrote:

> Quoting Pawel Jakub Dawidek <pjd at FreeBSD.org> (from Wed, 10 Mar 2010 18:31:43 +0100):
> 
> There is a 4th possibility, if you can rule out everything else: bugs in the CPU. I stumbled upon this with ZFS (but UFS was exposing the problem much faster). The problem in my case was that the BIOS was not recognizing the CPU and as such was not uploading microcode updates.
> 
> Borja, can you confirm that the CPU is correctly announced in FreeBSD (just look at "dmesg | grep CPU:" output, if it tells you it is a AMD or Intel XXX CPU it is correctly detected by the BIOS)?

A CPU bug? Weird. Very.

Let me explain the whole history of this.

We are using ZFS to maintain a couple of servers in an active/passive arrangement. At 30 second intervals we create a snapshot on the master server and send it to the slave. Actually I prefer this scheme to drbd-style arrangements, but that's another story ;)
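For readers who want to picture the scheme described above, here is a minimal sketch of such a replication loop. It is not the actual replicating program, which is not shown in this message. The snapshot names (repl-N), the host name passed as "slave", the use of ssh as transport and all function names are assumptions made for illustration; only the standard zfs snapshot, zfs send -i and zfs receive -F subcommands are relied upon.

```python
import shlex
import subprocess
import time

INTERVAL = 30  # seconds between replications, as described above


def snapshot_cmd(dataset, name):
    """Argument list that creates dataset@name on the master."""
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def send_pipeline(dataset, prev, cur, slave):
    """Shell pipeline that sends snapshot `cur` to the slave over ssh.

    With prev=None a full stream is sent (first replication);
    otherwise only the increment from `prev` to `cur` is sent.
    """
    target = shlex.quote(f"{dataset}@{cur}")
    if prev is None:
        send = f"zfs send {target}"
    else:
        send = f"zfs send -i {shlex.quote(f'{dataset}@{prev}')} {target}"
    recv = f"zfs receive -F {shlex.quote(dataset)}"
    return f"{send} | ssh {shlex.quote(slave)} {shlex.quote(recv)}"


def replicate_forever(dataset, slave):
    """Hypothetical driver: strictly one replication at a time."""
    prev, n = None, 0
    while True:
        started = time.monotonic()
        n += 1
        cur = f"repl-{n}"
        subprocess.run(snapshot_cmd(dataset, cur), check=True)
        # check=True only sees the exit status of the last command (ssh).
        subprocess.run(send_pipeline(dataset, prev, cur, slave),
                       shell=True, check=True)
        prev = cur
        # Sleep only for what is left of the interval, so a slow
        # replication simply delays the next one instead of overlapping it.
        time.sleep(max(0.0, INTERVAL - (time.monotonic() - started)))
```

Because the loop is sequential, two receives can never run at the same time on the slave. The sketch never destroys old snapshots; a real program would have to prune them.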

We started our tests and soon ran into problems: deadlocked filesystem. At one point I remember that the deadlock affected UFS as well, not only ZFS. I mean, having both ZFS and UFS, the system also lost access to the UFS filesystems when this happened.

Looking at the times when it happened, it coincided with one or both of these events: the periodic scripts running (which, among other things, traverse the whole filesystem) and/or a backup being made with Bacula. Either way, there seemed to be a problem: read activity on a dataset that was receiving a snapshot at the same time could lead to a deadlock. I am sure I have never tried to receive two snapshots simultaneously; the replicating program guarantees it.

As the servers had to be rolled into production, and such tests with real servers can be quite time consuming, I set up a couple of FreeBSD virtual machines, using VMWare Fusion (version 2 then, now version 3) on a Macbook (Macbook 4,1 Intel Core2Duo, 2.1 GHz) and tried to reproduce it.

To reproduce it, I set up a "master" machine, with /usr/src and /usr/obj on a dataset (pool/src), replicating it at 30 second intervals to another virtual machine, the slave. On the slave, I launch "tar" in an infinite loop, so that the contents of the replicated dataset (pool/src) are copied to another dataset (pool/thecopy).
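The read load on the slave could be generated along these lines. This is only a sketch: the mount points /pool/src and /pool/thecopy are assumed to be the default ones for those datasets, the tar-to-tar pipe is one possible way of doing the copy, and the function names and the jobs parameter are hypothetical.

```python
import shlex
import subprocess


def copy_pipeline(src_dir, dst_dir):
    """Shell pipeline that reads everything under src_dir with tar
    and unpacks it under dst_dir."""
    return (f"tar -C {shlex.quote(src_dir)} -cf - . | "
            f"tar -C {shlex.quote(dst_dir)} -xf -")


def hammer(src_dir, dst_dir, jobs=1):
    """Hypothetical driver: run `jobs` copies in parallel, forever."""
    while True:
        procs = [subprocess.Popen(copy_pipeline(src_dir, dst_dir), shell=True)
                 for _ in range(jobs)]
        for p in procs:
            # Exit status is deliberately ignored: tar may complain when
            # files change underneath it while a snapshot is being received.
            p.wait()
```

On the slave this would be started as hammer("/pool/src", "/pool/thecopy"), which never returns; a jobs value above 1 gives several tar copies running at once.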

With that running, and, remember, there are replications at 30 second intervals (longer if a replication takes a long time, of course) I run a make buildworld on the master machine. The destination soon gets deadlocked.

I have tried fiddling with the virtual machine configuration, for example offering it a single or dual core CPU, and it makes no difference. With dual cores it *seems* to deadlock earlier, but I'm not sure. For the latest test results I've posted, I was using a single core CPU.

The original machines on which I detected the problem (which I have since reproduced on virtual machines running on VMWare Fusion) are Dell PowerEdge 2950s, and this is the CPU description:



Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz (2496.25-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x1067a  Stepping = 10
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x40ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 8589934592 (8192 MB)
avail memory = 8250003456 (7867 MB)
ACPI APIC Table: <DELL   PE_SC3  >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 8 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
ioapic0: Changing APIC ID to 8
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <DELL PE_SC3> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900


The virtual machine (VMWare Fusion 3.0.0, Macbook, Mac OS X 10.6.2) reports this:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU     T8100  @ 2.10GHz (2116.62-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Stepping = 6
  Features=0xfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS>
  Features2=0x80082201<SSE3,SSSE3,CX16,SSE4.1,<b31>>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 1153433600 (1100 MB)
avail memory = 1090441216 (1039 MB)
ACPI APIC Table: <PTLTD          APIC  >
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <INTEL 440BX> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0

In order to compare to Solaris, I installed a virtual machine running Solaris 10 as well, and used it as a target for the replication. The same test didn't deadlock and it seemed to work like a charm.

Sometimes I've tried to run more than one "tar" job in parallel instead of just one. It just makes it deadlock earlier, no other difference.

Any more tests I can do?

Borja.