kern/177536: [zfs] zfs livelock (deadlock) with high write-to-disk load

Dennis Glatting dg at pki2.com
Sat Apr 27 23:07:40 UTC 2013


A common thread I have noticed under 9.x: on systems with multiple
populated processor sockets, ZFS will occasionally hang, but on 9.x
systems with only one populated socket it does not. Also, those same
multi-socket systems do not hang when running 8.4; however, on my
four-socket, 16-core systems the core count is cut down to two sockets'
worth somewhere in the boot process.

I'm running Opteron Bulldozers and Piledrivers (6200/6300 series).




On Sat, 27 Apr 2013, Martin Birgmeier wrote:

> The following reply was made to PR kern/177536; it has been noted by GNATS.
>
> From: Martin Birgmeier <Martin.Birgmeier at aon.at>
> To: bug-followup at FreeBSD.org, Andriy Gapon <avg at FreeBSD.org>
> Cc:
> Subject: Re: kern/177536: [zfs] zfs livelock (deadlock) with high write-to-disk load
> Date: Sat, 27 Apr 2013 11:40:16 +0200
>
> So it happened again... same system (9.1.0 release), except that the
> kernel has been recompiled with options DDB, KDB, and STACK.
>
> I ran procstat -kk -a (twice). Output can be found in
> http://members.aon.at/xyzzy/procstat.-kk.-a.1.gz and
> http://members.aon.at/xyzzy/procstat.-kk.-a.2.gz, respectively. I also
> started kgdb in script(1), executing "thread apply all bt" in it. Output
> can be found in http://members.aon.at/xyzzy/kgdb.thread.apply.all.bt.gz.
>
> More info on the "test case":
> - As described in the initial report, / is a UFS GPT partition on one of
> 6 SATA disks. There exists a zpool "hal.1" on one (other) GPT partition
> on each of these disks.
> - VirtualBox is run by a user whose home dir is on one of the zfs file
> systems.
> - First, a big write load to another zfs file system of the same zpool
> was started (160 GB copy from a remote machine).
> - Then, 3 VBoxHeadless instances were started.
> ==> livelock on zfs
> - procstat run twice, then script + kgdb
> - copied output to another machine
> - shutdown the hung machine (via "shutdown -p")
> ==> "some processes would not die"
> ==> "syncing disks" executes until all zeros, then the system just sits
> there with continuous disk activity (obviously from zfs), shutdown does
> not proceed further
> - hard reset
> - on reboot: UFS file system check (no errors), ZFS starts fine and
> seems mostly unaffected (except of course that the 160 GB copy is truncated)
>
> An analysis would be appreciated, and also a hint whether I should
> switch to stable/9 instead.
>
> Regards,
>
> Martin
>
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
>
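
Since procstat -kk -a was run twice, one way to narrow things down is to
look for threads whose kernel stack is identical in both captures. Below
is a minimal Python sketch of that comparison. It is not an official
tool: the function names are made up for illustration, and it assumes
each thread line starts with a numeric PID and TID and that everything
after them (names and stack) can be compared as plain text. Idle threads
will also show up as unchanged, so the optional substring filter is only
a convenience.

```python
import gzip
import re

# Assumption: a thread row begins with two integers (PID, TID); the header
# line and blank lines do not match and are skipped.
ROW = re.compile(r"\s*(\d+)\s+(\d+)\s+(.*\S)\s*$")


def parse_snapshot(text):
    """Map (pid, tid) -> remainder of the line (names plus kernel stack)."""
    threads = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            threads[(int(m.group(1)), int(m.group(2)))] = m.group(3)
    return threads


def unchanged_threads(first, second, needle=""):
    """Return sorted (pid, tid) pairs whose line is identical in both
    snapshots and contains `needle` (empty string means no filtering)."""
    a, b = parse_snapshot(first), parse_snapshot(second)
    return sorted(
        key for key in a
        if key in b and a[key] == b[key] and needle in a[key]
    )


def read_gz(path):
    """Read a gzip-compressed capture as text (hypothetical helper)."""
    with gzip.open(path, "rt", errors="replace") as fh:
        return fh.read()
```

With the two captures downloaded locally, something like
unchanged_threads(read_gz("procstat.-kk.-a.1.gz"),
read_gz("procstat.-kk.-a.2.gz")) would list the candidates; those can
then be checked against the kgdb backtraces.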
