kern/177536: zfs livelock (deadlock) with high write-to-disk load

Mon Apr 1 07:50:01 UTC 2013

>Number:         177536
>Category:       kern
>Synopsis:       zfs livelock (deadlock) with high write-to-disk load
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 01 07:50:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Martin Birgmeier
>Release:        9.1
>Organization:
MBi at home
>Environment:
FreeBSD hal.xyzzy 9.1-RELEASE FreeBSD 9.1-RELEASE #1: Fri Jan  4 12:37:44 CET 2013     root at v904.xyzzy:/usr/obj/.../hal/z/SRC/FreeBSD/release/9.1.0/sys/XYZZY_SMP  amd64
>Description:
I am using a program whose purpose is to efficiently copy disk partitions to image files stored on zfs. Multiple backups in time are kept using snapshots of the file system on which these image files reside.

The program uses mmap(2) to map an image file into memory, then reads the disk partition to be backed up in fixed-size blocks (default 128k). For each block, it first compares the already stored data with the new data using memcmp(3), and overwrites the stored block using memcpy(3) only if there is a mismatch. In this way, only the blocks that have changed are actually written, keeping the zfs snapshots small.
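For illustration, the compare-then-copy loop described above might look roughly like the following. This is a minimal sketch and not the actual backup program: the function name sync_changed_blocks, its parameters, and the final msync(2) call are assumptions made for the example, and the image file is assumed to already be at least as large as the source.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy `size` bytes from src_fd into the file behind
 * img_fd, modifying only the blocks whose contents differ.
 * Returns the number of blocks rewritten, or -1 on error.
 */
static long
sync_changed_blocks(int src_fd, int img_fd, size_t size, size_t blksz)
{
	unsigned char *img, *buf;
	size_t off, want, got;
	ssize_t n;
	long changed = 0;

	if (blksz == 0)
		return (-1);
	if (size == 0)
		return (0);

	buf = malloc(blksz);
	if (buf == NULL)
		return (-1);

	/* Map the whole image file shared, so stores reach the file. */
	img = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, img_fd, 0);
	if (img == MAP_FAILED) {
		free(buf);
		return (-1);
	}

	for (off = 0; off < size; off += want) {
		want = size - off < blksz ? size - off : blksz;

		/* Fill one block from the source; read(2) may return short. */
		for (got = 0; got < want; got += (size_t)n) {
			n = read(src_fd, buf + got, want - got);
			if (n <= 0) {
				changed = -1;
				goto out;
			}
		}

		/* Dirty the mapped pages only if the block really changed. */
		if (memcmp(img + off, buf, want) != 0) {
			memcpy(img + off, buf, want);
			changed++;
		}
	}

	/* Assumption: flush dirty pages explicitly before unmapping. */
	if (msync(img, size, MS_SYNC) != 0)
		changed = -1;
out:
	munmap(img, size);
	free(buf);
	return (changed);
}
```

A caller would open the partition read-only and the image file read-write, then call sync_changed_blocks(src_fd, img_fd, size, 128 * 1024). Unchanged blocks are only read through the mapping, so they never become dirty pages.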

The files mapped in this way are typically several hundred GB in size.

Problem: Every now and then, the above operation results in a livelock (deadlock) of the zfs server. Specifically, today the following could be observed:
- the program described above did not proceed any further after having read/written several dozen GB of data
- an 'ls -l' in the directory holding the image file was stuck
- there was constant low-level disk activity, but obviously no progress regarding writing data

Some environmental information:
- The pool consists of 6 partitions on 6 SATA disks in raidz2 configuration
- The pool was created on 2010-10-22 and is still at version 14
- / is a UFS partition on one of the SATA disks
- The server has 16 GB RAM
- There are no indications at all of any hardware disk failures
- other zfs file systems on the same pool could still be read/written to
- the machine responded quite normally to many requests, including remote login etc.
- a VBoxHeadless session, whose host user data was exclusively on the UFS root and whose vbox disk pointed to another raw partition on one of the disks, was stuck
- From previous experience I knew that I had to hard reset the machine, since a shutdown would result in zfs sync (?) continuing forever after the UFS sync had already finished (by the way, it was a bad idea not to do a shutdown today, as I destroyed my .zhistory residing on the root UFS partition that way)
- Even if the problem does not occur, when the program described above has finished comparing/copying all blocks, several more minutes pass before the program finally finishes. During that time there is high disk activity.
- Under FreeBSD 8.2, the problem did not occur, and neither did the symptom described in the previous point (long time until the program finishes).

>How-To-Repeat:
The problem cannot be easily reproduced; however, from past experience it seems that the following increases the chances of it happening:
- a high write load
- after that write load has already lasted for a while, another rather high read load (possibly on another zfs file system of the same zpool)
- maybe activity on an unrelated partition (non-ZFS) on the disks which also carry the partitions used for the zpool

Unfortunately, most of this is just guesswork, and hopefully not too misleading.

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted: