kern/169480: [zfs] ZFS stalls on heavy I/O
Harry Coin
hgcoin at gmail.com
Sat Jan 26 05:50:01 UTC 2013
The following reply was made to PR kern/169480; it has been noted by GNATS.
From: Harry Coin <hgcoin at gmail.com>
To: bug-followup at FreeBSD.org, levent.serinol at mynet.com, jdc at koitsu.org
Cc:
Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O
Date: Fri, 25 Jan 2013 23:49:23 -0600
I appreciate your effort very much.
One change from last time was I set failmode=continue from 'wait'.
There were no errors in zpool or other logs.
re 1,2: switching to bs=64k and /dev/zero as the input source, the dd
completes and the ls -l during the process causes no problems. Transfer
speed reported by dd of 183qMBytes/sec.
re 3: the gstat -I500ms shows each of the raid partitions hanging in at
about top speed of 50-ish MBps during the above dd. All disks perform
within a couple % of one another.
re 4: we'll stick with gstat.
re 5: I see we agree about that. Nothing happened during the scrub
other than the scrub. 14Mbps scrub speed is 1/6th of the minimum I'd
expect.
re 6: Tuning Is Evil. On the other hand, Crashing is Eviler. So I
tried Evil Tuning, with poor results as noted upstream. So, perhaps
tuning is indeed not only evil but a black hole for time as well.
Switching back to the 512 byte writes, I notice on gstat the writes are
zero for 4 secs or so, then a burst of activity (all partitions within a
few % of one another), all quiet, repeat. Then I throw in the ls -l...
and... it all works with no problems. Trying the dd with the big
blocksize and /dev/urandom.... and: gstat looks bursty as in the 512
byte writes. ls -l causes no changes, works. Trying the original
/dev/urandom and bs=512 and ... the gstat pattern does the usual bursty
thing, ls -l works well. Thinking it's about the gstat...
Doing top -S -s 1 .. nothing.
nas4free:~# vmstat -i
interrupt total rate
irq1: atkbd0 168 0
irq18: uhci2 1136 0
irq20: hpet0 620136 328
irq21: uhci0 ehci0 7764 4
irq256: hdac0 37 0
irq257: ahci0 1090 0
irq258: em0 5625 2
irq259: ahci1 849746 450
Total 1485702 787
So weird. It's not crashing. So, I tried to move a video from one
dataset to another using mv. Within moments all the writes gstat shows
have stopped, while there's a few K / s reads. And there it sits.
Here's top
last pid: 3932; load averages: 0.02, 0.04,
0.13 up 0+00:24:46 05:03:53
249 processes: 3 running, 229 sleeping, 17 waiting
CPU: 0.0% user, 0.0% nice, 1.2% system, 0.0% interrupt, 98.8% idle
Mem: 14M Active, 19M Inact, 953M Wired, 8256K Buf, 2246M Free
Swap:
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0K 32K CPU1 1 22:02 100.00%
idle{idle: cpu1}
11 root 155 ki31 0K 32K RUN 0 22:06 98.49%
idle{idle: cpu0}
13 root -8 - 0K 48K - 1 0:06 0.10%
geom{g_up}
5 root -8 - 0K 80K zio->i 0 0:05 0.10%
zfskern{txg_thread_enter}
0 root -16 0 0K 2624K sched 1 2:30 0.00%
kernel{swapper}
0 root -16 0 0K 2624K - 0 0:30 0.00%
kernel{zio_read_intr_1}
0 root -16 0 0K 2624K - 0 0:30 0.00%
kernel{zio_read_intr_0}
0 root -16 0 0K 2624K - 1 0:10 0.00%
kernel{zio_write_issue_}
0 root -16 0 0K 2624K - 1 0:10 0.00%
kernel{zio_write_issue_}
12 root -88 - 0K 272K WAIT 1 0:08 0.00%
intr{irq259: ahci1}
0 root -16 0 0K 2624K - 1 0:07 0.00%
kernel{zio_write_intr_2}
0 root -16 0 0K 2624K - 1 0:07 0.00%
kernel{zio_write_intr_3}
0 root -16 0 0K 2624K - 1 0:07 0.00%
kernel{zio_write_intr_6}
0 root -16 0 0K 2624K - 1 0:07 0.00%
kernel{zio_write_intr_1}
0 root -16 0 0K 2624K - 0 0:07 0.00%
kernel{zio_write_intr_0}
0 root -16 0 0K 2624K - 0 0:07 0.00%
kernel{zio_write_intr_4}
0 root -16 0 0K 2624K - 1 0:07 0.00%
kernel{zio_write_intr_7}
0 root -16 0 0K 2624K - 1 0:07 0.00%
kernel{zio_write_intr_5}
13 root -8 - 0K 48K - 1 0:07 0.00%
geom{g_down}
3919 root 20 0 9916K 1636K tx->tx 1 0:07 0.00% mv
20 root 16 - 0K 16K syncer 0 0:03 0.00% syncer
12 root -60 - 0K 272K WAIT 0 0:01 0.00%
intr{swi4: clock}
12 root -52 - 0K 272K WAIT 1 0:01 0.00%
intr{swi6: Giant task}
14 root -16 - 0K 16K - 0 0:01 0.00% yarrow
5 root -8 - 0K 80K arc_re 0 0:01 0.00%
zfskern{arc_reclaim_thre}
12 root -84 - 0K 272K WAIT 1 0:01 0.00%
intr{irq1: atkbd0}
0 root -16 0 0K 2624K - 1 0:00 0.00%
kernel{zio_write_issue_}
0 root -8 0 0K 2624K - 0 0:00 0.00%
kernel{zil_clean}
0 root -16 0 0K 2624K - 1 0:00 0.00%
kernel{zio_write_issue_}
Set failmode back to wait. Rebooted from the livecd and now repeating
the tests:
And they work properly.
So, with gstat running I once again try to move a video file from one
dataset to another. I watch a bunch of reads happen, the a burst of
writes, and that goes on for a couple of minutes. Then the writes stop
and the reads continue at 3Kbps according to gstat. The total memory
still has 2.5G free. There is no writing to the zpool. I notice that
one of the video files has moved, and the writing stopped when it was
trying to open the next one I'm guessing. I can sucessfully ^C out of
the mv, and notice a constant low speed read going on, 80-100KBps. I
try restarting the move, no change to the read pattern, no writes.
So, no memory explosion, ^C works, but I can't write, and gstat reports
a steady read low pace read for no obvious reason, just as it was when
the mv process was active.
So, a huger puzzle.
More information about the freebsd-fs
mailing list