kern/178997: Heavy disk I/O may hang system
Klaus Weber
fbsd-bugs-2013-1 at unix-admin.de
Mon Jun 17 00:20:05 UTC 2013
The following reply was made to PR kern/178997; it has been noted by GNATS.
From: Klaus Weber <fbsd-bugs-2013-1 at unix-admin.de>
To: Bruce Evans <brde at optusnet.com.au>
Cc: Klaus Weber <fbsd-bugs-2013-1 at unix-admin.de>,
freebsd-gnats-submit at FreeBSD.org
Subject: Re: kern/178997: Heavy disk I/O may hang system
Date: Mon, 17 Jun 2013 02:17:16 +0200
On Mon, Jun 10, 2013 at 11:00:29AM +1000, Bruce Evans wrote:
> On Mon, 10 Jun 2013, Klaus Weber wrote:
> >On Tue, Jun 04, 2013 at 07:09:59AM +1000, Bruce Evans wrote:
> >>On Fri, 31 May 2013, Klaus Weber wrote:
>
> This thread is getting very long, and I will only summarize a couple
> of things that I found last week here. Maybe more later.
>
> o Everything seems to be working as well as intended (not very well)
> except in bufdaemon and friends. Perhaps it is already fixed there.
> I forgot to check which version of FreeBSD you are using. You may
> be missing some important fixes. There were some by kib@ a few
> months ago, and some by jeff@ after this thread started.
I have now tested with both an up-to-date 9-STABLE and a
10-CURRENT kernel from ftp.freebsd.org (8.6.2013):
FreeBSD filepile 9.1-STABLE FreeBSD 9.1-STABLE #27 r251798M: Sun Jun
16 16:19:18 CEST 2013 root at filepile:/usr/obj/usr/src/sys/FILEPILE
amd64
FreeBSD filepile 10.0-CURRENT FreeBSD 10.0-CURRENT #0: Sat Jun 8
22:10:23 UTC 2013 root at snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC
amd64
The bad news is that I can still reproduce the hangs reliably, with
the same methods as before. I do have the feeling that the machine
survives the re-write load a bit longer, but it's just a gut feeling -
I have no means to quantify this.
The "dirtybuf" sysctls have changed their default values; I
believe you were involved in the discussion about how they are
computed that led to changing them. They are now:
vfs.lodirtybuffers: 13251
vfs.hidirtybuffers: 26502
vfs.dirtybufthresh: 23851
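(For reference, these can be inspected, and with root adjusted, at
runtime on a FreeBSD system; the value 20000 below is just an
illustrative example, not a recommendation:)

```shell
# Read the current dirty-buffer watermarks (values are machine-dependent)
sysctl vfs.lodirtybuffers vfs.hidirtybuffers vfs.dirtybufthresh
# Temporarily lower the threshold at which writing processes start
# flushing dirty buffers themselves (requires root; 20000 is arbitrary)
sysctl vfs.dirtybufthresh=20000
```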
The 10-CURRENT kernel shows a message on the console when
the hang occurs (but see below for a twist):
g_vfs_done():da0p1[WRITE(offset=<x>, length=<y>)]error = 11
The message is printed to the console about once a second when the
hang occurs, until I hard-reset the server. <x> keeps changing,
<y> is mostly 65536, sometimes 131072 (like, one screenful of 64k's
then one 128k line, etc.)
Error=11 seems to be EDEADLK, so it looks like a -CURRENT kernel does
detect a deadlock situation whereas the -STABLE kernel doesn't. (I have
also re-checked the -CURRENT from 5.5.2013 that I briefly tested for
my initial report; it also shows these messages. My apologies for not
noticing this earlier.)
And now there is an additional point where things get really weird:
The "g_vfs_done()..." messages _only_ appear when the system hangs as
a result of bonnie++'s rewriting load. If I repeat the same test with
the test program you provided earlier, the system will still hang, but
without the console output.
(I have repeated the tests 10 times to make sure I wasn't just
imagining this; 5 times with bonnie++ (message appears every time),
5 times with your test program (message did not appear a single
time)).
I'm not really sure what difference between these two programs causes
this. Bonnie++ writes the file immediately before starting to rewrite
it, while your program works on pre-fabricated files - maybe that is it.
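For what it's worth, bonnie++'s rewrite pass is essentially a
block-wise read-modify-write over the file. A minimal sketch of that
access pattern using dd (file path, block size and chunk count are
arbitrary choices for illustration, not bonnie++'s actual parameters):

```shell
# Create a 1 MiB test file, then rewrite it in place chunk by chunk,
# mimicking a read-modify-write ("rewrite") pass over an existing file.
dd if=/dev/zero of=/tmp/rw-test bs=64k count=16 2>/dev/null
i=0
while [ "$i" -lt 16 ]; do
    # read one 64k chunk and write it straight back to the same offset
    dd if=/tmp/rw-test of=/tmp/rw-test bs=64k count=1 \
        skip="$i" seek="$i" conv=notrunc 2>/dev/null
    i=$((i + 1))
done
wc -c < /tmp/rw-test    # size is unchanged after the rewrite pass
```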
Anyway: Is there a way to find out which resource(s) are involved in
the EDEADLK-situation? Please keep in mind that I cannot enter the
debugger via the console, and panicing the machine leaves me with a
dead keyboard, which seems to make any useful debugging pretty hard.
(If you believe that a working debugger is required to make progress
on this, let me know and I will try to get a serial console working
somehow.)
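In case a serial console does become available, a sketch of FreeBSD
commands that might help narrow down the blocked resource while the
hang is in progress (the debug.deadlkres sysctls exist only if the
kernel was built with options DEADLKRES):

```shell
procstat -kk -a         # kernel stack traces of all threads;
                        # shows which function/lock each one sleeps in
ps -axl                 # the MWCHAN column names each process's wait channel
sysctl debug.deadlkres  # deadlock-resolver thresholds, if compiled in
```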
>>[observations regarding seek behavior]
I did not get to do much other testing this weekend (hanging the
system is pretty time-consuming, as after the hard-reset I have to
wait for the root partition to re-mirror).
I have thought about how I could observe the seek behavior of my disks
when testing, but I have come to the conclusion that I cannot really
do this. My RAID controller presents the array as a single disk to the
OS, and its on-board RAM allows it to cache writes and perform
read-aheads without the kernel knowing, so neither reading from nor
writing to two blocks far apart will necessarily result in a physical
head move.
I think I'll concentrate next on finding out where/why the hangs are
occurring. Getting rid of the hangs and hard resets will hopefully
speed up testing cycles.
Klaus