g_vfs_done() failures on 6.2-RC1

Kris Kennaway kris at obsecurity.org
Wed Dec 13 00:21:26 PST 2006


On Wed, Dec 13, 2006 at 07:09:15PM +1100, Jan Mikkelsen wrote:
> Scott Long wrote:
> >Jan Mikkelsen wrote:
> >
> >>- Daichi Goto's unionfs-p16 has been applied.
> >>- The Areca driver is 1.20.00.12 from the Areca website.
> >>- sym(4) patch (see PR/89550), but no sym controller present.
> >>- SMP + FAST_IPSEC + SUIDDIR + device crypto.
> >>
> >>So:  I've seen this problem on a few machines under heavy I/O load, with 
> >>ataraid and with arcmsr.  I've seen others report similar problems, but 
> >>I've seen no resolution.  Does anyone have any idea what the problem is? 
> >>Has anyone else seen similar problems?  Where to from here?
> >>
> >>Thanks,
> >>
> >
> >You mention that you are using a driver from the Areca website.  Have
> >you tried using the stock driver that comes with FreeBSD?  I don't know
> >if it will be better or not, but I was planning on doing a refresh of
> >the stock driver, and I'd hate to introduce instability that wasn't there 
> >before.
> 
> I haven't run it recently.  I can roll back to the stock driver and see 
> whether I see it again.  However, I can't always reproduce the problem, so 
> I probably can't prove the absence of the problem.
> 
> I mentioned that I have seen similar problems on machines with ataraid, 
> like this:
> 
> DOH! ata_alloc_composite failed! (x5)
> FAILURE - out of memory in ata_raid_init_request (x6)
> g_vfs_done():ar0s3f[WRITE(offset=113324673024, length=2048)]error = 5
> g_vfs_done():ar0s3f[WRITE(offset=113325062144, length=2048)]error = 5
> g_vfs_done():ar0s3f[WRITE(offset=113325127680, length=2048)]error = 5
> g_vfs_done():ar0s3f[WRITE(offset=113325242368, length=2048)]error = 5
> g_vfs_done():ar0s3f[WRITE(offset=113325256704, length=2048)]error = 5
> g_vfs_done():ar0s3f[WRITE(offset=113325275136, length=2048)]error = 5
> 
> However, looking at this again, I'm not sure that the problem is identical 
> anymore because the offset seems to be within the partition rather than 
> just plain wrong (assuming the units of the offset message are bytes).  
> These messages are from an HP DL145G1 with two SATA drives and ataraid.
> 
> The workload that caused these messages is very similar:  Heavy I/O during 
> multiple concurrent removes of deep trees on a filesystem with softupdates, 
> system needs a reboot to get back on track.

Yes, it looks like a different problem:

a) It's a different driver (ataraid vs. Areca's arcmsr).  The
g_vfs_done message is a generic error: it means "the driver I was
writing to returned EIO (error = 5) in response to this attempted
write".  Why the error occurred depends on the driver and hardware.

b) As you say, the offsets in the error messages are sensible in the
ataraid case but not in the Areca case.

c) In the ataraid case an earlier error message precedes the
g_vfs_done errors, which are only secondary effects.  Your bug here is
whatever causes the "DOH!" in ataraid, so that's what you should
follow up (separately).
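
To make the points above concrete, here is a minimal Python sketch for
reading g_vfs_done console lines like the ones quoted in this thread.
It is not part of any FreeBSD tool: the names parse_g_vfs_done and
within_provider are hypothetical, and the provider size you pass in is
an assumption you would have to look up yourself (the thread does not
give the size of ar0s3f).  It decodes the error number and checks
whether a request lies inside the provider, assuming the offset is in
bytes.

```python
import errno
import re

# Matches console lines of the form quoted in this thread, e.g.
#   g_vfs_done():ar0s3f[WRITE(offset=113324673024, length=2048)]error = 5
G_VFS_DONE = re.compile(
    r"g_vfs_done\(\):(?P<dev>[^\[]+)\[(?P<op>\w+)\("
    r"offset=(?P<offset>-?\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<error>\d+)"
)


def parse_g_vfs_done(line):
    """Return the fields of one g_vfs_done line, or None if it does not match."""
    m = G_VFS_DONE.search(line)
    if m is None:
        return None
    err = int(m.group("error"))
    return {
        "dev": m.group("dev"),
        "op": m.group("op"),
        "offset": int(m.group("offset")),
        "length": int(m.group("length")),
        "error": err,
        # errno 5 is EIO, the generic I/O error returned by the driver.
        "errno_name": errno.errorcode.get(err, "unknown"),
    }


def within_provider(rec, provider_bytes):
    """True if the request lies fully inside a provider of the given size.

    provider_bytes is supplied by the caller (hypothetical here); offsets
    are assumed to be in bytes.
    """
    return rec["offset"] >= 0 and rec["offset"] + rec["length"] <= provider_bytes
```

A line that parses with errno_name "EIO" and an in-range offset only
tells you the driver failed the request; the reason has to come from
the driver's own messages.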

Kris