kern/106030: panic while rebooting with a dead disk
mjacob at freebsd.org
Wed Nov 29 15:30:22 PST 2006
The following reply was made to PR kern/106030; it has been noted by GNATS.
From: mjacob at freebsd.org
To: Robert Watson <rwatson at freebsd.org>
Cc: bug-followup at freebsd.org
Subject: Re: kern/106030: panic while rebooting with a dead disk
Date: Wed, 29 Nov 2006 15:08:54 -0800 (PST)
> This is a panic on shutdown in the file system. All user processes have
> exited, and UFS is unable to sync cached data to disk, so there is no way to
> report the error to a user process.
Yes, but it is also true that this would happen at a time other than
reboot. In fact, I rebooted rather than try to run with a dead disk
mounted, and much to my annoyance I *still* couldn't avoid a panic. My
only other choice would have been to do a 'reboot -n'. Bad in either
case.
>
> There are certainly situations where FreeBSD panics rather than tolerating
> invalid file system data, but I believe those problems are entirely at the
> file system layer. There is a kernel printf from GEOM, but the panic occurs
> in the buffer cache code, presumably when UFS discovers life sucks more than
> it thought. I'd like to see UFS grow more tolerant of this sort of thing,
> and simply lose the data rather than panicking.
Yes.
> That said, I think the more pressing issue is actually with FAT, since
> reliable server configurations frequently run UFS over RAID, but most FAT
> devices are not only not reliable, but also removable, which we currently
> fail to tolerate at all when the FAT file system is mounted. A practice run
> on tolerating device removal for FAT would probably prepare us to address the
> UFS issues more competently, as well as shake out issues in VM, etc, that
> might arise. For example, I believe we currently fail rather poorly when
> paging in data from a failing swap device. Certainly there's no good way to
> get out of the situation, but I think we perform one of the less good bad
> ways.
Uhh, this conversation just took a rather bizarre twist. It's not just a
question of making UFS more fault tolerant: UFS is sort of a dead horse
by now, and RAID may not help when it's a channel failure (e.g., fibre
channel or iSCSI). I'd rather see efforts put into ZFS (and fixing the
XFS port to actually work), but that is beside the point. It's more of
a case of making sure that we don't panic when we don't have to. Right
now we panic too often.
But these are very good points- thanks for the review of my somewhat
botched bug report.