ffs_alloc panic patch
Ken Smith
kensmith at cse.Buffalo.EDU
Fri Aug 27 12:36:07 PDT 2004
On Fri, Aug 27, 2004 at 09:52:45PM +0400, Pavel Merdine wrote:
> Panic is VERY undesirable situation. And I'm in doubt why those people
> who wrote ffs like panics so devotedly:
>
> # grep -c "panic" ffs_alloc.c ffs_softdep.c
> ffs_alloc.c:37
> ffs_softdep.c:108
>
> I think such things are not acceptable in production environment. Why
> those functions cannot just return a failure state and leave system
> working?
Actually it's checks like this and calls to panic that make the system
acceptable in a production environment.
A couple of examples:
- Suppose the code is checking a reference counter, and that
counter has become zero for a file object that the kernel
believes is still in use. This should never happen,
it is an indication that somewhere else there was a programming
bug. Furthermore, that other piece of the system where the
bug lies may have started to use pieces of that file object
for *other* purposes. If you just continue on pretending
nothing happened you wind up with filesystem corruption,
what's on the disk is not necessarily correct. Better to
have the machine crash and reboot than write bad data to
the disk.
- Suppose you have a disk drive that's dying and now what you
write to it isn't necessarily what you read back because it's
dying. Again, much better to panic the machine than to continue
on pretending nothing is wrong. You would not likely have the
ffs code panic the machine for data inside of files for this
sort of situation but if the machine reads data in from the
data structures on the disk that keep track of what files are
inside of which directories, who owns those files, what the
permissions are, what disk blocks the files are actually
sitting on (this is generally known as "metadata") then the
ffs code will typically panic the machine.
- Suppose you as a sys-admin make a mistake, and somehow manage
to set up two disk partitions that partially overlap (don't
laugh, I've seen it happen...). Here you again wind up in a
situation where the filesystem data structures on the disk can
become corrupted. Typically at some point the ffs code will
recognize that the metadata is incorrect and again a panic is
better than trying to carry on pretending nothing is wrong.
None of these things should happen. But they *can* happen and not all
of them are "system bugs" - the second example is out of anyone's control
and the third example is "pilot error". The consequences of not panic-ing
in these situations is having corrupted data on the disks.
--
Ken Smith
- From there to here, from here to | kensmith at cse.buffalo.edu
there, funny things are everywhere. |
- Theodore Geisel |
More information about the freebsd-stable
mailing list