ffs_alloc panic patch

Ken Smith kensmith at cse.Buffalo.EDU
Fri Aug 27 12:36:07 PDT 2004


On Fri, Aug 27, 2004 at 09:52:45PM +0400, Pavel Merdine wrote:

> Panic is VERY undesirable situation. And I'm in doubt why those people
> who wrote ffs like panics so devotedly:
> 
> # grep -c "panic" ffs_alloc.c ffs_softdep.c
> ffs_alloc.c:37
> ffs_softdep.c:108
> 
> I  think such things are not acceptable in production environment. Why
> those  functions  cannot  just return a failure state and leave system
> working?

Actually it's checks like this and calls to panic that make the system
acceptable in a production environment.

A few examples:

	- Suppose the code is checking a reference counter, and that
	  counter has become zero for a file object that the kernel
	  believes is still in use.  This should never happen; it
	  indicates a programming bug somewhere else.  Furthermore,
	  the piece of the system where the bug lies may have started
	  to use pieces of that file object for *other* purposes.  If
	  you just continue on pretending nothing happened you wind up
	  with filesystem corruption: what's on the disk is not
	  necessarily correct.  Better to have the machine crash and
	  reboot than write bad data to the disk.

	- Suppose you have a disk drive that's dying, so what you
	  read back from it isn't necessarily what you wrote.  Again,
	  much better to panic the machine than to continue on
	  pretending nothing is wrong.  The ffs code would not likely
	  panic the machine over bad data inside of files in this
	  sort of situation.  But if what it reads in is bad
	  "metadata" (the data structures on the disk that keep track
	  of which files are inside of which directories, who owns
	  those files, what the permissions are, and what disk blocks
	  the files are actually sitting on), then the ffs code will
	  typically panic the machine.

	- Suppose you as a sys-admin make a mistake, and somehow manage
	  to set up two disk partitions that partially overlap (don't
	  laugh, I've seen it happen...).  Here you again wind up in a
	  situation where the filesystem data structures on the disk can
	  become corrupted.  Typically at some point the ffs code will
	  recognize that the metadata is incorrect and again a panic is
	  better than trying to carry on pretending nothing is wrong.
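
To make the first example concrete, here is a minimal userland sketch of
that kind of check.  Everything in it is hypothetical and only for
illustration: struct file_obj, refcount_sane(), file_release() and
toy_panic() are made-up names, not the real ffs code, and toy_panic()
just calls abort() as a stand-in for the kernel's panic().

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory file object; NOT a real FreeBSD structure. */
struct file_obj {
	int refcount;		/* number of active users of this object */
};

/* Userland stand-in for the kernel's panic(): report and stop. */
static void
toy_panic(const char *msg)
{
	fprintf(stderr, "panic: %s\n", msg);
	abort();
}

/* Invariant: an object that is still in use has a positive refcount. */
static bool
refcount_sane(const struct file_obj *fo)
{
	return (fo->refcount > 0);
}

/*
 * Drop one reference.  If the count is already zero, some other code
 * has a bug and the object may already be reused, so stop rather than
 * keep going with state that cannot be trusted.
 */
static void
file_release(struct file_obj *fo)
{
	if (!refcount_sane(fo))
		toy_panic("file_release: refcount already zero");
	fo->refcount--;
}
```

Returning an error from file_release() instead would leave the caller
holding an object whose contents may already belong to something else,
which is exactly the situation the check exists to stop.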

None of these things should happen.  But they *can* happen and not all
of them are "system bugs" - the second example is out of anyone's control
and the third example is "pilot error".  The consequence of not panicking
in these situations is corrupted data on the disks.
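
As a rough sketch of what a metadata check looks like, assume a toy
filesystem where an inode holds a few block numbers read from disk.  The
structures and function names below are invented for illustration and do
not match the real FFS on-disk layout or the real ffs functions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_NBLKPTR 4		/* hypothetical number of block pointers */

/* Hypothetical, simplified filesystem description. */
struct toy_fs {
	uint64_t nblocks;	/* total number of blocks in the filesystem */
};

/* Hypothetical, simplified inode as read from disk. */
struct toy_inode {
	uint64_t blkno[TOY_NBLKPTR];	/* block numbers the file occupies */
	unsigned nblk;			/* entries of blkno[] in use */
};

/* True only if every block pointer lies inside the filesystem. */
static bool
inode_blocks_sane(const struct toy_fs *fs, const struct toy_inode *ip)
{
	if (ip->nblk > TOY_NBLKPTR)
		return (false);
	for (unsigned i = 0; i < ip->nblk; i++)
		if (ip->blkno[i] >= fs->nblocks)
			return (false);
	return (true);
}

/*
 * Stop (abort() stands in for the kernel's panic()) instead of reading
 * or writing blocks that bad metadata points at.
 */
static void
inode_check_or_panic(const struct toy_fs *fs, const struct toy_inode *ip)
{
	if (!inode_blocks_sane(fs, ip)) {
		fprintf(stderr, "panic: inode block pointer out of range\n");
		abort();
	}
}
```

If the code carried on with an out-of-range block number it could
overwrite data belonging to something else, so one bad read turns into
more corruption on the disk.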

-- 
						Ken Smith
- From there to here, from here to      |       kensmith at cse.buffalo.edu
  there, funny things are everywhere.   |
                      - Theodore Geisel |


More information about the freebsd-stable mailing list