Delete a directory, crash the system

Polytropon freebsd at
Sun Jul 28 05:54:47 UTC 2013

And here, kids, you can see the strength of an open source
operating system: You can see _why_ something happens. :-)

On Sat, 27 Jul 2013 20:35:09 +0100, Frank Leonhardt wrote:
> On 27/07/2013 19:57, David Noel wrote:
> >> So the system panics in ufs_rmdir(). Maybe the filesystem is
> >> corrupt? Have you tried to fsck(8) it manually?
> > fsck worked, though I had to boot from a USB image because I couldn't
> > get into single user.. for some odd reason.
> >
> >> Even if the filesystem is corrupt, ufs_rmdir() shouldn't
> >> panic(), IMHO, but fail gracefully. Hmmm...
> > Yeah, I was pretty surprised. I think I tried it like 3 times to be
> > sure... and yeah, each time... kaboom! Who'd have thought. Do I just
> > post this to the mailing list and hope some benevolent developer
> > stumbles upon it and takes it upon him/herself to "fix" this, or where
> > do I find the FreeBSD Suggestion Box? I guess I should file a Problem
> > Report and see what happens from there.
> >
> I was going to raise an issue when the discussion had died down to a
> consensus. I also don't think it's reasonable for the kernel to bomb
> when it encounters corruption on a disk.
> If you want to patch it yourself, edit sys/ufs/ufs/ufs_vnops.c at around 
> line 2791 change:
>          if (dp->i_effnlink < 3)
>                  panic("ufs_dirrem: Bad link count %d on parent",
>                      dp->i_effnlink);
> To
>          if (dp->i_effnlink < 3) {
>                  error = EINVAL;
>                  goto out;
>          }
> The ufs_link() call has a similar issue.
> I can't see why my mod will break anything, but there's always 
> unintended consequences.

One of the core policies usually is to stop _any_ action that
has failed for a "reason that cannot be" and make sure it
won't get worse. This can be seen, for example, in fsck's behaviour:
If there is a massive file system error that cannot be repaired
without further intervention that _could_ destroy data or make
its retrieval harder or impossible, the operator will be requested
to make the decision. There are options to automate this process,
but on the other hand, "always assume 'yes'" can then be a risk,
as it could prevent recovery. My assumption is that the developers
chose a similar approach here: "We found a situation that should
not be possible, so we stop the system from messing up the file
system even more." This carries the attitude of not "hiding a
problem for the sake of convenience" by "being silent and going
back to the usual work". Of course it is debatable if this is the
right decision in _this_ particular case.
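
To make the two attitudes concrete, here is a small user-space sketch
in C. It is not the kernel code: struct toy_inode and both function
names are made up for illustration, and abort() merely stands in for
panic(). Only the "< 3" check and the EINVAL return value are taken
from the patch quoted above.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the in-memory inode; only the link count matters. */
struct toy_inode {
    int effnlink;   /* plays the role of i_effnlink in the quoted patch */
};

/*
 * Fail-stop policy: an "impossible" state halts everything.
 * abort() is used here as a user-space substitute for panic().
 */
void rmdir_check_failstop(const struct toy_inode *dp)
{
    if (dp->effnlink < 3) {
        fprintf(stderr, "bad link count %d on parent\n", dp->effnlink);
        abort();
    }
}

/*
 * Graceful policy: hand an error code back to the caller and keep running.
 * Returns 0 if the link count looks sane, EINVAL otherwise.
 */
int rmdir_check_graceful(const struct toy_inode *dp)
{
    if (dp->effnlink < 3)
        return EINVAL;
    return 0;
}
```

With a link count of 1, the first function never returns, while the
second one returns EINVAL and leaves it to the caller to decide what
happens next. That is exactly the trade-off: the first cannot make
anything worse, the second keeps the machine up.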

> By returning invalid argument, any code above 
> it should already be handling that condition although the user will be 
> scratching their head wondering what's wrong with it.

By determining the inode number and using the fsdb tool, "internal
data" about inodes can be examined. Will it also show something
that's basically impossible? :-)
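
As a sketch of the "determining the inode number" part: stat(2)
reports both the inode number and the link count as the kernel sees
them. The helper name inode_and_links is made up for this example,
and the values are cast to unsigned long long only to keep the
printing portable.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: fetch the inode number and link count of a path.
 * Returns 0 on success, -1 if stat(2) fails (errno is then set by stat).
 */
int inode_and_links(const char *path,
                    unsigned long long *ino,
                    unsigned long long *nlink)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;
    *ino = (unsigned long long)sb.st_ino;
    *nlink = (unsigned long long)sb.st_nlink;
    return 0;
}

/* Hypothetical helper: print both values for one path. */
int print_inode_info(const char *path)
{
    unsigned long long ino, nlink;

    if (inode_and_links(path, &ino, &nlink) == -1) {
        perror(path);
        return -1;
    }
    printf("%s: inode %llu, link count %llu\n", path, ino, nlink);
    return 0;
}
```

Calling print_inode_info() on the parent of the offending directory
would show the link count the kernel reports, which can then be
compared with what fsdb displays for the same inode.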

> Returning ENOENT 
> or EACCES or ENOTDIR may be better ("No such directory", "Access denied" 
> or "Not a valid directory").

Depends on the applicable definition of those errors.
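
A quick way to see how the system itself describes the candidates is
to ask strerror(3). This is only a sketch; the function name
print_candidate_errors is made up, and the exact wording of the
messages depends on the C library in use.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Print the C library's own description of each candidate errno value. */
void print_candidate_errors(void)
{
    const int codes[] = { EINVAL, ENOENT, EACCES, ENOTDIR };
    const char *names[] = { "EINVAL", "ENOENT", "EACCES", "ENOTDIR" };
    size_t i;

    for (i = 0; i < sizeof(codes) / sizeof(codes[0]); i++)
        printf("%-8s %s\n", names[i], strerror(codes[i]));
}
```

Whichever value gets returned, this text is what the user will
typically see from rmdir(1), so it is worth checking which one would
mislead the least.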

> The trouble is that it's tricky to test properly without finding a good 
> way to corrupt the link count :-)

There is a _simple_ way to do this, and I have even mentioned it.
Use the fsdb program and manipulate the inode "manually". Make
sure that you actually understand that _what_ you are doing there
is creating severe file system inconsistency errors. :-)

Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
