Delete a directory, crash the system

Sun Jul 28 10:05:56 UTC 2013

On 28/07/2013 06:54, Polytropon wrote:
> And here, kids, you can see the strength of an open source
> operating system: You can see _why_ something happens. :-)
Too true!

> On Sat, 27 Jul 2013 20:35:09 +0100, Frank Leonhardt wrote:
>> On 27/07/2013 19:57, David Noel wrote:
>>>> So the system panics in ufs_rmdir(). Maybe the filesystem is
>>>> corrupt? Have you tried to fsck(8) it manually?
>>> fsck worked, though I had to boot from a USB image because I couldn't
>>> get into single-user mode, for some odd reason.
>>>
>>>> Even if the filesystem is corrupt, ufs_rmdir() shouldn't
>>>> panic(), IMHO, but fail gracefully. Hmmm...
>>> Yeah, I was pretty surprised. I think I tried it like 3 times to be
>>> sure... and yeah, each time... kaboom! Who'd have thought. Do I just
>>> post this to the mailing list and hope some benevolent developer
>>> stumbles upon it and takes it upon him/herself to "fix" this, or where
>>> do I find the FreeBSD Suggestion Box? I guess I should file a Problem
>>> Report and see what happens from there.
>>>
>> I was going to raise an issue when the discussion had died down to a
>> consensus. I also don't think it's reasonable for the kernel to bomb
>> when it encounters corruption on a disk.
>>
>> If you want to patch it yourself, edit sys/ufs/ufs/ufs_vnops.c and, at
>> around line 2791, change:
>>
>>           if (dp->i_effnlink < 3)
>>                   panic("ufs_dirrem: Bad link count %d on parent",
>>                       dp->i_effnlink);
>>
>> To
>>
>>           if (dp->i_effnlink < 3) {
>>                   error = EINVAL;
>>                   goto out;
>>           }
>>
>> The ufs_link() call has a similar issue.
>>
>> I can't see why my mod will break anything, but there are always
>> unintended consequences.
> One of the core policies usually is to stop _any_ action that
> had failed due to a "reason that cannot be" and make sure it
> won't get worse. This can be seen for example in fsck's behaviour:
> If there is a massive file system error that cannot be repaired
> without further intervention that _could_ destroy data or make
> its retrieval harder or impossible, the operator will be requested
> to make the decision. There are options to automate this process,
> but on the other hand, "always assume 'yes'" can then be a risk,
> as it could prevent recovery. My assumption is that the developers
> chose a similar approach here: "We found a situation that should
> not be possible, so we stop the system from messing up the file
> system even more." This carries the attitude of not "hiding a
> problem for the sake of convenience" by "being silent and going
> back to the usual work". Of course it is debatable if this is the
> right decision in _this_ particular case.
>
>
>

The problem I have with this is the assumption that the inode was at 
fault. I said this was the most likely, but it's not the absolute 
reason. At the risk of repeating, it's the /effective/ link count (in 
the vnode) that's out of line here, not the inode count.

If the inode was wrong it could be down to minor FS corruption; an 
interrupted directory creation or deletion would do the trick. The vnode 
could go wrong for all sorts of reasons, probably associated with a race 
during the directory removal, which is not an atomic operation by any 
means. See "The Design of the UNIX Operating System", section 5.16.1, 
Bach, Prentice-Hall, 1986.

My guess is that we're looking at an old debugging pragma here, put in 
to cope with a race going wrong if the code wasn't quite right (note 
that the function has since been renamed but the message not updated).

You're right about stopping on internal errors (corruption to the kernel 
data structures in this case) but this case is indeed debatable. On the 
one hand, now that the system is stable (i.e. we can probably trust the 
rmdir code after all this time), the most likely cause is inode corruption 
polluting the vnode. On the other hand the pragma may be useful if 
people are tinkering with the kernel and you get even more opportunities 
for a race with (say) SMP.

I don't expect the kernel to panic on a user-land I/O error, or anything 
else that's expected or recoverable - and a wonky FS meets these 
criteria in my book. David was lucky to find this - I tend to run 
FreeBSD on servers, not laptops, so I'd never have seen a server panic 
"live" and would not have been able to discover the cause very easily. 
That's worrying.

So it boils down to:

a) Leave it as is, as it can detect when the kernel has trashed its vnode 
table; or

b) It's probably caused by "expected" FS corruption, so handle it 
gracefully.

Incidentally, if you look at the code you'll see this is only a heuristic 
check, and a weak one at that. Most of the time it WILL NOT pick up the 
case where the parent directory's link is missing. As far as I can tell 
it will go on to unlink the target successfully, with no ill effects. If 
this situation really did lead to catastrophe (as suggested by the use 
of a panic) then the check used ought to be a lot more reliable! As it 
is, a third option is to remove it entirely except in debug kernels.
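
To make the three options concrete, here is a minimal userland sketch, 
not kernel code. The names fake_dir, policy and check_parent are made up 
for illustration, abort() merely stands in for panic(), and the 
debug_kernel flag is an assumption standing in for however a debug build 
would be selected. Only the "effective link count < 3" test is taken from 
the code quoted above.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parent directory's in-memory state;
 * only the effective link count discussed in the thread is modelled. */
struct fake_dir {
	int effnlink;
};

/* The three options under discussion (names are illustrative only). */
enum policy {
	POLICY_PANIC,		/* a) keep the check, stop the system */
	POLICY_FAIL,		/* b) fail the call gracefully */
	POLICY_DEBUG_ONLY	/* c) check only in debug builds */
};

/*
 * Returns 0 if the removal may proceed, or an errno value if it is
 * refused. Where the real code would panic, this sketch calls abort().
 */
static int
check_parent(const struct fake_dir *dp, enum policy p, int debug_kernel)
{
	if (dp->effnlink >= 3)
		return (0);	/* link count looks sane */

	switch (p) {
	case POLICY_PANIC:
		fprintf(stderr, "bad link count %d on parent\n",
		    dp->effnlink);
		abort();
	case POLICY_FAIL:
		return (EINVAL);
	case POLICY_DEBUG_ONLY:
		if (debug_kernel) {
			fprintf(stderr, "bad link count %d on parent\n",
			    dp->effnlink);
			abort();
		}
		return (0);	/* check compiled out: carry on */
	}
	return (0);
}
```

With a parent whose effnlink is 2, POLICY_FAIL hands EINVAL back to the 
caller, POLICY_DEBUG_ONLY with debug_kernel set to 0 lets the removal 
carry on, and POLICY_PANIC stops the process. Whether carrying on is 
actually safe in the kernel is exactly the open question above.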

Regards, Frank.