Panic in ffs_valloc (Was: Unexpected SU+J inconsistency AGAIN -- please, don't shift topic to ZFS!)

Fri Mar 1 18:00:54 UTC 2013

> Date: Fri, 1 Mar 2013 10:22:37 +0400
> From: Lev Serebryakov <lev at freebsd.org>
> To: Don Lewis <truckman at freebsd.org>
> Subject: Re: Panic in ffs_valloc (Was: Unexpected SU+J inconsistency AGAIN --
> Cc: freebsd-fs at freebsd.org, freebsd-current at freebsd.org
> 
> DL> The fact that the filesystem code called panic() indicates that the
> DL> filesystem was already corrupt by that point.  That's a likely reason
> DL> for fsck complaining about the unexpected SU+J inconsistency.
> 
> DL> Incorrect write ordering that allowed the filesystem to become
> DL> inconsistent because some pending writes were lost because of the panic
> DL> might not be necessary, but this might have allowed an earlier crash
> DL> where a full fsck was skipped to leave the filesystem in this state.
>   As far, as I understand, if this theory is right (file system
>  corruption which left unnoticed by "standard" fsck), it is bug in FFS
>  SU+J too, as it should not be corrupted by reordered writes (if
>  writes is properly reported as completed even if they were
>  reordered).

If the bitmaps are left corrupted (in particular if blocks are marked
free that are actually in use), then that panic can occur. Such a state
should never be possible when running with SU even if you have crashed
multiple times and restarted without running fsck.

To reduce the number of possible points of failure, I suggest that
you try running with just SU (i.e., turn off the SU+J journalling).
You can do this with `tunefs -j disable /dev/fsdisk'. This will
turn off journalling, but not soft updates. You can verify this
by then running `tunefs -p /dev/fsdisk' to ensure that soft updates
are still enabled.
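
As a rough sketch of that check (not a tested procedure), the helper
below reads the `tunefs -p' report on stdin and succeeds only when
soft updates are enabled and journalling is disabled. The function
name check_su_only is made up for illustration, /dev/fsdisk is the
same placeholder device as above, and the two labels it matches
("soft updates:" and "soft update journaling:") are assumed to be
what your version of tunefs prints.

```shell
#!/bin/sh
# Hypothetical helper: reads `tunefs -p` output on stdin.
# Exits 0 only if soft updates are enabled AND soft update
# journaling is disabled. The label text is an assumption about
# the report format; adjust the patterns if your tunefs differs.
check_su_only() {
    awk '
        /soft updates:/           { su = $NF }   # last field: enabled/disabled
        /soft update journaling:/ { suj = $NF }
        END { exit !(su == "enabled" && suj == "disabled") }
    '
}

# Intended use (filesystem unmounted or mounted read-only;
# 2>&1 because the report is assumed to go to stderr):
#   tunefs -j disable /dev/fsdisk
#   tunefs -p /dev/fsdisk 2>&1 | check_su_only && echo "SU only"
```

If the helper fails, look at the raw `tunefs -p' output by hand
before drawing any conclusion, since a label mismatch will also make
it fail.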

As you have already stated, the filesystem is fine with reordered
writes provided that they are not completed (iodone) until they are
well and truly on the disk.

> DL> This panic might also be a result of the bug fixed in 246877, but I have
> DL> my doubts about that.
>   It was not MFCed :(
> 
> --
> // Black Lion AKA Lev Serebryakov <lev at FreeBSD.org>

I will MFC 246876 and 246877 once they have been in head long enough
to have confidence that they will not cause trouble. That means at
least a month (well more than the two weeks they have presently been
there).

Note that these changes only pass the barrier request down to the
GEOM layer. I don't know whether it actually makes it to the drive
layer and, if it does, whether the drive layer actually implements
it. My goal was to get the ball rolling.

	Kirk McKusick