FFS writes to read-only mount

Fri Jul 6 01:39:27 UTC 2007

Here's a little more information about the problem
(g_vfs_done():mirror/gmroots1f[WRITE(offset=1541668864, length=16384)]error = 1).

I am able to reproduce this problem relatively easily by reinstalling 
our system; it often occurs in fewer than 10 installations (thank you, 
expect :-)).  As long as I don't change the kit too much, the offset is 
always the same.

I discovered that running fsck makes the problem go away.  For example, 
if I run "fsck_ufs /" (the partition on which it's happening) the 
messages no longer appear.  The next logical question is, is the buffer 
written to disk, or just tossed?  So I added a test to ad_strategy to 
call kdb_enter if the bio's bio_pblkno matched that for the buffer above 
(I divided by 512, the sector size).  The breakpoint wasn't entered.  So 
it appears the buffer is just being tossed when fsck is run.
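
The test described above can be sketched in userspace roughly as follows. This is only an illustration: toy_bio, SECTOR_SIZE, offset_to_pblkno() and bio_matches() are made-up stand-ins, not the kernel's struct bio or the actual change made to ad_strategy. In the kernel, a match would call kdb_enter() instead of returning a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sector size assumed in the message (512 bytes). */
#define SECTOR_SIZE 512

/* Hypothetical stand-in for the one field of struct bio that the test reads. */
struct toy_bio {
    int64_t bio_pblkno;   /* physical block (sector) number */
};

/* Convert a byte offset, as printed by g_vfs_done(), to a sector number. */
static int64_t
offset_to_pblkno(int64_t byte_offset)
{
    return byte_offset / SECTOR_SIZE;
}

/* True when the request targets the sector that holds byte_offset. */
static bool
bio_matches(const struct toy_bio *bp, int64_t byte_offset)
{
    return bp->bio_pblkno == offset_to_pblkno(byte_offset);
}
```

For the offset in the error message, 1541668864 / 512 gives sector 3011072. The sketch compares against the offset exactly as printed; whether that offset is relative to the same device as the bio seen by the disk driver is not established in the message.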

fsck_ufs /
** /dev/mirror/gmroots1f (NO WRITE)
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
2859 files, 81957 used, 934739 free (131 frags, 116826 blocks, 0.0% fragmentation)

The other point is that the buffer wasn't read back from the disk 
either, even though ffs_reload was called.  It's almost as if this buffer 
should have been discarded in the first instance, not written to disk.

Can anyone tell me where I should look to see if this buffer has been 
removed before the call to ffs_reload?  I'm assuming there are list(s) I 
can traverse to look for a matching block device offset.  I see a few 
lists, but I'm concerned that if I don't find the buffer, it may only be 
because I haven't looked in the right place.
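
The kind of traversal being asked about might look like the sketch below. It is a userspace toy, not kernel code: toy_buf, find_buf() and find_in_lists() are hypothetical names, and the plain singly linked list stands in for whatever lists the kernel actually keeps. The point is only that every candidate list is searched, so a miss means "not on any of the lists examined" and nothing stronger.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer header on some list. */
struct toy_buf {
    int64_t b_offset;         /* byte offset on the block device */
    struct toy_buf *next;     /* next buffer on the same list */
};

/* Walk one list and return the first buffer at the given offset, or NULL. */
static const struct toy_buf *
find_buf(const struct toy_buf *head, int64_t offset)
{
    for (const struct toy_buf *bp = head; bp != NULL; bp = bp->next) {
        if (bp->b_offset == offset)
            return bp;
    }
    return NULL;
}

/*
 * Search several lists in turn.  Returns the index of the first list
 * holding a matching buffer, or -1 if none of the lists has one.
 */
static int
find_in_lists(const struct toy_buf *const lists[], size_t nlists,
    int64_t offset)
{
    for (size_t i = 0; i < nlists; i++) {
        if (find_buf(lists[i], offset) != NULL)
            return (int)i;
    }
    return -1;
}
```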

Thanks,
Dave

ext Bruce Evans wrote:
> On Tue, 3 Jul 2007, Gary Palmer wrote:
>
>> On Wed, Jul 04, 2007 at 11:08:36AM +1000, Bruce Evans wrote:
>
>>> In some non-current versions of FreeBSD, I have debugging code in
>>> ffs_update() that complains about attempts to update inodes on
>>> read-only file systems.  Such attempts certainly occur, due to
>>> historical mistakes.  They are supposed to be handled (starting
>>> sometime in 4.x) by silently ignoring the problem and clearing the
>>> IN_MODIFIED flag and related flags so that the update is not retried
>>> later.  I don't know of any cases where this doesn't work.
>>
>> Does silently clearing that flag mean data could be lost?  Or are these
>> just async metadata updates and all the file content is properly
>> flushed prior to the FS going RO?
>
> I think at most some timestamps were lost, and then maybe only for the
> short time while transitioning from rw to ro.  Timestamps related only
> to that transition period _should_ be lost, since it isn't worth
> restarting the transition to pick up changes to timestamps alone.  Now
> it looks like the hack in ufs_itimes() to write out timestamps related
> to before the transition (but not yet finalized) never worked and has
> been lost.  Maybe I just don't understand the code and everything now
> works without hacks.  I think what should happen for MNT_UPDATE is:
>
> o first call vn_start_write().  Hopefully this prevents all writes from
>   userland during the transition.  Writes from the kernel must be
>   permitted so as to sync old writes from userland.
> o sync all old writes using something like a synchronous ffs_sync(), but
>   more forceful so as not to forget syncing IN_LAZYMOD inodes.
> o set MNT_RDONLY in the vnode for the mount point.  I think this alone
>   was supposed to prevent writes from userland.  It works poorly for this
>   since it also prevents some writes from the kernel.  E.g., in
>   ufs_itimes() it now prevents ufs_itimes() changing anything, so if
>   timestamps haven't already been finalized and flushed then there is a
>   bug.  Some old versions of ufs_timestamp() starting with ufs_vnops.c
>   1.182 handled this problem badly by setting IN_MODIFIED before checking
>   any readonly flag, but I think this did less than nothing since these
>   versions proceeded to check MNT_RDONLY and make null changes to the
>   timestamps if that flag is set; thus they broke assertions about no
>   writes to read-only file systems without actually syncing old
>   timestamps.
> o for ffs, set fs_ronly in the superblock to prevent all writes via the
>   file system.  ffs_update() checks this, and this is supposed to permit
>   the kernel to update timestamps between the setting of MNT_RDONLY and
>   the setting of fs_ronly, but this never worked right.
>
> There are related problems with IN_LAZYMOD and IN_LAZYACCESS.  IN_LAZYMOD
> inodes are only fully synced by going through ufs_reclaim() (for ffs).
> I think this doesn't happen early enough (if at all) to work for the
> rw -> ro transition (it works for unmount()).  This problem is moot
> in -current since IN_LAZYMOD is only used for cdevs and there are no
> cdevs on ffs.  (I also use it for atime updates but don't test it much
> since I also use -noatime for almost all file systems).  Problems with
> IN_LAZYACCESS are similar, but are more likely to be all fixed since
> they are serious if they occur.  Writes of even atimes by the kernel
> must be prevented while taking snapshots.  This is handled by delaying
> the atime updates; all other writes are supposed to be prevented by
> something like vn_start_write().
>
> Concerning fsck not working after "mount -u -o ro /": fsck generally
> doesn't work on mounted file systems, even if the mount is ro.  It
> works for the root file system after plain mount only because
> ffs_mountfs() has an extra g_access() call to make it work.  This
> call is missing for mount -u.
>
> Bruce
>
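
The "silently ignore and clear the flags" handling that the quoted message attributes to ffs_update() can be modelled with a small userspace toy. Everything here is an assumption made for illustration: the flag values, toy_fs, toy_inode and toy_update() are hypothetical, the choice of which flags count as "related" is a guess, and the real ffs_update() works on kernel structures and does considerably more.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's actual values may differ. */
#define IN_MODIFIED   0x01u
#define IN_LAZYMOD    0x02u
#define IN_LAZYACCESS 0x04u
#define IN_PENDING    (IN_MODIFIED | IN_LAZYMOD | IN_LAZYACCESS)

/* Toy superblock: only the read-only marker is modelled. */
struct toy_fs {
    bool fs_ronly;
};

/* Toy inode: pending-update flags plus a counter standing in for disk writes. */
struct toy_inode {
    unsigned i_flag;
    struct toy_fs *fs;
    int writes;
};

/* Returns true only if the inode was (notionally) written out. */
static bool
toy_update(struct toy_inode *ip)
{
    if ((ip->i_flag & IN_PENDING) == 0)
        return false;         /* nothing pending */

    /* Clear the pending flags first so the update is not retried later. */
    ip->i_flag &= ~IN_PENDING;

    if (ip->fs->fs_ronly)
        return false;         /* read-only: drop the update silently */

    ip->writes++;             /* stand-in for writing the inode block */
    return true;
}
```

In this model an update attempted after fs_ronly is set leaves the inode clean and performs no write, which corresponds to the "at most some timestamps were lost" outcome described in the quoted reply.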