File remove problem

Mon Dec 3 02:11:49 PST 2007

On Sun, 2 Dec 2007, Don Lewis wrote:

> On  3 Dec, Bruce Evans wrote:
>> On Sun, 2 Dec 2007, Don Lewis wrote:
>>
>>> In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in
>>> addition to MNT_NOATIME (as is already done in vfs_mark_atime()).  This
>>> also looks like it should be a reasonable optimization for read-only
>>> file systems that should eliminate unnecessary work at the lower levels
>>> of the code.
>>
>> But I let these happen and discard IN_ATIME marks if fs_ronly.  I
>> thought that the optimization went the other way -- unconditionally
>> setting the marks was very efficient, and discarding them in ufs_itimes()
>> was efficient too.  I think this is still true now with larger locking
>> overheads, and the marks should be discarded later in the MNT_NOATIME
>> case too.  It is expected that the marks are set much more often than
>> they are looked at by ufs_itimes(), since most calls to ufs_itimes()
>> are in close() and read() is much more common than close().
>
> ffs_read() and ffs_extread() already check MNT_NOATIME, so also checking
> MNT_RDONLY there as well is free.  Setting and clearing the mark will
> consume a few instruction cycles, dirty a cache line, and increase main
> memory write-back traffic, though the expense is likely to be small.

The check can also avoid the new vnode locking for useless settings of
IN_ATIME.  But what locks the MNT flags?  Nothing directly I think.
Here we must not care if we read a stale value, and ufs_itimes() must
not care if the value changed just after we read it.

> Preventing user reads from setting IN_ATIME as soon as MNT_RDONLY is set
> on a downgrade to read-only seems to be the right thing to do.

Either reads or ufs_itimes() must use MNT_RDONLY to prevent changes
to the inode after MNT_RDONLY is set early in the r/w to r/o transition.
Checking MNT_RDONLY gives the more correct behaviour of not having to
discard even IN_ATIME settings that were made before the transition
began.

I don't understand how unmount (apparently) works so well without
setting MNT_RDONLY to prevent further writes like the transition does.

>> ufs_itimes()
>> is also called in stat() but I think that is less common than close()
>> (except for some tree walks).  WIth non-delayed marking, ufs_itimes()
>> would still have to check fs_ronly, and the only gain would be that
>> it could then skip checking the marks except as an invariants check.
>> But it can gain like that even with delayed setting -- just ignore any
>> old marks while fs_ronly (except as an invariants check), but clear them
>> at mount or unmount time so that there shouldn't be any.
>
> I think that setting the marks when the file system is read-only causes
> the syncer to do extra work.  I think that ffs_sync() still gets called
> if the file system is read-only, and if it encounters any inodes with
> marks set, it calls ffs_syncvnode() on them.

I think VOP_SYNC() actually isn't called on r/o file systems.  Callers
check MNT_RDONLY or possibly dirty block list pointers.  msdosfs had
a buf that would have caused panics if msdosfs_sync() were called on
an fs that had ever been mounted r/w but is currently r/o.

>>> The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY
>>> check, appears to be protected by the MNT_RDONLY check in
>>> vfs_mark_atime().
>>
>> Thanks, I had forgotten about that.  In vfs_mark_atime(), there is much
>> more efficiency to be gained by not setting marks that will be discarded,
>> since it takes a VOP to set them and many file systems don't support
>> this setting.  However, it is hard for vfs_mark_atime() to know when the
>> mark will be discarded without calling the fs:
>>
>> - it already doesn't know which fs's support it
>> - it should be checking fs_ronly for ffs
>
> I think that MNT_RDONLY is correct here.  We want to stop new atime
> updates as soon as the downgrade starts, just like we stop new
> user-initiated writes.

Right.  Same as for normal read accesses or delayed killing of IN_ACCESS
in ufs_itimes().

>> - it seems to be missing locking for MNT_NOATIME and MNT_RDONLY
>>
>> fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious
>> too.  Upper layers set the MNT flags before giving VOP_MOUNT() a chance
>> to adjust the marks.  This is automatically safe in one direction only
>> (e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops
>> changes), and always bad for strict invariants.
>
> Maybe a reasonable way to handle this would be to set the
> flags before calling VOP_MOUNT() when they are being changed from 0 to
> 1, and clear them after calling VOP_MOUNT() when changing them from 1 to
> 0. Adding explicit locking sounds painful ...

This already happens for MNT_RDONLY.  ffs_mount() has dead code which
obfuscates this and other things by setting all the generic flags
again.  It gets the timing of the setting of MNT_RDONLY backwards by
delaying the setting until the end of the transition from r/w to r/o,
but this has no effect since MNT_RDONLY is set throughout the transition.
It only gets the timing of the clearing of MNT_RDONLY right by delaying
it until the end of the transition from r/o to r/w.

Some flags have the wrong sense for this to be right.  E.g., early
clearing of MNT_ASYNC is safe, but early setting of it is not.  tegge
fixed some races from this.

Bruce