File remove problem
brde at optusnet.com.au
Mon Dec 3 02:11:49 PST 2007
On Sun, 2 Dec 2007, Don Lewis wrote:
> On 3 Dec, Bruce Evans wrote:
>> On Sun, 2 Dec 2007, Don Lewis wrote:
>>> In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in
>>> addition to MNT_NOATIME (as is already done in vfs_mark_atime()). This
>>> also looks like it should be a reasonable optimization for read-only
>>> file systems that should eliminate unnecessary work at the lower levels
>>> of the code.
>> But I let these happen and discard IN_ATIME marks if fs_ronly. I
>> thought that the optimization went the other way -- unconditionally
>> setting the marks was very efficient, and discarding them in ufs_itimes()
>> was efficient too. I think this is still true now with larger locking
>> overheads, and the marks should be discarded later in the MNT_NOATIME
>> case too. It is expected that the marks are set much more often than
>> they are looked at by ufs_itimes(), since most calls to ufs_itimes()
>> are in close() and read() is much more common than close().
> ffs_read() and ffs_extread() already check MNT_NOATIME, so also checking
> MNT_RDONLY there as well is free. Setting and clearing the mark will
> consume a few instruction cycles, dirty a cache line, and increase main
> memory write-back traffic, though the expense is likely to be small.
The check can also avoid the new vnode locking for useless settings of
IN_ATIME. But what locks the MNT flags? Nothing directly I think.
Here we must not care if we read a stale value, and ufs_itimes() must
not care if the value changed just after we read it.
> Preventing user reads from setting IN_ATIME as soon as MNT_RDONLY is set
> on a downgrade to read-only seems to be the right thing to do.
Either reads or ufs_itimes() must use MNT_RDONLY to prevent changes
to the inode after MNT_RDONLY is set early in the r/w to r/o transition.
Checking MNT_RDONLY gives the more correct behaviour of not having to
discard even IN_ATIME settings that were made before the transition
I don't understand how unmount (apparently) works so well without
setting MNT_RDONLY to prevent further writes like the transition does.
>> is also called in stat() but I think that is less common than close()
>> (except for some tree walks). WIth non-delayed marking, ufs_itimes()
>> would still have to check fs_ronly, and the only gain would be that
>> it could then skip checking the marks except as an invariants check.
>> But it can gain like that even with delayed setting -- just ignore any
>> old marks while fs_ronly (except as an invariants check), but clear them
>> at mount or unmount time so that there shouldn't be any.
> I think that setting the marks when the file system is read-only causes
> the syncer to do extra work. I think that ffs_sync() still gets called
> if the file system is read-only, and if it encounters any inodes with
> marks set, it calls ffs_syncvnode() on them.
I think VOP_SYNC() actually isn't called on r/o file systems. Callers
check MNT_RDONLY or possibly dirty block list pointers. msdosfs had
a buf that would have caused panics if msdosfs_sync() were called on
an fs that had ever been mounted r/w but is currently r/o.
>>> The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY
>>> check, appears to be protected by the MNT_RDONLY check in
>> Thanks, I had forgotten about that. In vfs_mark_atime(), there is much
>> more efficiency to be gained by not setting marks that will be discarded,
>> since it takes a VOP to set them and many file systems don't support
>> this setting. However, it is hard for vfs_mark_atime() to know when the
>> mark will be discarded without calling the fs:
>> - it already doesn't know which fs's support it
>> - it should be checking fs_ronly for ffs
> I think that MNT_RDONLY is correct here. We want to stop new atime
> updates as soon as the downgrade starts, just like we stop new
> user-initiated writes.
Right. Same as for normal read accesses or delayed killing of IN_ACCESS
>> - it seems to be missing locking for MNT_NOATIME and MNT_RDONLY
>> fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious
>> too. Upper layers set the MNT flags before giving VOP_MOUNT() a chance
>> to adjust the marks. This is automatically safe in one direction only
>> (e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops
>> changes), and always bad for strict invariants.
> Maybe a reasonable way to handle this would be to set the
> flags before calling VOP_MOUNT() when they are being changed from 0 to
> 1, and clear them after calling VOP_MOUNT() when changing them from 1 to
> 0. Adding explicit locking sounds painful ...
This already happens for MNT_RDONLY. ffs_mount() has dead code which
obfuscates this and other things by setting all the generic flags
again. It gets the timing of the setting of MNT_RDONLY backwards by
delaying the setting until the end of the transition from r/w to r/o,
but this has no effect since MNT_RDONLY is set throughout the transition.
It only gets the timing of the clearing of MNT_RDONLY right by delaying
it until the end of the transition from r/o to r/w.
Some flags have the wrong sense for this to be right. E.g., early
clearing of MNT_ASYNC is safe, but early setting of it is not. tegge
fixed some races from this.
More information about the freebsd-fs