File remove problem
Bruce Evans
brde at optusnet.com.au
Sat Dec 1 19:57:13 PST 2007
On Sat, 1 Dec 2007, Don Lewis wrote:
> On 2 Dec, Bruce Evans wrote:
>
>> Here is a non-hackish patch which explains why ignoring MNT_RDONLY in
>> the above or in ffs_mount() helps. It just fixes the confusion between
>> IN_MODIFIED and IN_CHANGE in critical places.
>>
>> % Index: ffs_softdep.c
[All settings here, but not yet in ufs_inactive() and ffs_truncate(),
which are critical, and in other places which might not be critical.]
>> Without this change, soft updates depends on IN_CHANGE being converted
>> to IN_MODIFIED by ufs_itimes(), but this conversion doesn't happen
>> when MNT_RDONLY is set. With soft updates, changes are often delayed
>> until sync time, and when the sync is for mount-update it is done after
>> setting MNT_RDONLY so the above doesn't work.
>
> ufs_itimes() should probably also be looking at fs_ronly instead of
> MNT_RDONLY, *but* all the paths leading from userland to ufs_itimes()
> would need to be checked to verify that they check MNT_RDONLY to prevent
> new file system write operations from happening while the remount is in
> progress.
Yes, that is probably why MNT_RDONLY is (ab)used now.
I found old (Y2002) private mail from mckusick that explains a previous
change in this area, a change that mostly avoided the problem but has
been lost:
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% Working file: ufs_vnops.c
% ----------------------------
% revision 1.182
% date: 2002/01/15 07:17:12; author: mckusick; state: Exp; lines: +4 -5
% When downgrading a filesystem from read-write to read-only, operations
% involving file removal or file update were not always being fully
% committed to disk. The result was lost files or corrupted file data.
% This change ensures that the filesystem is properly synced to disk
% before the filesystem is down-graded.
%
% This delta also fixes a long standing bug in which a file open for
% reading has been unlinked. When the last open reference to the file
% is closed, the inode is reclaimed by the filesystem. Previously,
% if the filesystem had been down-graded to read-only, the inode could
% not be reclaimed, and thus was lost and had to be later recovered
% by fsck. With this change, such files are found at the time of the
% down-grade. Normally they will result in the filesystem down-grade
% failing with `device busy'. If a forcible down-grade is done, then
% the affected files will be revoked causing the inode to be released
% and the open file descriptors to begin failing on attempts to read.
%
% Submitted by: "Sam Leffler" <sam at errno.com>
% ----------------------------
%
% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.181
% retrieving revision 1.182
% diff -u -2 -r1.181 -r1.182
% --- ufs_vnops.c 22 Nov 2001 15:33:12 -0000 1.181
% +++ ufs_vnops.c 15 Jan 2002 07:17:12 -0000 1.182
% @@ -37,5 +37,5 @@
% *
% * @(#)ufs_vnops.c 8.27 (Berkeley) 5/27/95
% - * $FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.181 2001/11/22 15:33:12 guido Exp $
% + * $FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.182 2002/01/15 07:17:12 mckusick Exp $
% */
%
% @@ -159,11 +159,10 @@
% if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
% return;
% + if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))
% + ip->i_flag |= IN_LAZYMOD;
% + else
% + ip->i_flag |= IN_MODIFIED;
% if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
% vfs_timestamp(&ts);
% - if ((vp->v_type == VBLK || vp->v_type == VCHR) &&
% - !DOINGSOFTDEP(vp))
% - ip->i_flag |= IN_LAZYMOD;
% - else
% - ip->i_flag |= IN_MODIFIED;
% if (ip->i_flag & IN_ACCESS) {
% ip->i_atime = ts.tv_sec;
This is in ufs_itimes(). Note that it moves the setting of the modified
flag before the check of MNT_RDONLY, so that when the wrong or incomplete
flags are set earlier and the wrong flags aren't converted to the
modified flag before MNT_RDONLY is set, then we only lose the timestamps
but not critical updates here.
% Index: ffs_inode.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_inode.c,v
% retrieving revision 1.73
% retrieving revision 1.74
% diff -u -2 -r1.73 -r1.74
% --- ffs_inode.c 13 Dec 2001 05:07:48 -0000 1.73
% +++ ffs_inode.c 15 Jan 2002 07:17:12 -0000 1.74
% @@ -32,5 +32,5 @@
% *
% * @(#)ffs_inode.c 8.13 (Berkeley) 4/21/95
% - * $FreeBSD: src/sys/ufs/ffs/ffs_inode.c,v 1.73 2001/12/13 05:07:48 mckusick Exp $
% + * $FreeBSD: src/sys/ufs/ffs/ffs_inode.c,v 1.74 2002/01/15 07:17:12 mckusick Exp $
% */
%
% @@ -88,7 +88,7 @@
% return (0);
% ip->i_flag &= ~(IN_LAZYMOD | IN_MODIFIED);
% - if (vp->v_mount->mnt_flag & MNT_RDONLY)
% - return (0);
% fs = ip->i_fs;
% + if (fs->fs_ronly)
% + return (0);
% /*
% * Ensure that uid and gid are correct. This is a temporary
This fixes loss of the critical updates a little later in ffs_update().
% @@ -153,4 +153,6 @@
% oip = VTOI(ovp);
% fs = oip->i_fs;
% + if (fs->fs_ronly)
% + panic("ffs_truncate: read-only filesystem");
% if (length < 0)
% return (EINVAL);
This is a sanity check in ffs_truncate(). I think all callers except the
one in ufs_inactive() automatically pass the check, since they are higher
level so they do a correct check of MNT_RDONLY. The call in ufs_inactive()
used to be unconditional (except for the (i_nlink == 0) condition of course).
In -current it is conditional on MNT_ONLY. kib's patch changes it to be
conditional on fs_ronly, but I hope it can become unconditional again --
it should be an error to reach ufs_inactive() with a partially deleted
file after syncing before changing fs_ronly to 0.
ffs_update() should panic instead of returning 0 when (fs->fs_ronly)
too, so that bugs get noticed.
mckusick's explanation says that "[fs_ronly is the only believable flag].
Thus the killing of IN_MODIFIED has to happen a few lines later [in]
ffs_update()".
It should be safe to blindly ignore all modification flags except IN_ACCESS
in ufs_itimes(), since ffs_update() will kill the completely invalid ones.
4.4BSD-Lite blindly ignored all modification flags in ITIMES(), and
checks the wrong read-only flag (MNT_RDONLY) in open code that duplicates
ITIMES() plus adds the wrong r/o check. When I converted ITIMES() to
ufs_itimes(), I centralized this wrong r/o check. This was mainly a
cleanup, but I think it fixes wrong setting of atimes (IN_ATIME is set
without checking any r/o flags, and IN_ATIME was sometimes converted
into a timestamp before ffs_update killed it, so applications could
see atime changing even on purely r/o mounted file systems).
The fix in ufs_vnops.c was lost relatively recently as part of related
changes to fix IN_ACCESS for non-exclusively-locked reads:
% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.279
% retrieving revision 1.280
% diff -u -2 -r1.279 -r1.280
% --- ufs_vnops.c 2 Oct 2006 02:08:31 -0000 1.279
% +++ ufs_vnops.c 10 Oct 2006 09:20:54 -0000 1.280
% @@ -36,5 +36,5 @@
%
% #include <sys/cdefs.h>
% -__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.279 2006/10/02 02:08:31 tegge Exp $");
% +__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.280 2006/10/10 09:20:54 kib Exp $");
%
% #include "opt_mac.h"
% @@ -129,29 +129,47 @@
% struct inode *ip;
% struct timespec ts;
% + int mnt_locked;
%
% ip = VTOI(vp);
% + mnt_locked = 0;
% + if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) {
% + VI_LOCK(vp);
% + goto out;
% + }
% + MNT_ILOCK(vp->v_mount); /* For reading of mnt_kern_flags. */
% + mnt_locked = 1;
% + VI_LOCK(vp);
% if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
% - return;
% + goto out_unl;
% +
%
% ip->i_flag |= IN_LAZYMOD;
% - else
% + else if (((vp->v_mount->mnt_kern_flag &
% + (MNTK_SUSPENDED | MNTK_SUSPEND)) == 0) ||
% + (ip->i_flag & (IN_CHANGE | IN_UPDATE)))
% ip->i_flag |= IN_MODIFIED;
% - if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
% - vfs_timestamp(&ts);
% - if (ip->i_flag & IN_ACCESS) {
% - DIP_SET(ip, i_atime, ts.tv_sec);
% - DIP_SET(ip, i_atimensec, ts.tv_nsec);
% - }
% - if (ip->i_flag & IN_UPDATE) {
% - DIP_SET(ip, i_mtime, ts.tv_sec);
% - DIP_SET(ip, i_mtimensec, ts.tv_nsec);
% - ip->i_modrev++;
% - }
% - if (ip->i_flag & IN_CHANGE) {
% - DIP_SET(ip, i_ctime, ts.tv_sec);
% - DIP_SET(ip, i_ctimensec, ts.tv_nsec);
% - }
% + else if (ip->i_flag & IN_ACCESS)
% + ip->i_flag |= IN_LAZYACCESS;
% + vfs_timestamp(&ts);
% + if (ip->i_flag & IN_ACCESS) {
% + DIP_SET(ip, i_atime, ts.tv_sec);
% + DIP_SET(ip, i_atimensec, ts.tv_nsec);
% + }
% + if (ip->i_flag & IN_UPDATE) {
% + DIP_SET(ip, i_mtime, ts.tv_sec);
% + DIP_SET(ip, i_mtimensec, ts.tv_nsec);
% + ip->i_modrev++;
% + }
% + if (ip->i_flag & IN_CHANGE) {
% + DIP_SET(ip, i_ctime, ts.tv_sec);
% + DIP_SET(ip, i_ctimensec, ts.tv_nsec);
% }
% +
% + out:
% ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
% + out_unl:
% + VI_UNLOCK(vp);
% + if (mnt_locked)
% + MNT_IUNLOCK(vp->v_mount);
% }
%
Now we trust MNT_RDONLY again, so that when the wrong or incomplete
flags are set earlier and the wrong flags aren't converted to the
modified flag before MNT_RDONLY is set, then we only lose both timestamps
and critical updates here.
I think the best quick fix would be to trust fs_ronly here, except for
IN_ACCESS. Then wrong IN_CHANGE | IN_UPDATE flags would just cause
wrong updates of i_ctime and i_mtime, and an extra i/o to write these
changes, but missing IN_MODIFIED flags would be fixed up as in rev.1.182
and associated correct IN_CHANGE | IN_UPDATE flags wouldn't be incorrectly
discarded. IN_ACCESS needs special handling even in the non-snapshot
cases so that read() doesn't race mount-update (at best, read()s might
keep dirtying inodes, so there would be a problem setting fs_ronly
atomically with completing the sync).
Most other file systems are primitive or broken in this area. ext2fs is
at the level of ufs_vnops.c 1.182. msdosfs is at level before 4.4BSD-Lite
(it still uses its clone of ITIMES() and looks more like the Net/2 ffs
than the 4.4BSD one). But most other file systems are Giant-locked, so
they don't have the IN_ACCESS races.
Bruce
More information about the freebsd-fs
mailing list