svn commit: r218604 - head/sbin/fsck_ffs
Kostik Belousov
kostikbel at gmail.com
Sun Feb 13 13:14:17 UTC 2011
On Sun, Feb 13, 2011 at 11:30:56PM +1100, Bruce Evans wrote:
> On Sat, 12 Feb 2011, Konstantin Belousov wrote:
>
> >Log:
> > In checker, read journal by sectors.
> >
> > Due to UFS insistence to pretend that device sector size is 512 bytes,
>
> Do you mean that ffs_fsck does this? UFS doesn't exist, and ffs in the
> kernel didn't use device sectors at all until recently (it used DEV_BSIZE
> units, but those are not sectors but are the units for bread() and friends).
Yes, the journal writer started using DEV_BSIZE as _sector_ size to
write the journal blocks, and fixing this was the point of the series
of commits. Journal was unoperable on the providers with non-512 byte
sectors.
> newfs uses a hack to do this. When handling kernel sizes in units of
> DEV_BSIZE (not a sector size!), ffs_fsck should use DEV_BSIZE (or maybe
> just fsbtodb()) instead of `dev_bsize' or `sectorsize' and not
> pretend that DEV_BSIZE (or the corresponding superblock value for the
> conversion macro) is the sector size.
>
> > sector size is obtained from ioctl(DIOCGSECTORSIZE) for real devices,
> > and from the label otherwise. The file images without label have to
> > be made with 512 sector size.
>
> >Modified: head/sbin/fsck_ffs/fsck.h
> >==============================================================================
> >--- head/sbin/fsck_ffs/fsck.h Sat Feb 12 13:12:45 2011 (r218603)
> >+++ head/sbin/fsck_ffs/fsck.h Sat Feb 12 13:17:14 2011 (r218604)
> >@@ -268,6 +268,7 @@ char snapname[BUFSIZ]; /* when doing sna
> >char *cdevname; /* name of device being checked */
> >long dev_bsize; /* computed value of DEV_BSIZE */
> >long secsize; /* actual disk sector size */
> >+long real_dev_bsize;
>
> If `secsize' is the actual disk sector size, then we don't need another
> variable giving the real sector size.
>
> The new variable has no comment. What is the difference between a sector
> size and a dev_bsize? There is enough confusion between DEV_BSIZE and
> sector sizes already. DEV_BSIZE has come to means nothing to do with
> sectors and little to do with devices or block sizes. It is just a minimal
> i/o sizes for use in old interfaces (ones that didn't want to use more than
> 32 bits for disk addresses, but now use 64 bits anyway, so that they can
> address 72 bits after multiplication by DEV_BSIZE).
There is no "actual" difference (I tired of the word actual when debugging
the patches), they are all DEV_BSIZE.
>
> These variables are named and documented slightly better in mkfs:
>
> % extern int sectorsize; /* bytes/sector */
> % extern int realsectorsize; /* bytes/sector in hardware*/
>
> At lease the fake sector size variable doesn't claim to be actual, and
> the real sector size variable doesn't have extra underscores and claims
> to be a sector size variable.
>
> >Modified: head/sbin/fsck_ffs/setup.c
> >==============================================================================
> >--- head/sbin/fsck_ffs/setup.c Sat Feb 12 13:12:45 2011 (r218603)
> >+++ head/sbin/fsck_ffs/setup.c Sat Feb 12 13:17:14 2011 (r218604)
> >@@ -446,7 +446,7 @@ sblock_init(void)
> > if (sblk.b_un.b_buf == NULL || asblk.b_un.b_buf == NULL)
> > errx(EEXIT, "cannot allocate space for superblock");
> > if ((lp = getdisklabel(NULL, fsreadfd)))
> >- dev_bsize = secsize = lp->d_secsize;
> >+ real_dev_bsize = dev_bsize = secsize = lp->d_secsize;
> > else
> > dev_bsize = secsize = DEV_BSIZE;
> >}
>
> Both the variables are real enough here, provided getdisklabel() works.
> Otherwise, you are in trouble and should fail instead of defaulting the
> size unless the size is set by another means. newfs seems to fail, and
> I'm sure newfs_msdos does fail if the sector size is unknown.
>
> >Modified: head/sbin/fsck_ffs/suj.c
> >==============================================================================
> >--- head/sbin/fsck_ffs/suj.c Sat Feb 12 13:12:45 2011 (r218603)
> >+++ head/sbin/fsck_ffs/suj.c Sat Feb 12 13:17:14 2011 (r218604)
> >@@ -28,6 +28,7 @@
> >__FBSDID("$FreeBSD$");
> >
> >#include <sys/param.h>
> >+#include <sys/disk.h>
> >#include <sys/disklabel.h>
> >#include <sys/mount.h>
> >#include <sys/stat.h>
> >@@ -201,6 +202,11 @@ opendisk(const char *devnam)
> > disk->d_error);
> > }
> > fs = &disk->d_fs;
> >+ if (real_dev_bsize == 0 && ioctl(disk->d_fd, DIOCGSECTORSIZE,
> >+ &real_dev_bsize) == -1)
> >+ real_dev_bsize = secsize;
> >+ if (debug)
> >+ printf("dev_bsize %ld\n", real_dev_bsize);
> >}
> >
> >/*
>
> If this gives a different "real_dev_bsize", and this is different from the
> sector size in the label or defaulted-to, then the latter is presumably
> wrong, and we're left with a `secsize' that is not the "actual sector size".
Yes, secsize is not the "actual sector size", it is equal to DEV_BSIZE,
see setup.c:sblock_init().
>
> It is a bug to replace the sector size in the label with the one obtained
> here. The former should be correct, and it is the only one under user
> control. It might need to be changed if one given by the ioctl is wrong
> or just if it is fake. (Labels only have 32-bit disk addresses, so fake
> sector sizes are needed to address more than 41 bits if the physical sector
> size has is 512.)
>
> newfs doesn't have the precedence that I want:
> - sectorsize set on the command line (-S option) has precedence. [Bug;
> -S 0 is needed to cancel any previous setting of sectorsize on the
> command line by -S or maybe by -T, but it is treated as an error.]
> fsck_ffs unfortunately doesn't seem to have any -S or -T option. These
> used to be unneeded since sector size weren't used, and labels worked
> and contain enough info to locate the superblock where there is more
> info.
> - then try setting sectorsize using the ioctl if sectorsize is not already
> set. [Bug: no error checking for this ioctl. If it fails, then it
> may clobber sectorsize, which breaks the next step.]
> - then set sectorsize from the label if there is a label and sectorsize
> is not already set
> - fail if sectorsize is not set
newfs.c does not do what you describe, in fact. After going to all
troubles finding the sector size, it performs the following:
if (sectorsize != DEV_BSIZE) { /* XXX */
int secperblk = sectorsize / DEV_BSIZE;
sectorsize = DEV_BSIZE;
fssize *= secperblk;
if (pp != NULL)
pp->p_size *= secperblk;
}
And now layout calculations happen as if sector size is 512 bytes. I was
puzzled for long time why libufs givess me 512 in d_bsize for a volume over
4096-bytes per sector md:
disk->d_bsize = fs->fs_fsize / fsbtodb(fs, 1);
>
> Then the hacks involving realsectorsize:
> - set it to sectorsize
> - if sectorsize != DEV_BSIZE, change it to DEV_BSIZE and fix up related
> quantities (I hope the change to the partition table is never written).
> Now `sectorsize' is no longer "actual", though it was actual before the
> hacks. We continue with the actual sector size recorded in
> realsectorsize.
realsectorsize is not used. You are citing the piece of code I pasted above.
>
> >@@ -2262,7 +2268,7 @@ suj_build(void)
> > rec = (union jrec *)seg->ss_blk;
> > for (i = 0; i < seg->ss_rec.jsr_cnt; off += JREC_SIZE,
> > rec++) {
> > /* skip the segrec. */
> >- if ((off % DEV_BSIZE) == 0)
> >+ if ((off % real_dev_bsize) == 0)
> > continue;
> > switch (rec->rec_jrefrec.jr_op) {
> > case JOP_ADDREF:
> >...
>
> This is all sort of backwards. DEV_BSIZE is the real DEV_BSIZE. Old code
> that uses units of DEV_BSIZE spells this as dev_bsize although it is
> constant. New code that uses units of sectors spelled this as DEV_BSIZE
> although the correct size is variable; now the new code spelled this as
> real_dev_bsize, which is variable but unrelated to dev_bsize = DEV_BSIZE
> :-).
We need to write the journal block atomically, and device/geom level only
provides that:
- we must write at least sector;
- any write bigger then sector is not atomic.
>
> I think using fragments instead of sectors would work better. For a start,
> you can get the fragment size from the superblock and not have to worry
> about labels of ioctls.
No, it would not, since fragment write can be non-atomic.
>
> There is a problem locating the superblock. The kernel and various
> utilities only try reading the superblock with size SBSIZE (8K). This
> fails if the sector size is not a divisor of 8K. They also only try
> reading at various offsets which might not be a multiple of the sector
> size, but which are larger than 8K so they are more likely to be a
> multiple. I fixed this problem in fsck_msdosfs where it is larger
> (fsck_msdosfs assumes that the sector size is a divisor of 512, but
> sector sizes like 2K and 4K are now common relative to sizes of 16K).
>
> Hmm, the initialization of dev_bsize is even more obfuscated than I
> remembered:
>
> % /*
> % * Compute block size that the file system is based on,
> % * according to fsbtodb, and adjust superblock block number
> % * so we can tell if this is an alternate later.
> % */
> % super *= dev_bsize;
>
> We may have needed a fake sector size to read the superblock. I forget if
> we got a non-fake one from the label or ioctl before here. We should have
> just tried all reasonable power of 2 sizes between 1 and 1M (I use
> 64K down to DOSBOOTBLOCKSIZE = 512 in fsck_msdosfs).
>
> % dev_bsize = sblock.fs_fsize / fsbtodb(&sblock, 1);
>
> This is an obfuscated way of spelling DEV_BSIZE. Although fsbtodb()
> converts to blocks with fixed size DEV_BSIZE, it needs a variable shift
> count to handle the variable fragment size. This shift count is
> fs_fsbtodb, which was initialized in newfs to
> ilog2(fs_fsize / sectorsize). But sectorsize was fake there
> (always DEV_BSIZE). Suppose for example than fs_fsize is the default
> (2K). Then fs_fsbtodb = ilog2(4) = 2, so fsbtodb(&sbblock, 1) is 4
> and dev_bsize = 2K / 4 = DEV_BSIZE again. It is always DEV_BSIZE again,
> since the fake sectorsize in newfs is always DEV_BSIZE. The fakery is
> extended to fsck_ffs by the above assignment. The above assignment is
> necessary if the kernel is changed to not fake things so much, or to
> fake them differently (newfs can't can change it's i/o methods but can't
> change the generated fs_fsbtodb etc. unless the kernel changes. All it
> could do much better is to make the conversions between different block
> sizes (real and virtual either way) more explicit). However, things
> are already confusing enough with a fixed DEV_BSIZE.
Again, you are describing what I pasted above, and what I referred to
as "UFS (ok, FFS) pretending that sector size is 512 bytes".
I agree that it would be cleaner if FFS does not make such obscuring
arithmetic, but it did not matter while i/o was done in units of
whole fragments. Journal requirements are different.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/svn-src-all/attachments/20110213/bef8bae8/attachment.pgp
More information about the svn-src-all
mailing list