Ext2 vs UFS getlbns

Brian Bergstrand brian at classicalguitar.net
Fri Jun 11 19:34:34 GMT 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Jun 11, 2004, at 1:14 PM, Bruce Evans wrote:

> On Fri, 11 Jun 2004, Brian Bergstrand wrote:
>
>> I just noticed something in ext2_getlbns() (ext2_bmap.c, v1.57) vs.
>> ufs_getlbns() (ufs_bmap.c, v1.60)
>>
>> ...
>>
>> Notice that blockcnt is changed AFTER offset/metalbn in Ext2 and BEFORE
>> those in UFS.
>>
>> Was this change in Ext2 done on purpose for some reason? It makes a
>> difference in calculating in_off and metalbn for some block #'s.
>
> This is to fix overflow in the calculation of block numbers for triple
> indirect blocks.  ffs used to do this, and ext2_getlbns() was a copy of
> ufs_getlbns(), but ffs was changed back to use simpler code when its
> daddr type (ufs2_daddr_t) was changed to 64 bits and the longs in
> ufs_getlbns() were fixed to use ufs_daddr_t.  Overflow is probably
> theoretically possible again, but it would take a 128-bit calculation
> to avoid it and 64 bit block numbers should be enough for anyone.
>
> This difference shouldn't affect in_off or metalbn for any reachable
> block number (32 bit ones in ext2fs).  There is another variable
> "int64_t qblockcnt" that is used instead of "long blockcnt" in
> some places in ext2_getlbns().  The logic for using blockcnt in the
> above is a little different because earlier calculations set
> qblockcnt instead of blockcnt.
>
> Bruce
>

Bruce, thanks for the explanation.
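
To make the overflow concrete, here is a minimal sketch (not code from
either tree). It assumes 4-byte block pointers, so nindir is the block
size divided by 4; blocks_at_level() and overflows_long32() are
hypothetical helper names. The count for triple indirect blocks is
nindir cubed, which is 2^30 with 4K blocks but 2^33 with 8K blocks, and
the latter no longer fits in a 32-bit long.

```c
#include <stdint.h>

/*
 * Data blocks addressable through `levels` levels of indirection.
 * The product is formed in 64 bits so it cannot wrap for the small
 * inputs used here. nindir = block size / 4 (assumed 4-byte pointers).
 */
static int64_t
blocks_at_level(int64_t nindir, int levels)
{
	int64_t cnt = 1;

	while (levels-- > 0)
		cnt *= nindir;
	return (cnt);
}

/* Nonzero if that count would not fit in a 32-bit signed long. */
static int
overflows_long32(int64_t nindir, int levels)
{
	return (blocks_at_level(nindir, levels) > INT32_MAX);
}
```

With nindir = 1024 (4K blocks) and three levels this reports no
overflow; with nindir = 2048 (8K blocks) it does.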

The reason I originally asked is that I'm seeing different offsets and
metalbns for relatively small block #'s in the OS X port. The OS X code
is derived from FreeBSD 5.x and ext2_getlbns() has not changed.

For instance, on a 1KB block FS (which therefore has 256 block entries 
per indirect block) I see the following with the original algorithm:

Write 21 bytes at offset 0:
lbn = 0
indir not set because this falls in the direct block range

Write 21 bytes at offset 12288:
lbn=12
{{
     in_lbn = -12,
     in_off = 0,
     in_exists = 0
   }, {
     in_lbn = -12,
     in_off = 0,
     in_exists = 0
   }, {
....

Write 21 bytes at offset 4194304:
lbn=4096
{{
     in_lbn = -269,
     in_off = 1,
     in_exists = 0
   }, {
     in_lbn = -269,
     in_off = 14,
     in_exists = 0
   }, {
     in_lbn = -3852,
     in_off = 244,
     in_exists = 0
   }, {
...

But if I move the blockcnt calculation to where UFS has it, I get:

Write 21 bytes at offset 0: same

Write 21 bytes at offset 12288: same

Write 21 bytes at offset 4194304:
lbn=4096
{{
     in_lbn = -269,
     in_off = 1,
     in_exists = 0
   }, {
     in_lbn = -269,
     in_off = 244,
     in_exists = 0
   }, {
     in_lbn = -512,
     in_off = 0,
     in_exists = 0
   }, {
   ...

Notice how indir[1].in_off, indir[2].in_off and indir[2].in_lbn differ 
from the first run (with the current ext2 algorithm). The same thing 
happens with 8MB and 16MB offset writes too.

Any ideas why this happens?
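
For checking these numbers by hand, a minimal sketch of just the
arithmetic may help. It is not the kernel code: struct walk,
ext2_first_loop() and offsets() are hypothetical names, and the early
exit for metadata blocks is left out. For lbn 4096 with 256 entries per
block, the first loop stops with blockcnt = 256 and bn = 3828. Dividing
blockcnt after the offset calculation then yields offsets 14 and 244;
dividing first yields 244 and leaves blockcnt at 0, so the next division
would be undefined in C.

```c
#include <stdint.h>

#define NDADDR 12	/* direct blocks, as in the test case */
#define NIADDR 3	/* levels of indirection */

struct walk {
	long blockcnt;	/* value left in blockcnt when the loop breaks */
	long bn;	/* block number relative to the chosen level */
	int i;		/* loop counter; NIADDR - i is the first in_off */
};

/*
 * First loop of ext2_getlbns() only. Returns 0 on success, -1 if the
 * block is direct or out of range. nindir is entries per indirect block.
 */
static int
ext2_first_loop(long lbn, long nindir, struct walk *w)
{
	int64_t qblockcnt;
	long bn = lbn, blockcnt;
	int i;

	if (bn < NDADDR)
		return (-1);
	for (blockcnt = 1, i = NIADDR, bn -= NDADDR;; i--, bn -= blockcnt) {
		if (i == 0)
			return (-1);
		qblockcnt = (int64_t)blockcnt * nindir;
		if (bn < qblockcnt)
			break;		/* blockcnt is not advanced here */
		blockcnt = (long)qblockcnt;
	}
	w->blockcnt = blockcnt;
	w->bn = bn;
	w->i = i;
	return (0);
}

/*
 * Offsets for the entries after the first one, given the loop state.
 * divide_first != 0 mimics moving "blockcnt /= nindir" above the offset
 * calculation. Stops early if blockcnt reaches 0, because dividing by
 * it would be undefined. Returns the number of offsets stored.
 */
static int
offsets(const struct walk *w, long nindir, int divide_first, int *off,
    int max)
{
	long blockcnt = w->blockcnt;
	int i, n = 0;

	for (i = w->i; i <= NIADDR && n < max; i++) {
		if (divide_first)
			blockcnt /= nindir;
		if (blockcnt == 0)
			break;
		off[n++] = (int)((w->bn / blockcnt) % nindir);
		if (!divide_first)
			blockcnt /= nindir;
	}
	return (n);
}
```

Calling ext2_first_loop(4096, 256, &w) and then offsets() with
divide_first set to 0 and to 1 shows the two orderings side by side.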

Here's my simplified test case to simulate what happens on a 1KB block 
FS:

#include <stdio.h>
#include <sys/types.h>
#include <sys/errno.h>

struct vnode {

};

struct indir {
	int in_lbn;
	int in_off;
	int in_exists;
};


struct ext2mount {
	int u_mindir;
};
#define MNINDIR(m) ((m)->u_mindir)

#define NDADDR 12
#define NIADDR 3

int
ext2_getlbns(vp, bn, ap, nump)
	struct vnode *vp;
	int32_t bn;
	struct indir *ap;
	int *nump;
{
	long blockcnt, metalbn, realbn;
	struct ext2mount *ump;
	int i, numlevels, off;
	int64_t qblockcnt;

	//ump = VFSTOEXT2(vp->v_mount);
	struct ext2mount e2mt = {256};
	ump = &e2mt;
	
	if (nump)
		*nump = 0;
	numlevels = 0;
	realbn = bn;
	if ((long)bn < 0)
		bn = -(long)bn;

	/* The first NDADDR blocks are direct blocks. */
	if (bn < NDADDR)
		return (0);

	/*
	 * Determine the number of levels of indirection.  After this loop
	 * is done, blockcnt indicates the number of data blocks possible
	 * at the previous level of indirection, and NIADDR - i is the number
	 * of levels of indirection needed to locate the requested block.
	 */
	for (blockcnt = 1, i = NIADDR, bn -= NDADDR;; i--, bn -= blockcnt) {
		if (i == 0)
			return (EFBIG);
		/*
		 * Use int64_t's here to avoid overflow for triple indirect
		 * blocks when longs have 32 bits and the block size is more
		 * than 4K.
		 */
		qblockcnt = (int64_t)blockcnt * MNINDIR(ump);
		if (bn < qblockcnt)
			break;
		blockcnt = qblockcnt;
	}

	/* Calculate the address of the first meta-block. */
	if (realbn >= 0)
		metalbn = -(realbn - bn + NIADDR - i);
	else
		metalbn = -(-realbn - bn + NIADDR - i);

	/*
	 * At each iteration, off is the offset into the bap array which is
	 * an array of disk addresses at the current level of indirection.
	 * The logical block number and the offset in that block are stored
	 * into the argument array.
	 */
	ap->in_lbn = metalbn;
	ap->in_off = off = NIADDR - i;
	ap->in_exists = 0;
	ap++;
	for (++numlevels; i <= NIADDR; i++) {
		/* If searching for a meta-data block, quit when found. */
		if (metalbn == realbn)
			break;

		//blockcnt /= MNINDIR(ump);
		off = (bn / blockcnt) % MNINDIR(ump);

		++numlevels;
		ap->in_lbn = metalbn;
		ap->in_off = off;
		ap->in_exists = 0;
		++ap;

		metalbn -= -1 + off * blockcnt;
		blockcnt /= MNINDIR(ump);
	}
	if (nump)
		*nump = numlevels;
	return (0);
}

int main(int argc, char *argv[])
{
	struct vnode v;
	struct indir indir[NIADDR+2];
	int num;
	
	(void)ext2_getlbns(&v, 0, indir, &num);
	(void)ext2_getlbns(&v, (12288>>10), indir, &num);
	(void)ext2_getlbns(&v, (4194304>>10), indir, &num);
	(void)ext2_getlbns(&v, (24576>>10), indir, &num);
	(void)ext2_getlbns(&v, (8388608>>10), indir, &num);
	(void)ext2_getlbns(&v, (49152>>10), indir, &num);
	(void)ext2_getlbns(&v, (16777216>>10), indir, &num);
	return (0);
}
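
The test case itself prints nothing, so the results have to be read in
a debugger. A small formatting helper along the following lines could
be used instead. This is only a sketch: format_indir() is a hypothetical
name and the output layout is my own choice, not something from the
kernel sources.

```c
#include <stdio.h>

struct indir {
	int in_lbn;
	int in_off;
	int in_exists;
};

/*
 * Format the first `num` entries of an indir array into buf, one entry
 * per line. Returns the number of characters written (not counting the
 * terminating NUL), or -1 if buf is too small.
 */
static int
format_indir(char *buf, size_t len, const struct indir *ap, int num)
{
	size_t used = 0;
	int i, n;

	if (len == 0)
		return (-1);
	buf[0] = '\0';
	for (i = 0; i < num; i++) {
		n = snprintf(buf + used, len - used,
		    "indir[%d]: in_lbn=%d in_off=%d in_exists=%d\n",
		    i, ap[i].in_lbn, ap[i].in_off, ap[i].in_exists);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}
	return ((int)used);
}
```

After each ext2_getlbns() call, passing indir and num to the helper and
printing buf with fputs() would show every level that was filled in.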

Thanks.

Brian Bergstrand <http://www.bergstrand.org/brian/>, AIM: triryche206
PGP Key: <http://www.bergstrand.org/brian/misc/public_key.txt>
It is easy to convert a bug to a feature; document it. - The Microsoft 
Employee Handbook, p.215, section B, article VII
As of 02:00:08 PM, iTunes is playing "Tears In Heaven" from "MTV 
Unplugged" by "Eric Clapton"

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0.3

iQA/AwUBQMn7OXnR2Fu2x7aiEQKNoACg2wQGJNSmmZLNe8iyyh7QhaP5ZwAAoPmK
f0DJmDF8IEJ5C4Vke4jxtBwC
=yBtM
-----END PGP SIGNATURE-----


