Debugging pseudo-disk driver on FreeBSD

Sun May 2 15:26:02 PDT 2004

On Sun, May 02, 2004 at 12:41:56AM -0600, Siddharth Aggarwal wrote:
> 
> Hi,
> 
> I am working on a Copy on Write disk driver on FreeBSD where I try to save
> the state of a filesystem (/dev/ad0s3) to another device (/dev/ad0s4) by
> making a virtual device that sits on top of these two (/dev/shd0).
> 
> 1. So in the strategy routine, I get the block read/write calls to
> /dev/shd0.
> 2. For a write operation, I copy the previous contents of the block
> (block number on /dev/ad0s3) onto a free block on /dev/ad0s4.
> 3. To restore the previous contents of the disk, I read the allocated block
> from /dev/ad0s4 and write it back to its original block number on /dev/ad0s3.
>
> The virtual device /dev/shd0 is mounted on /mnt.
> 
> So to test it out, my /dev/ad0s3 originally had a file "old1" of 13685
> bytes containing a repeating string pattern (OLDOLD).
> I then copied a file "new1" of 8211 bytes with the repeating pattern
> (NEWNEW) over the old one,
> i.e. cp new1 /mnt/old1
>
> A hexdump shows that a block of 8192 bytes containing "OLDOLD" was copied
> over to /dev/ad0s4 and its place was taken by "NEWNEW" in /dev/ad0s3.
> The remaining bytes (beyond the first 8192) are still in /dev/ad0s3.
> So this shows that the copy on write was done correctly. And I correctly
> see 8211 bytes of "NEWNEW" in /mnt/old1 (ls -l /mnt/old1).

On closer read, I see the advantage of your approach here: the
originating device always has the latest changes, but old data is
still stored on another device.  (But for how long?  Until the next
overwrite?  Revisioning possibilities?)  This means that the original
disk is always consistent with the most recent changes but has a
sort of log of old blocks?
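To make the scheme concrete, here is a minimal userland sketch in C. It is not the actual shd driver: two in-memory arrays stand in for /dev/ad0s3 and /dev/ad0s4, the block size is a toy value, and all names are hypothetical. It also assumes only the first overwrite of each block is saved, so a restore returns to the state at the time logging started; a real driver would do these copies from its strategy routine with I/O requests to the underlying devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model: ORIG stands in for /dev/ad0s3, STORE for /dev/ad0s4.
 * Sizes are illustrative only. */
#define BLKSZ 16
#define NBLK  8
#define NSAVE 8

static char orig[NBLK][BLKSZ];
static char store[NSAVE][BLKSZ];
static int  saved_at[NBLK];   /* STORE slot holding the old copy, or -1 */
static int  next_free;        /* next unused slot in STORE */

static void cow_init(void)
{
    for (int i = 0; i < NBLK; i++)
        saved_at[i] = -1;
    next_free = 0;
}

/* Write path: save the block's old contents the first time it is
 * overwritten (assumption), then let the write reach ORIG. */
static bool cow_write(int blk, const char *data)
{
    if (saved_at[blk] < 0) {
        if (next_free >= NSAVE)
            return false;          /* backing store is full */
        memcpy(store[next_free], orig[blk], BLKSZ);
        saved_at[blk] = next_free++;
    }
    memcpy(orig[blk], data, BLKSZ);
    return true;
}

/* Restore path: copy every saved block back to its original location. */
static void cow_restore(void)
{
    for (int i = 0; i < NBLK; i++) {
        if (saved_at[i] >= 0) {
            memcpy(orig[i], store[saved_at[i]], BLKSZ);
            saved_at[i] = -1;
        }
    }
    next_free = 0;
}
```

Saving only the first old copy keeps the log to one entry per block; keeping every overwrite instead would give the revisioning Allan mentions, at the cost of more space and a per-block version list.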

This is conceptually the opposite approach to the union filesystem,
which traditionally keeps new changes to files on another filesystem
(the overlay) and preserves the underlying filesystem contents.
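For contrast, the overlay idea can be sketched at block level. This is only an analogy (union filesystems work on files through the VFS, not on raw blocks) and the names are hypothetical: writes are redirected to the upper store, reads check it first, and reverting just forgets the overlay because the lower data was never touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLKSZ 16
#define NBLK  8

static char lower[NBLK][BLKSZ];   /* preserved underlying data */
static char upper[NBLK][BLKSZ];   /* overlay receiving new writes */
static bool in_upper[NBLK];       /* block has been redirected */

/* Redirect-on-write: new data never touches LOWER. */
static void ov_write(int blk, const char *data)
{
    memcpy(upper[blk], data, BLKSZ);
    in_upper[blk] = true;
}

/* Reads see the overlay copy if there is one. */
static const char *ov_read(int blk)
{
    return in_upper[blk] ? upper[blk] : lower[blk];
}

/* Reverting is just discarding the overlay. */
static void ov_discard(void)
{
    memset(in_upper, 0, sizeof in_upper);
}
```

The trade-off is the mirror image of the copy-on-write driver: reverting is free, but every read has to consult the overlay map, and the underlying device never holds the latest state on its own.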

Your facility also works with devices containing arbitrary data, for
example raw data streams, as opposed to a filesystem accessible
through the VFS.  But this carries with it the implications of
device-level block I/O.  Restoring any given file would involve
translating the inode to physical blocks and restoring only those
portions which were changed by the operation.  I'm unclear how this
works.  Take undeleting a file:  wouldn't you need to restore the
inode, the direct blocks, any indirect blocks and the dirents, by
referencing these blocks?  How do you know how to do this (at file
granularity) at the device level in a filesystem-agnostic way?
(Could writes be processed atomically?)

Alternatively, you can implement this copy-on-write scheme at the
vnode layer.

> I then send an IOCTL to my driver to restore to the previous state
> (expecting it to give me 13685 bytes of "OLDOLD" back in /mnt/old1)

So this is like a snapshot of the original state of the filesystem
on the device in its entirety (similar to snapshots, but at the
device level rather than the filesystem level)?  How do you ensure
it's consistent, especially when the device backing the storage of
old blocks becomes full: which blocks do you turf first?  (The problem
is less significant if you have a 1:1 mapping of blocks, like a RAID
mirror with the same partition size.)
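With a 1:1 mapping, the old copy of block N can simply live at block N of the backing partition, so no allocator is needed and the store cannot fill up; a bitmap records which blocks already hold a saved copy. A minimal sketch, with in-memory arrays standing in for two equal-sized partitions and hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLKSZ 16
#define NBLK  64

static char orig[NBLK][BLKSZ];
static char backing[NBLK][BLKSZ];    /* same number of blocks as ORIG */
static uint8_t saved_map[NBLK / 8];  /* one bit per block */

static int is_saved(int blk)
{
    return (saved_map[blk / 8] >> (blk % 8)) & 1;
}

static void mark_saved(int blk)
{
    saved_map[blk / 8] |= (uint8_t)(1u << (blk % 8));
}

/* The old copy of block BLK always goes to block BLK of BACKING. */
static void cow_write_1to1(int blk, const char *data)
{
    if (!is_saved(blk)) {
        memcpy(backing[blk], orig[blk], BLKSZ);
        mark_saved(blk);
    }
    memcpy(orig[blk], data, BLKSZ);
}

/* Copy back every saved block and clear the bitmap. */
static void restore_1to1(void)
{
    for (int i = 0; i < NBLK; i++)
        if (is_saved(i))
            memcpy(orig[i], backing[i], BLKSZ);
    memset(saved_map, 0, sizeof saved_map);
}
```

The price is a backing partition as large as the original one; in exchange, overwriting every block repeatedly never forces an eviction decision.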

> After unmounting and remounting, I see that the contents of /mnt/old1 have
> become OLDOLD, but there are only 8211 bytes instead of 13685. A hexdump of
> /dev/ad0s3 however, shows that there are indeed 13685 consecutive bytes of
> OLDOLD lying there.
>
> This has led me to believe that the inode of /mnt/old1 is not being
> refreshed (or it was never saved off to /dev/ad0s4 in the first place). Do inode
> reads/writes go through the strategy routine in the first place?

Can you reboot the machine and see the same effects?  I know that
sounds like an extreme measure, but that's a way to determine for
sure if it's a caching issue.  You could also try doing a few large
dd's from another filesystem between the unmount and the remount, to
push any cached blocks out.
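To illustrate the kind of caching problem meant here, a hypothetical model (not FreeBSD's buffer cache): if a dirty buffer for a block is still held in memory when the driver restores that block underneath the filesystem, the write-back at unmount puts the new contents right back on disk. Whether this is what happens to the inode of /mnt/old1 is untested.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLKSZ 16
#define NBLK  4

static char disk[NBLK][BLKSZ];
static char cache[NBLK][BLKSZ];   /* toy write-back buffer cache */
static bool cached[NBLK], dirty[NBLK];

/* Filesystem-side write: only the cached copy is updated. */
static void fs_write(int blk, const char *data)
{
    memcpy(cache[blk], data, BLKSZ);
    cached[blk] = dirty[blk] = true;
}

/* Filesystem-side read: fill the cache from disk on a miss. */
static const char *fs_read(int blk)
{
    if (!cached[blk]) {
        memcpy(cache[blk], disk[blk], BLKSZ);
        cached[blk] = true;
    }
    return cache[blk];
}

/* Unmount: write back dirty buffers, then drop the whole cache. */
static void fs_unmount(void)
{
    for (int i = 0; i < NBLK; i++) {
        if (dirty[i])
            memcpy(disk[i], cache[i], BLKSZ);
        cached[i] = dirty[i] = false;
    }
}

/* A restore done underneath the cache, as a driver ioctl would be. */
static void restore_block(int blk, const char *old)
{
    memcpy(disk[blk], old, BLKSZ);
}
```

In this model, unmounting (flushing) before the restore, rather than restoring while the filesystem is still mounted, keeps the restored contents.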

> Any idea what could be going wrong?

No clue. ;)

--
 Allan Fields
 AFRSL - http://afields.ca
 BSDCan: May 2004, Ottawa - http://www.bsdcan.org