GJournal (hopefully) final patches.
Paul Allen
nospam at ugcs.caltech.edu
Fri Aug 11 02:49:22 UTC 2006
It's a bit disturbing that a GEOM class quite far away
from the storage drivers presumes that the proper action
here is a cache flush. The underlying hardware may
support tagged command queuing (i.e., SCSI's ability
not only to issue transaction-completion notifications
but also to let partial orderings be dictated to the
controller) or its SATA analogue, native command queuing.
It's true that this functionality may not always work as
advertised, but that's a problem to be solved with dev
sysctls, not by taking a lowest-common-denominator approach
in a high-level GEOM class.
This really needs broader architectural consideration,
not just whatever it takes to make it work.
Paul
From Pawel Jakub Dawidek <pjd at freebsd.org>, Thu, Aug 10, 2006 at 09:28:41PM +0200:
> On Thu, Aug 10, 2006 at 01:47:23PM -0500, Craig Boston wrote:
> > Hi,
> >
> > It's great to see this project so close to completion! I'm trying it
> > out on a couple machines to see how it goes.
> >
> > A few comments and questions:
> >
> > * It took me a little by surprise that it carves 1G out of the device
> > for the journal. Depending on the size of the device that can be a
> > pretty hefty price to pay (and I didn't see any mention of it in the
> > setup notes). For a couple of my smaller filesystems I reduced it to
> > 512MB. Perhaps some algorithm for auto-sizing the journal based on
> > the size / expected workload of the device would be in order?
>
> It will be pointed out in the documentation when I finally prepare it.
> I have no plans for autosizing currently.
>
> > * Attached is a quick patch for geom_eli to allow it to pass BIO_FLUSH
> > down to its backing device. It seems like the right thing to do and
> > fixes the "BIO_FLUSH not supported" warning on my laptop that uses a
> > geli encrypted disk.
>
> I have this already in my Perforce tree. I also implemented BIO_FLUSH
> pass-down in gmirror and graid3.
>
> I also added a flag for gmirror and graid3 which says "don't
> resynchronize components after a power failure - trust they are
> consistent". And they are always consistent when placed below gjournal.
>
> > * On a different system, however, it complains about it even on a raw
> > ATA slice:
> >
> > atapci1: <Intel ICH4 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
> > ata0: <ATA channel 0> on atapci1
> > ad0: 114473MB <WDC WD1200JB-00CRA1 17.07W17> at ata0-master UDMA100
> > GEOM_JOURNAL: BIO_FLUSH not supported by ad0s1e.
> >
> > It seems like a reasonably modern controller and disk, at least it
> > should be capable of issuing a cache flush command. Not sure why it
> > doesn't like it :/
>
> We would probably need to add some printfs to diagnose this - you can
> try changing ad_init() along these lines:
>
> if (atadev->param.support.command1 & ATA_SUPPORT_WRITECACHE) {
> 	if (ata_wc)
> 		ata_controlcmd(dev, ATA_SETFEATURES, ATA_SF_ENAB_WCACHE, 0, 0);
> 	else
> 		ata_controlcmd(dev, ATA_SETFEATURES, ATA_SF_DIS_WCACHE, 0, 0);
> } else {
> 	printf("ad_init: WRITE CACHE not supported by ad%d.\n",
> 	    device_get_unit(dev));
> }
>
> > * How "close" does the filesystem need to be to the gjournal device in
> > order for the UFS hooks to work? Directly on it?
> >
> > The geom stack on my laptop currently looks something like this:
> >
> > [geom_disk] ad0 <- [geom_eli] ad0.eli <- [geom_gpt] ad0.elip6 <-
> > [geom_label] gjtest <- [geom_journal] gjtest.journal <- UFS
> >
> > I was wondering if an arrangement like this would work:
> >
> > [geom_journal] ad0p6.journal <- [geom_eli] ad0p6.journaleli <- UFS
> >
> > and if it would be any more efficient (journal the encrypted data
> > rather than encrypt the journal). Or even gjournal the whole disk at
> > once?
>
> When you mount a file system, it sends BIO_GETATTR "GJOURNAL::provider"
> requests. So as long as the classes between the file system and the
> gjournal provider pass BIO_GETATTR down, it will work.
>
> On my home machine I have the following configuration:
>
> raid3/DATA1.elid.journal
>
> So it's UFS over gjournal over bsdlabel over geli over raid3 over ata.
>
> I prefer to put gjournal on top, because it gives consistency to the
> layers below it. For example, I can use geli with a bigger sector size
> (a sector size greater than the disk's sector size in encryption-only
> mode can be unreliable across power failures, which is not the case
> when gjournal sits above geli), I can turn off resynchronization of
> gmirror/graid3 after a power failure, etc.
>
> On the other hand, configuring geli on top of gjournal can be more
> efficient for large files - geli will not encrypt the data twice.
>
> Fortunately, with GEOM you can freely mix your puzzles.
>
> > Haven't been brave enough to try gjournal on root yet, but my /usr and
> > /compile (src, obj, ports) partitions are already on it so I'm sure I'll
> > try it soon ;)
>
> Markus Trippelsdorf reported that it doesn't work out of the box, but he
> managed to make it work with some small changes to fsck_ffs(8).
>
> --
> Pawel Jakub Dawidek http://www.wheel.pl
> pjd at FreeBSD.org http://www.FreeBSD.org
> FreeBSD committer Am I Evil? Yes, I Am!
More information about the freebsd-fs mailing list