kern/129645: GEOM_JOURNAL causes system to fail to boot due to a
GEOM timeout problem if the journals and data are on storage
provided by separate device drivers.
bmeyer at mesoft.com.au
Sun Dec 14 19:50:02 PST 2008
>Synopsis: GEOM_JOURNAL causes system to fail to boot due to a GEOM timeout problem if the journals and data are on storage provided by separate device drivers.
>Arrival-Date: Mon Dec 15 03:50:01 UTC 2008
>Originator: Brendon Meyer
MicroElectronic SOFTworks Pty Ltd
FreeBSD hercules.sydney.mesoft.com.au 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Sat Dec 13 00:42:49 EST 2008 root at hercules.sydney.mesoft.com.au:/usr/obj/usr/src/sys/GENERIC i386
The system will fail to boot correctly after creating GEOM journal devices *if* the journal and data are split across devices attached through different device drivers.
For example, the data provider might be a GPT partition on device 'da0' attached via an Adaptec 39320 HBA, and the journal provider might be a GPT partition on a device attached via an Adaptec AdvancedRAID controller (i.e. aac).
The problem is that *if* the geom_journal module is either compiled into the kernel or loaded as a kernel module at boot, you end up getting the following message:
kernel: GEOM_JOURNAL: Timeout. Journal gjournal <journal id> cannot be completed.
At this point, when the system reaches the fsck script in /etc/rc.d, the provider for the file system in question is unavailable and you are dropped into single-user mode for remediation.
This has been reproduced repeatably on a Dell 2550 server whose internal PERC controller holds the system file systems and also the disks that will be used for the journals. The disks are configured as follows: 2 x SCSI disks (mirror) for the system and 2 x SCSI disks (mirror) for the journals. These volumes appear as aacd0 and aacd1 respectively.
Attached externally via an Adaptec 39320 HBA is an Acard RAIDBOX enclosure with 4 x 1 TB SATA drives (configured as RAID 5, the RAIDBOX presents a single LUN of approx. 2.7 TB to the HBA). It appears as 'da0'.
Use GPT to partition aacd1 into a number of partitions (a single partition is sufficient). Do the same for the 'da0' device so that you end up with, say, 'aacd1p1' for the journal and 'da0p1' for the data.
Create a journaled file system as usual, but specify the data and journal providers as 'da0p1' and 'aacd1p1' respectively.
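For reference, one way to do this with gjournal(8) and newfs(8), using the provider names from this report. The mount point /data is a placeholder, not part of the original setup, and this is a sketch rather than the exact commands used:

```sh
# Data provider first, journal provider second: data on da0p1 (da/Adaptec HBA),
# journal on aacd1p1 (aac). gjournal(8) loads the class module if needed.
gjournal label da0p1 aacd1p1
# The assembled provider appears as /dev/da0p1.journal; create UFS on it
# with the gjournal flag enabled.
newfs -J /dev/da0p1.journal
# /data is a hypothetical mount point; the gjournal(8) examples mount
# journaled file systems with -o async.
mkdir -p /data
mount -o async /dev/da0p1.journal /data
```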
Add the newly created file system to /etc/fstab, and make sure that geom_journal is loaded from /boot/loader.conf.
Reboot the system. It will start booting and you will begin seeing the GEOM_JOURNAL timeout messages on the console. As soon as the system starts checking the file systems listed in /etc/fstab, it will fail and drop you into the single-user shell.
If you run 'gjournal list' at this point (the geom_journal module will be loaded), it will not show the journaled device.
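The corresponding configuration entries would typically look like the following. The /data mount point is a placeholder, and the dump/pass values are ordinary choices rather than anything taken from the report:

```
# /boot/loader.conf: load the GEOM journal class at boot
geom_journal_load="YES"

# /etc/fstab: mount the assembled journal provider, not da0p1 itself
# Device              Mountpoint  FStype  Options   Dump  Pass#
/dev/da0p1.journal    /data       ufs     rw,async  2     2
```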
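From the single-user shell, standard commands like these can help show which component is missing when the timeout fires. The provider names are the ones used in this report; this is a suggested checklist, not output or steps from the original reproduction:

```sh
# Is the GEOM journal class loaded? (its kernel module is named g_journal)
kldstat -m g_journal
# Which journal geoms exist, and have they been assembled?
gjournal status
# Are both component providers present?
ls -l /dev/da0p1 /dev/aacd1p1
# Show the gjournal metadata stored on each component.
gjournal dump da0p1
gjournal dump aacd1p1
```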
A test environment with remote access is available on request for re-creating the problem reliably.
This isn't a fix but a workaround.
The workaround involves modifying /etc/rc.d/fsck, as follows:
1. Make sure that the geom_journal module is *not* compiled into the kernel.
2. Make sure that /boot/loader.conf does *not* load the geom_journal module.
3. Add the following to the top of /etc/rc.d/fsck:
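The lines referred to in step 3 are not preserved in this copy of the report. Going by the remark that the workaround makes /etc/rc.d/fsck load the geom_journal kernel module, they were presumably along these lines; treat this as an assumed reconstruction, not the reporter's original text:

```sh
# Assumed reconstruction: load the journal class only once rc(8) reaches fsck,
# by which point the aac(4) volume (journal) and the da(4) disk behind the
# Adaptec HBA (data) have presumably both been attached.
if ! kldstat -q -m g_journal; then
	kldload geom_journal
fi
```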
This is bad for a whole range of reasons. There should be no requirement in /etc/rc.d/fsck to load the geom_journal kernel module.
The real solution is to fix GEOM so that it only starts the GEOM_JOURNAL component *after* all of the disks have been successfully attached.