Re-starting a gjournal provider

Fri Jul 31 06:49:32 UTC 2009

On Fri, Jul 31, 2009 at 12:05:45AM -0600, Anthony Chavez wrote:
> > It doesn't come back because something (ATA layer?) doesn't properly
> > remove ad0 provider. When you remove the disk, /dev/ad0 should disappear
> > and reappear once you insert it again.
> > 
> > You can still do this trick after you insert the disk again so the GEOM
> > can schedule retaste:
> > 
> > 	# true > /dev/ad0
> 
> Thank you for informing me of that trick.  I tried using it after
> "gjournal stop" but unfortunately, nothing changed.

That is because it should be /dev/ad0s1, not /dev/ad0. Try it with
/dev/ad0s1.

> Here are the points to note.
> 
> 1) When I physically remove a drive from the enclosure, /dev/ad0 does
> not disappear.  /dev/ad0 *always* exists until I "atacontrol detach."
> Even when the device is powered off, /dev/ad0 continues to exist.

This could be caused by one of three things:

1. Your enclosure/controller doesn't report that the disk was
   removed.

2. Your enclosure does report it, but ATA ignores the report.
   That would be a bug in ATA.

3. Your controller doesn't support hot-swap, or supports only
   warm-swap, which means you have to detach the disk by hand before
   removing it.

> 2) /dev/ad0s1.journal disappears when I "gjournal stop."
> /dev/ad0s1.journal is the device that, AFAIK, will only come back after
> "atacontrol detach ata0; atacontrol attach ata0".

It should also come back after 'true > /dev/ad0s1'. This command opens
the provider for writing (it doesn't actually write anything). In GEOM
this triggers a spoil event and then, once the command completes and
the provider is closed, a retaste event. This means GEOM will ask
gjournal to taste /dev/ad0s1 again, which allows gjournal to find its
metadata and create /dev/ad0s1.journal once more.

One more test would be useful. Could you run the command below before
removing the disk and again after inserting a different disk:

	# diskinfo -v /dev/ad0

If it shows exactly the same output in both cases, ATA is not aware
that the disk was replaced, and a detach/attach cycle is needed.
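A minimal sketch of that comparison, assuming you save the diskinfo
output to files before and after the swap. The function name
diskinfo_changed and the /tmp file paths are hypothetical; the check
is only a byte-for-byte comparison of the two outputs.

```shell
#!/bin/sh
# Capture the two outputs first (paths are hypothetical):
#   diskinfo -v /dev/ad0 > /tmp/diskinfo.before   # before removing the disk
#   diskinfo -v /dev/ad0 > /tmp/diskinfo.after    # after inserting another
# Then run: diskinfo_changed /tmp/diskinfo.before /tmp/diskinfo.after
# Returns 0 if the output changed, 1 if identical, 2 if a file is missing.
diskinfo_changed() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "missing input file" >&2
        return 2
    fi
    if cmp -s "$1" "$2"; then
        echo "unchanged: ATA did not notice the swap; detach/attach needed"
        return 1
    fi
    echo "changed: the new disk was detected"
    return 0
}
```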

> 1) Is "atacontrol detach ata0 && atacontrol attach ata0" in fact a safe
> operation to perform in any circumstance?
> 
> My better judgment has me thinking that the answer to this question is
> almost certainly "no."  However, I am hypothesizing that it would be safe
> enough if all devices on ata0 are properly unmounted first, but if I can
> avoid that, I will.  It feels clumsy and seems to defeat the purpose of
> hot-swapping.

It should be safe, but there have been plenty of bugs related to a disk
disappearing from under a mounted file system, etc. If nothing is
mounted you should be fine (assuming there are no ATA bugs in this
area).

But for full hot-swap, the disk controller should detect that the disk
was removed and the ATA code should remove it from /dev/.

> 2) Is it *necessary* to "gjournal stop" before hot-swapping?
> 
> In such a scenario, I would opt to simply "umount; gjournal sync," swap
> disks, and then "atacontrol cap ad0; mount" (or even just "mount").  It
> seems quite likely, however, that all drives that undergo this treatment
> would be *required* to have gjournal labels since /dev/ad0s1.journal
> would never disappear (although I've yet to actually test that).

I'd go with 'umount; gjournal stop' and drop 'gjournal sync'.

The controller should inform ATA that the disk is gone, and ATA should
inform GEOM that ad0 is gone. If that were the case, a simple 'umount;
gjournal sync' would be enough. But because it isn't, you have to stop
gjournal and detach ad0.
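Put together, the manual cycle discussed in this thread (umount,
gjournal stop, atacontrol detach, swap, atacontrol attach) might be
scripted as in the sketch below. It is hypothetical: MNT, PROV and
CHANNEL are placeholders, the exact name that 'gjournal stop' expects
should be checked against gjournal(8), and DRY_RUN=1 only prints the
commands instead of running them.

```shell
#!/bin/sh
# Placeholders (assumptions): mount point, gjournal name, ATA channel.
MNT=${MNT:-/mnt/backup}
PROV=${PROV:-ad0s1.journal}   # assumed name; verify with gjournal(8)
CHANNEL=${CHANNEL:-ata0}

# run: print the command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Before pulling the disk: unmount, stop gjournal, detach the channel.
# Each step runs only if the previous one succeeded.
before_swap() {
    run umount "$MNT" &&
    run gjournal stop "$PROV" &&
    run atacontrol detach "$CHANNEL"
}

# After inserting the new disk: re-attach so ATA probes the channel again.
after_swap() {
    run atacontrol attach "$CHANNEL"
}
```

Usage: run 'before_swap', swap the disk, then run 'after_swap' and
mount the new /dev/ad0s1.journal (or the plain slice, if the new disk
has no gjournal label).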

> 3) If the answer to question 2 is "yes," then how can I handle the case
> of inserting a drive that does *not* have a gjournal label?

There's nothing special here. Let's see how the diskinfo test goes first.

> 1) Is it really necessary to perform 3 "sync" commands before "umount"?
> 
> Line 94 of src/sbin/umount/umount.c,v 1.45.20.1 has me thinking that the
> answer is "no," since it calls sync() itself, albeit only once.  I got
> the idea for executing "sync" three times from /etc/rc.suspend.

The idea is that unmount should take care of syncing data. There should
be no need for even one sync; it is called "just in case".

> 2) Is it necessary to "gjournal sync" if I'm going to "gjournal stop"
> anyway? (You answered this one already.)

No, stop should be sufficient.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!