misc/147667: Booting with one component of a gmirror, then with the other leads to an inconsistent gmirror device.

Sven Kirmess sven.kirmess at kzone.ch
Mon Jun 7 20:50:02 UTC 2010


>Number:         147667
>Category:       misc
>Synopsis:       Booting with one component of a gmirror, then with the other leads to an inconsistent gmirror device.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jun 07 20:50:01 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Sven Kirmess
>Release:        7.3 and 8.0 on i386 release from DVD
>Organization:
>Environment:
FreeBSD free1.kzone.ch 7.3-RELEASE-p1 FreeBSD 7.3-RELEASE-p1 #0: Wed May 26 04:29:05 UTC 2010     root at i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  i386

>Description:
Booting with one component of a gmirror, then with the other, leads to an inconsistent gmirror device. The kernel neither detects that a resync is needed nor marks one of the two components as failed.

See the How-To-Repeat section for a detailed description.
>How-To-Repeat:
- Create a gmirror (gm0 on ad1 and ad2).

- Shut down the system and remove ad2.
- Boot the system with ad1.
- Create a file in /: $ touch /ad1.txt
- Shut down the system.

- Remove ad1 and add ad2 back into the system.
- Boot the system with ad2.
- Create a file in /: $ touch /ad2.txt
- Shut down the system.

- Add ad1 back into the system.
- Boot the system with ad1 and ad2.

The kernel happily mounts the mirror and doesn't do a resync. You'll get output like this:

$ gmirror status
      Name    Status  Components
mirror/gm0  COMPLETE  ad1
                      ad2
$

If you run ls / you'll see only one of the two files (that's expected).

Now if you shut down the system again and boot with only ad1, you'll see /ad1.txt, and if you boot with only ad2, you'll see /ad2.txt. That's not expected.

That means the mirror is in an inconsistent state and the kernel didn't detect it.
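
The divergence described above can be reproduced with a small toy model. This is only a sketch for illustration: the ToyMirror class and its methods are hypothetical and say nothing about how gmirror is implemented. It assumes only that a write reaches every component present at the time and no component that is absent.

```python
class ToyMirror:
    """Toy model of a two-way mirror (hypothetical, not gmirror code)."""

    def __init__(self, names):
        # Each component is modelled as the set of paths stored on it.
        self.disks = {name: set() for name in names}
        self.present = []

    def boot(self, present):
        # Only the components physically attached take part in this boot.
        self.present = list(present)

    def touch(self, path):
        # A write goes to every present component and to no absent one.
        for name in self.present:
            self.disks[name].add(path)


mirror = ToyMirror(["ad1", "ad2"])

mirror.boot(["ad1"])         # ad2 removed
mirror.touch("/ad1.txt")

mirror.boot(["ad2"])         # ad1 removed, ad2 back
mirror.touch("/ad2.txt")

mirror.boot(["ad1", "ad2"])  # both attached, no resync performed

for name in sorted(mirror.disks):
    print(name, sorted(mirror.disks[name]))
```

In this model each component ends up holding only the file created while it was the sole member, which matches what is seen when booting from ad1 or ad2 alone.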

This is what I would expect:
- Whenever gmirror adds a disk to a mirror, it writes the current time to the disk.
- When the driver starts a mirror, it checks that both disks carry exactly the same time. If they do not, the active disk (the one used to boot up to this state) is used to start the mirror. The other disk is marked as failed (or something similar) and the error is logged. The administrator is then forced to remove and re-add the other disk.

We should not resync automatically, as this would lose data on the disk we sync to. For example, ad1 might fail temporarily and refuse to boot; during the next boot ad1 might work again and be the first boot disk the BIOS picks. The system would then boot from ad1 and sync back to ad2, overwriting everything changed on ad2 since ad1 failed.
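
The start-up check proposed above could look roughly like the following. This is a minimal sketch, not gmirror code: the Component type, its stamp field, the start_mirror function and the state names are all hypothetical, and it assumes the boot component can be identified by name.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    stamp: int  # hypothetical on-disk time written when the disk was added


def start_mirror(components, boot_name):
    """Return {component name: state} for the proposed start-up policy.

    Components whose stamp matches the boot component stay ACTIVE.
    Any other component is marked FAILED and is never resynced
    automatically; the administrator must remove and re-add it.
    """
    boot = next(c for c in components if c.name == boot_name)
    states = {}
    for c in components:
        if c.stamp == boot.stamp:
            states[c.name] = "ACTIVE"
        else:
            states[c.name] = "FAILED"
            print(f"mirror: {c.name} stamp {c.stamp} differs from "
                  f"{boot.name} stamp {boot.stamp}; marked FAILED")
    return states


# Both disks carry the same stamp: the mirror starts normally.
consistent = start_mirror(
    [Component("ad1", 100), Component("ad2", 100)], boot_name="ad1")

# Stamps differ and the system booted from ad2: ad1 is failed, not resynced.
diverged = start_mirror(
    [Component("ad1", 200), Component("ad2", 300)], boot_name="ad2")
```

The sketch deliberately has no resync path, so a disk that reappears after a temporary failure can never silently overwrite the disk that stayed in use.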
>Fix:
none

>Release-Note:
>Audit-Trail:
>Unformatted:

