gmirror bugs, how many?

João Carlos Mendes Luís jonny at jonny.eng.br
Wed Nov 24 13:30:12 PST 2004


Hi,

     I am blindly testing gmirror, just for fun.  I got an old 8G drive 
and did some tests.  I may have found a bug in gmirror.  This is a long 
message, but please read it to the end if you are a gmirror or GEOM hacker.

     First, I partitioned it (fdisk) for a full FreeBSD system with 
sysinstall, which gave me this:

******* Working on device /dev/ad1 *******
parameters extracted from in-core disklabel are:
cylinders=16368 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=16368 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
     start 63, size 16498881 (8056 Meg), flag 80 (active)
    beg: cyl 0/ head 1/ sector 1;
    end: cyl 1023/ head 15/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

     Then I tried to create a single-disk gmirror on the whole ad1 disk:

sigesc::root jcmendes [531] gmirror list
sigesc::root jcmendes [532] gmirror label -b load -v vol0 ad1
Metadata value stored on ad1.
Done.
sigesc::root jcmendes [533] gmirror list
Geom name: vol0
State: COMPLETE
Components: 1
Balance: load
Slice: 4096
Flags: NONE
SyncID: 1
ID: 1397575407
Providers:
1. Name: mirror/vol0
    Mediasize: 8447458816 (7.9G)
    Sectorsize: 512
    Mode: r0w0e0
Consumers:
1. Name: ad1
    Mediasize: 8447459328 (7.9G)
    Sectorsize: 512
    Mode: r0w0e0
    State: ACTIVE
    Priority: 0
    Flags: NONE
    SyncID: 1
    ID: 3966559351

Geom name: vol0.sync

sigesc::root jcmendes [534] ls -l /dev/mirror/
total 1
dr-xr-xr-x  2 root  wheel          512 Nov 24 18:45 .
dr-xr-xr-x  5 root  wheel          512 Nov 24 18:45 ..
crw-r-----  1 root  operator    4,  50 Nov 24 18:45 vol0
crw-r-----  1 root  operator    4,  51 Nov 24 18:45 vol0s1
crw-r-----  1 root  operator    4,  52 Nov 24 18:45 vol0s1a
crw-r-----  1 root  operator    4,  53 Nov 24 18:45 vol0s1b
crw-r-----  1 root  operator    4,  54 Nov 24 18:45 vol0s1c
crw-r-----  1 root  operator    4,  55 Nov 24 18:45 vol0s1d
sigesc::root jcmendes [535] fdisk /dev/mirror/vol0
******* Working on device /dev/mirror/vol0 *******
parameters extracted from in-core disklabel are:
cylinders=1027 heads=255 sectors/track=63 (16065 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=1027 heads=255 sectors/track=63 (16065 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
     start 63, size 16498881 (8056 Meg), flag 80 (active)
    beg: cyl 0/ head 1/ sector 1;
    end: cyl 1023/ head 15/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
sigesc::root jcmendes [536]
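
     A quick sanity check of the numbers so far, as a minimal sketch. 
All values are copied from the fdisk and gmirror list outputs above; the 
variable names are only illustrative.  The one-sector gap between 
consumer and provider is consistent with gmirror keeping its metadata in 
the last sector of the consumer, but that reading is my assumption here.

```shell
#!/bin/sh
# Values copied from the fdisk and "gmirror list" outputs above.
cylinders=16368; heads=16; spt=63      # geometry reported for ad1
sector=512                             # media sector size
consumer_bytes=8447459328              # Mediasize of consumer ad1
provider_bytes=8447458816              # Mediasize of mirror/vol0
slice_start=63; slice_size=16498881    # slice 1, in sectors

# Total sectors according to the CHS geometry.
disk_sectors=$((cylinders * heads * spt))

echo "disk sectors:        $disk_sectors"
echo "disk bytes:          $((disk_sectors * sector))"
echo "slice end (sectors): $((slice_start + slice_size))"
echo "consumer - provider: $((consumer_bytes - provider_bytes)) bytes"
```

     The geometry gives 16498944 sectors, which is 8447459328 bytes 
(the consumer size), slice 1 ends exactly at sector 16498944, and the 
provider is 512 bytes (one sector) smaller than the consumer.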

Apparently, everything is fine up to here.  But now:

sigesc::root jcmendes [536] disklabel /dev/mirror/vol0s1
# /dev/mirror/vol0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
   a:  1048576       63    4.2BSD     2048 16384     8
   b:  1048576  1048639      swap
   c: 16498881       63    unused        0     0         # "raw" part, don't edit
   d: 14401729  2097215    4.2BSD     2048 16384 28552
partition c: partition extends past end of unit
disklabel: partition c doesn't start at 0!
disklabel: An incorrect partition c may cause problems for standard system utilities
partition d: partition extends past end of unit
sigesc::root jcmendes [537]

     Obviously, this cannot be correct.

     I tried to check the base disk, but:

sigesc::root jcmendes [542] disklabel /dev/ad1s1
disklabel: /dev/ad1s1: No such file or directory
sigesc::root jcmendes [543] ls -l /dev/ad1*
crw-r-----  1 root  operator    4,  16 Nov 24 18:58 /dev/ad1
sigesc::root jcmendes [544]

     Hey, where are the base partition slices?

     Now, let's reboot.  I could not unload geom_mirror, since it was 
preloaded during boot; is this expected?  The unload failed, but the 
volume disappeared anyway (gmirror list, ls /dev/mirror). 
This is surely not good.  That's why I rebooted.  Bug #1.

     After the reboot, the device is back (gmirror list).  And, 
surprise, the disklabel is magically corrected:

sigesc::root jcmendes [504] disklabel mirror/vol0s1
# /dev/mirror/vol0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
   a:  1048576        0    4.2BSD     2048 16384     8
   b:  1048576  1048576      swap
   c: 16498881        0    unused        0     0         # "raw" part, don't edit
   d: 14401729  2097152    4.2BSD     2048 16384 28552
sigesc::root jcmendes [505]
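
     The two disklabel listings differ only in their offsets, and the 
difference looks like the slice start.  A small sketch of that 
arithmetic follows; the numbers are copied from the fdisk and disklabel 
outputs, while the helper names rebase and past_end are made up for 
this illustration.

```shell
#!/bin/sh
slice_start=63          # "start 63" from the fdisk output
slice_size=16498881     # "size 16498881" from the fdisk output

# rebase OFFSET: convert an offset counted from the start of the disk
# into one counted from the start of the slice.
rebase() {
    echo $(($1 - slice_start))
}

# past_end OFFSET SIZE: succeed if OFFSET + SIZE runs beyond the slice.
past_end() {
    [ $(($1 + $2)) -gt "$slice_size" ]
}

# Offsets of a, b and d in the first (pre-reboot) listing.
for off in 63 1048639 2097215; do
    echo "$off -> $(rebase "$off")"
done

# Partition d overruns the slice with the first offset, fits with the second.
past_end 2097215 14401729 && echo "d at 2097215: extends past end of unit"
past_end 2097152 14401729 || echo "d at 2097152: fits"
```

     Subtracting 63 from the first listing's offsets gives 0, 1048576 
and 2097152, which are the offsets shown after the reboot.  It also 
accounts for the "extends past end of unit" complaints about c and d.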


     Ok, now let's try something different.  Let's suppose that I want 
only one slice mirrored.  Maybe the other slices could be standalone, or 
striped; this is not important now.  Let's just say I want to mirror 
ad1s1 instead of the whole ad1.

sigesc::root jcmendes [506] gmirror remove vol0 ad1
sigesc::root jcmendes [507] gmirror label -b load -v vol0 ad1s1
Metadata value stored on ad1s1.
Done.
sigesc::root jcmendes [508] gmirror list
Geom name: vol0
State: COMPLETE
Components: 1
Balance: load
Slice: 4096
Flags: NONE
SyncID: 1
ID: 3056186377
Providers:
1. Name: mirror/vol0
    Mediasize: 8447426560 (7.9G)
    Sectorsize: 512
    Mode: r0w0e0
Consumers:
1. Name: ad1
    Mediasize: 8447459328 (7.9G)
    Sectorsize: 512
    Mode: r0w0e0
    State: ACTIVE
    Priority: 0
    Flags: NONE
    SyncID: 1
    ID: 4157180820

Geom name: vol0.sync

sigesc::root jcmendes [509]

     Note that the volume size is now different: 8447426560 instead of 
8447458816 for the previous config.  That is a difference of 32256 
bytes, or 63 sectors, which matches the start offset of the slice.  So 
the size is apparently ok.
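
     The size comparison can be reproduced with plain shell arithmetic. 
This is only a sketch: the values come from the gmirror list and fdisk 
outputs in this message, the variable names are illustrative, and the 
idea that one trailing sector is reserved for metadata is an assumption.

```shell
#!/bin/sh
# Media sizes copied from the two "gmirror list" outputs (bytes).
whole_disk_vol=8447458816   # mirror/vol0 when labelled on ad1
slice_vol=8447426560        # mirror/vol0 when labelled on ad1s1
sector=512                  # "Sectorsize" reported by gmirror
slice_sectors=16498881      # size of slice 1 as printed by fdisk

# Difference between the two volumes, in bytes and in sectors.
diff_bytes=$((whole_disk_vol - slice_vol))
diff_sectors=$((diff_bytes / sector))
echo "difference: $diff_bytes bytes = $diff_sectors sectors"

# Slice size in bytes minus one sector (assumed to hold the metadata).
expected=$((slice_sectors * sector - sector))
echo "slice minus one sector: $expected bytes"
```

     It prints a difference of 32256 bytes (63 sectors), and 8447426560 
bytes for the slice minus one sector, which is the Mediasize reported 
for the second mirror/vol0.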

     But the consumer name is still ad1, and not ad1s1.  Hey, let's check:

sigesc::root jcmendes [510] dd count=1 if=/dev/ad1 of=/tmp/1
1+0 records in
1+0 records out
512 bytes transferred in 0.038226 secs (13394 bytes/sec)
sigesc::root jcmendes [511] dd count=1 if=/dev/mirror/vol0 of=/tmp/2
1+0 records in
1+0 records out
512 bytes transferred in 0.000713 secs (717982 bytes/sec)
sigesc::root jcmendes [512] cmp /tmp/1 /tmp/2
sigesc::root jcmendes [513] dd count=1 if=/dev/ad1 skip=63 of=/tmp/1
1+0 records in
1+0 records out
512 bytes transferred in 0.000655 secs (781471 bytes/sec)
sigesc::root jcmendes [514] cmp /tmp/1 /tmp/2
/tmp/1 /tmp/2 differ: char 1, line 1
sigesc::root jcmendes [515]
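
     The dd and cmp steps above can be wrapped in a small helper.  This 
is a minimal sketch, not a polished tool: same_sector is a name I made 
up, it assumes 512-byte sectors, and it only reads from its arguments.

```shell
#!/bin/sh
# same_sector FILE_A SKIP_A FILE_B SKIP_B
# Succeed (exit 0) if 512-byte sector SKIP_A of FILE_A is identical to
# sector SKIP_B of FILE_B.  Returns 1 if they differ, 2 on read problems.
same_sector() {
    tmp_a=$(mktemp /tmp/sector_a.XXXXXX) || return 2
    tmp_b=$(mktemp /tmp/sector_b.XXXXXX) || { rm -f "$tmp_a"; return 2; }

    # Copy exactly one sector from each source into a scratch file.
    dd if="$1" of="$tmp_a" bs=512 count=1 skip="$2" 2>/dev/null
    dd if="$3" of="$tmp_b" bs=512 count=1 skip="$4" 2>/dev/null

    if [ -s "$tmp_a" ] && [ -s "$tmp_b" ]; then
        cmp -s "$tmp_a" "$tmp_b"
        rc=$?
    else
        rc=2    # at least one read returned no data
    fi

    rm -f "$tmp_a" "$tmp_b"
    return "$rc"
}
```

     With this, "same_sector /dev/ad1 0 /dev/mirror/vol0 0" corresponds 
to the first cmp above, and "same_sector /dev/ad1 63 /dev/mirror/vol0 0" 
to the second one.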

     Oops.  It seems that gmirror got the right size and the wrong 
offset.  And I did not need to do all this; I could simply have used ls:

sigesc::root jcmendes [516] ls -l /dev/mirror/
total 1
dr-xr-xr-x  2 root  wheel          512 Nov 24 19:06 .
dr-xr-xr-x  5 root  wheel          512 Nov 24 19:06 ..
crw-r-----  1 root  operator    4,  33 Nov 24 19:06 vol0
crw-r-----  1 root  operator    4,  34 Nov 24 19:06 vol0s1
crw-r-----  1 root  operator    4,  35 Nov 24 19:06 vol0s1a
crw-r-----  1 root  operator    4,  36 Nov 24 19:06 vol0s1b
crw-r-----  1 root  operator    4,  37 Nov 24 19:06 vol0s1c
crw-r-----  1 root  operator    4,  38 Nov 24 19:06 vol0s1d
sigesc::root jcmendes [517]

     If gmirror were mirroring only the ad1s1 slice, it should not see 
new slices inside.  I would expect to find only vol0 and vol0[abcd]...

     Disklabel is still crazy, and fdisk detects slices it shouldn't:

sigesc::root jcmendes [518] disklabel /dev/mirror/vol0s1
# /dev/mirror/vol0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
   a:  1048576       63    4.2BSD     2048 16384     8
   b:  1048576  1048639      swap
   c: 16498881       63    unused        0     0         # "raw" part, don't edit
   d: 14401729  2097215    4.2BSD     2048 16384 28552
partition c: partition extends past end of unit
disklabel: partition c doesn't start at 0!
disklabel: An incorrect partition c may cause problems for standard system utilities
partition d: partition extends past end of unit
sigesc::root jcmendes [519] fdisk /dev/mirror/vol0
******* Working on device /dev/mirror/vol0 *******
parameters extracted from in-core disklabel are:
cylinders=1027 heads=255 sectors/track=63 (16065 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=1027 heads=255 sectors/track=63 (16065 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
     start 63, size 16498881 (8056 Meg), flag 80 (active)
         beg: cyl 0/ head 1/ sector 1;
         end: cyl 1023/ head 15/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
sigesc::root jcmendes [520]

     Now let's reboot again.

sigesc::root jcmendes [503] disklabel /dev/mirror/vol0s1a
# /dev/mirror/vol0s1a:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
   a:  1048576       63    4.2BSD     2048 16384     8
   b:  1048576  1048639      swap
   c: 16498881       63    unused        0     0         # "raw" part, don't edit
   d: 14401729  2097215    4.2BSD     2048 16384 28552
partition a: partition extends past end of unit
partition b: offset past end of unit
partition b: partition extends past end of unit
partition c: partition extends past end of unit
disklabel: partition c doesn't start at 0!
disklabel: partition c doesn't cover the whole unit!
disklabel: An incorrect partition c may cause problems for standard system utilities
partition d: offset past end of unit
partition d: partition extends past end of unit
sigesc::root jcmendes [504]

     This time, the disklabel did not return to its "good" state.  And 
the offset bug is repeatable:

sigesc::root jcmendes [507] dd count=1 if=/dev/ad1 of=/tmp/1
1+0 records in
1+0 records out
512 bytes transferred in 0.000647 secs (791553 bytes/sec)
sigesc::root jcmendes [508] dd count=1 if=/dev/mirror/vol0 of=/tmp/2
1+0 records in
1+0 records out
512 bytes transferred in 0.000777 secs (658939 bytes/sec)
sigesc::root jcmendes [509] cmp /tmp/1 /tmp/2
sigesc::root jcmendes [510]

     At least the slice detection on the main disk ad1 seems to be ok: 
the slices reappear once I remove ad1 from the mirror.

sigesc::root jcmendes [513] ls -l /dev/ad1*
crw-r-----  1 root  operator    4,  16 Nov 24 19:20 /dev/ad1
sigesc::root jcmendes [514] gmirror remove -v vol0 ad1
Done.
sigesc::root jcmendes [515] gmirror list
sigesc::root jcmendes [516] ls -l /dev/ad1*
crw-r-----  1 root  operator    4,  16 Nov 24 19:20 /dev/ad1
crw-r-----  1 root  operator    4,  24 Nov 24 19:20 /dev/ad1s1
crw-r-----  1 root  operator    4,  25 Nov 24 19:20 /dev/ad1s1a
crw-r-----  1 root  operator    4,  26 Nov 24 19:20 /dev/ad1s1b
crw-r-----  1 root  operator    4,  27 Nov 24 19:20 /dev/ad1s1c
crw-r-----  1 root  operator    4,  28 Nov 24 19:20 /dev/ad1s1d
sigesc::root jcmendes [517]

     Now the big question: What is the expected behaviour when mirroring 
a slice?  Whichever answer you give me, I'm sure the current behaviour 
is not right.  So, this must be a bug.  Bug #2.

     Is there any gmirror hacker around to fix these?


More information about the freebsd-hackers mailing list