Re: Replacing a REMOVED drive in DEGRADED zpool

From: David Christensen <dpchrist_at_holgerdanske.com>
Date: Fri, 22 Aug 2025 05:11:30 UTC
On 8/21/25 21:02, Robert wrote:
> On 8/21/2025 3:03 PM, Dag-Erling Smørgrav wrote:
>> You should take a look in /var/backups, you may find a backup of
>> the partition table from the failed drive.  Assuming you remove the
>> failed drive first, you can safely `gpart restore -l` this backup onto
>> the replacement drive, which will recreate the labels (but not UUIDs).
> 
> Great, had no idea, yes, I see the gpart.ada0.bak in /var/backups...
> 
> root@db1:~ # cat /var/backups/gpart.ada0.bak <<-- REMOVED disk
> GPT 128
> 1   freebsd-boot        40      1024 gptboot0
> 2   freebsd-swap      2048  16777216 swap0
> 3    freebsd-zfs  16779264 276267008 zfs0
> root@db1:~ # cat /var/backups/gpart.ada1.bak
> GPT 128
> 1   freebsd-boot        40      1024 gptboot1
> 2   freebsd-swap      2048  16777216 swap1
> 3    freebsd-zfs  16779264 276267008 zfs1
> root@db1:~ # cat /var/backups/gpart.ada2.bak
> GPT 128
> 1   freebsd-boot        40      1024 gptboot2
> 2   freebsd-swap      2048  16777216 swap2
> 3    freebsd-zfs  16779264 276267008 zfs2
> root@db1:~ # cat /var/backups/gpart.ada3.bak
> GPT 128
> 1   freebsd-boot        40      1024 gptboot3
> 2   freebsd-swap      2048  16777216 swap3
> 3    freebsd-zfs  16779264 276267008 zfs3
> root@db1:~ # cat /var/backups/gpart.ada4.bak
> 


Good.  So long as nothing uses GUID/UUID, gpart(8) restore with labels 
should work.
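
Before restoring, it may be worth checking that the backup file itself is
intact.  A rough sketch, assuming the backup layout shown above (a scheme
line, then index, type, start, size, and label per partition);
check_gpart_backup is a hypothetical helper name, not part of gpart(8):

```shell
#!/bin/sh
# check_gpart_backup FILE
# Hypothetical helper: verifies that every partition line in a gpart
# backup has numeric index, start, and size fields, and that no
# partition starts before the previous one ends.
check_gpart_backup() {
    awk '
        BEGIN   { bad = 0; prev_end = 0 }
        NR == 1 { next }                  # scheme line, e.g. "GPT 128"
        NF == 0 { next }                  # ignore blank lines
        $1 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ {
            printf "line %d: malformed entry: %s\n", NR, $0
            bad = 1
            next
        }
        $3 + 0 < prev_end {
            printf "line %d: partition %s overlaps the previous one\n", NR, $1
            bad = 1
        }
        { prev_end = $3 + $4 }
        END { exit bad }
    ' "$1"
}
```

If the file checks out, the restore itself would presumably be something
like `gpart restore -l ada0 < /var/backups/gpart.ada0.bak`, assuming the
replacement drive shows up as ada0.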


This is my server system disk (BIOS boot, GPT with a protective MBR):

2025-08-21 21:13:19 toor@f5 ~
# gpart show ada0
=>       40  117231328  ada0  GPT  (56G)
          40       1024     1  freebsd-boot  (512K)
        1064   29359104     2  freebsd-ufs  (14G)
    29360168    1564672     3  freebsd-swap  (764M)
    30924840   86306528        - free -  (41G)


I have a backup of the freebsd boot partition:

2025-08-21 21:55:05 toor@f5 ~
# ll /var/backups/boot.ada0p1.bak
-rw-r--r--  1 root  wheel  524288 2024/03/04 03:01:00 /var/backups/boot.ada0p1.bak


And the backup still matches ada0p1:

2025-08-21 21:13:44 toor@f5 ~
# cmp /dev/ada0p1 /var/backups/boot.ada0p1.bak

2025-08-21 21:14:00 toor@f5 ~
# echo $?
0


The last piece of the puzzle is the MBR.  I see some possibilities in /boot:

2025-08-21 21:20:36 toor@f5 ~
# ll -S /boot | grep ' 512 ' | grep -v drwx
-r--r--r--   1 root  wheel     512 2025/05/24 14:51:34 boot0
-r--r--r--   1 root  wheel     512 2025/05/24 14:51:34 boot0sio
-r--r--r--   1 root  wheel     512 2023/04/06 21:24:38 boot1
-r--r--r--   1 root  wheel     512 2023/04/06 21:24:38 mbr
-r--r--r--   1 root  wheel     512 2023/04/06 21:24:38 pmbr


Referring to the Wikipedia article "Master boot record", table "Structure 
of a classical generic MBR":

https://en.wikipedia.org/wiki/Master_boot_record


The bootstrap code area is the first 446 bytes.  Look for a match:

2025-08-21 21:24:08 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/boot0
/dev/ada0 /boot/boot0 differ: char 12, line 1

2025-08-21 21:25:00 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/boot0sio
/dev/ada0 /boot/boot0sio differ: char 12, line 1

2025-08-21 21:25:05 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/boot1
/dev/ada0 /boot/boot1 differ: char 1, line 1

2025-08-21 21:25:08 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/mbr
/dev/ada0 /boot/mbr differ: char 12, line 1

2025-08-21 21:25:12 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/pmbr


So, the FreeBSD installer put /boot/pmbr into the MBR of my system disk.


Checking the partition table entries and boot signature:

2025-08-21 21:28:19 toor@f5 ~
# cmp -i 446 -n 16 /dev/ada0 /boot/pmbr
/dev/ada0 /boot/pmbr differ: char 3, line 1

2025-08-21 21:28:50 toor@f5 ~
# cmp -i 462 -n 16 /dev/ada0 /boot/pmbr

2025-08-21 21:28:58 toor@f5 ~
# cmp -i 478 -n 16 /dev/ada0 /boot/pmbr

2025-08-21 21:29:09 toor@f5 ~
# cmp -i 494 -n 16 /dev/ada0 /boot/pmbr

2025-08-21 21:29:17 toor@f5 ~
# cmp -i 510 -n 2 /dev/ada0 /boot/pmbr


So, everything matches except partition entry number 1:

2025-08-21 21:31:33 toor@f5 ~
# dd if=/dev/ada0 count=1 status=none | hexdump -s 446 -n 16
000001be  00 00 02 00 ee ff ff ff  01 00 00 00 2f cf fc 06  |............/...|
000001ce

2025-08-21 21:32:27 toor@f5 ~
# dd if=/boot/pmbr count=1 status=none | hexdump -s 446 -n 16
000001be  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001ce


So, the installer must have populated the first partition entry.
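
The region-by-region comparisons above could be rolled into one loop.  A
rough sketch, assuming cmp(1) accepts -s, -i, and -n as in the commands
above; compare_mbr_regions is a hypothetical helper name:

```shell
#!/bin/sh
# compare_mbr_regions DISK REFERENCE
# Hypothetical helper: compares each region of the first sector of
# DISK against the same region of REFERENCE, one result line per region.
compare_mbr_regions() {
    disk=$1
    ref=$2
    # offset:length:name for each region of a classical MBR
    for region in 0:446:bootstrap 446:16:entry1 462:16:entry2 \
                  478:16:entry3 494:16:entry4 510:2:signature; do
        offset=${region%%:*}
        rest=${region#*:}
        length=${rest%%:*}
        name=${rest#*:}
        # -s: silent, -i: skip this many bytes in both files, -n: byte limit
        if cmp -s -i "$offset" -n "$length" "$disk" "$ref"; then
            echo "$name: match"
        else
            echo "$name: differ"
        fi
    done
}
```

It would be called as `compare_mbr_regions /dev/ada0 /boot/pmbr`.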


Referring to the Wikipedia table "Layout of one 16-byte partition 
entry", decoding my MBR first partition entry:

Status or physical drive
inactive

CHS address of first absolute sector in partition
cylinder = 0
head = 0
sector = 2

Partition type
ee = GPT protective MBR

CHS address of last absolute sector in partition
cylinder = 1023
head = 255
sector = 63

LBA of first absolute sector in the partition
0x00000001 = sector 1

Number of sectors in partition
0x06fccf2f = 117231407 sectors
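
The same decoding can be done with shell arithmetic.  A rough sketch,
following the 16-byte layout from the Wikipedia table; decode_mbr_entry
is a hypothetical helper name, and it takes the 16 bytes as two-digit
hex arguments in on-disk order:

```shell
#!/bin/sh
# decode_mbr_entry B0 B1 ... B15
# Hypothetical helper: decodes one 16-byte MBR partition entry.
decode_mbr_entry() {
    if [ "$#" -ne 16 ]; then
        echo "usage: decode_mbr_entry <16 hex bytes>" >&2
        return 1
    fi
    # Byte 0: status, 0x80 means active.
    if [ $((0x$1)) -eq 128 ]; then status=active; else status=inactive; fi
    # Bytes 1-3: CHS of first sector.  The sector number is the low
    # 6 bits of the second CHS byte; its high 2 bits are cylinder bits 9-8.
    first_head=$((0x$2))
    first_sector=$((0x$3 & 0x3f))
    first_cyl=$(( ((0x$3 & 0xc0) << 2) | 0x$4 ))
    # Bytes 5-7: CHS of last sector, same packing.
    last_head=$((0x$6))
    last_sector=$((0x$7 & 0x3f))
    last_cyl=$(( ((0x$7 & 0xc0) << 2) | 0x$8 ))
    # Bytes 8-11 and 12-15: 32-bit little-endian LBA start and sector count.
    lba=$(( 0x$9 | (0x${10} << 8) | (0x${11} << 16) | (0x${12} << 24) ))
    count=$(( 0x${13} | (0x${14} << 8) | (0x${15} << 16) | (0x${16} << 24) ))

    printf 'status: %s\n' "$status"
    printf 'first CHS: %d/%d/%d\n' "$first_cyl" "$first_head" "$first_sector"
    printf 'type: 0x%s\n' "$5"
    printf 'last CHS: %d/%d/%d\n' "$last_cyl" "$last_head" "$last_sector"
    printf 'LBA start: %d\n' "$lba"
    printf 'sectors: %d\n' "$count"
}
```

The bytes can be fed in straight from the disk, for example:
`decode_mbr_entry $(dd if=/dev/ada0 count=1 2>/dev/null | od -An -tx1 -j 446 -N 16)`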


Convert the number of sectors in partition field value to decimal:

2025-08-21 21:32:37 toor@f5 ~
# perl -e 'printf "%i\n", 0x06fccf2f'
117231407


This matches the disk size minus 1 (for the MBR):

2025-08-21 21:54:57 toor@f5 ~
# diskinfo -v ada0 | grep 'mediasize in sectors'
	117231408   	# mediasize in sectors
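
That check can be written down as arithmetic.  A rough sketch;
expected_pmbr_sectors is a hypothetical helper name, and the cap at
0xffffffff for disks too large for a 32-bit field is my understanding of
how protective MBRs are defined, not something verified on this system:

```shell
#!/bin/sh
# expected_pmbr_sectors MEDIASIZE_IN_SECTORS
# Hypothetical helper: prints the sector count a protective MBR entry
# should carry: every sector after the MBR itself, capped at the
# largest value a 32-bit field can hold (assumption, see above).
expected_pmbr_sectors() {
    total=$1
    limit=$((0xffffffff))
    covered=$((total - 1))
    if [ "$covered" -gt "$limit" ]; then
        covered=$limit
    fi
    printf '%d\n' "$covered"
}
```

For this disk, `expected_pmbr_sectors 117231408` should agree with the
value decoded from the partition entry.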


Again, I would check whether the failed disk and the other disks all have 
the same MBR.  If so, you could clone one of them into the MBR of the 
replacement disk.
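
A rough sketch of such a clone, copying only the first 512-byte sector
and then verifying it.  clone_mbr is a hypothetical helper name.  This
assumes both disks are the same size, since the protective partition
entry carries the sector count, and it overwrites sector 0 of the
target, so double-check the device names first:

```shell
#!/bin/sh
# clone_mbr SOURCE TARGET
# Hypothetical helper: copies the first 512-byte sector of SOURCE onto
# TARGET, leaves the rest of TARGET untouched, then compares the two.
clone_mbr() {
    src=$1
    dst=$2
    # conv=notrunc keeps a regular-file target from being truncated;
    # it is harmless when the target is a disk device.
    dd if="$src" of="$dst" bs=512 count=1 conv=notrunc 2>/dev/null || return 1
    # Exit status reflects whether the first sector now matches.
    cmp -s -n 512 "$src" "$dst"
}
```

gpart(8) also documents `gpart bootcode -b /boot/pmbr <disk>` for
installing the bootstrap code, which may be the safer route if the disks
differ in size.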


>>> Would recovering the disk be beneficial versus replace? As far as
>>> faster recovery, not needing to resilver or as much. These are not big
>>> drives as you can see and RAID10 zpool.
>> You can try to use recoverdisk to copy undamaged portions of the failed
>> drive onto the replacement, but it's likely to take longer than
>> resilvering.
> Then I'll stick to the original plan but with attach instead of replace 
> using `zpool attach ada0p3 ada0p3`.
> 

I think you have a typo -- the replacement ada0p3 should attach to ada1p3.
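
For reference, zpool-attach(8) takes the pool name first, then the
existing device, then the new one.  A rough sketch that only prints the
command for review instead of running it; print_attach_cmd is a
hypothetical helper, and "zroot" below is a placeholder pool name, not
taken from this thread:

```shell
#!/bin/sh
# print_attach_cmd POOL EXISTING_DEVICE NEW_DEVICE
# Hypothetical helper: prints (does not run) the zpool attach command
# that would mirror NEW_DEVICE onto EXISTING_DEVICE in POOL.
print_attach_cmd() {
    if [ "$#" -ne 3 ]; then
        echo "usage: print_attach_cmd pool existing_device new_device" >&2
        return 1
    fi
    if [ "$2" = "$3" ]; then
        echo "existing and new device must differ" >&2
        return 1
    fi
    printf 'zpool attach %s %s %s\n' "$1" "$2" "$3"
}
```

For example, `print_attach_cmd zroot ada1p3 ada0p3`.  Use the device
names exactly as `zpool status` reports them, which may be GPT labels
rather than adaXpY.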


David