Re: read and write back full disk to trigger relocation

From: David Christensen <dpchrist_at_holgerdanske.com>
Date: Tue, 30 May 2023 23:16:09 UTC
On 5/30/23 02:18, Sysadmin Lists wrote:
> David Christensen May 29, 2023, 4:12:24 PM
>> Testing dd(1) and gmirror(8):
>>
>> 2023-05-29 15:21:32 toor@vf1 ~
>> # freebsd-version ; uname -a
>> 12.4-RELEASE-p2
>> FreeBSD vf1.tracy.holgerdanske.com 12.4-RELEASE-p1 FreeBSD
>> 12.4-RELEASE-p1 GENERIC  amd64
>>
>> 2023-05-29 15:23:05 toor@vf1 ~
>> # gmirror label mymirror ada3 ada4
>>
>> 2023-05-29 15:24:11 toor@vf1 ~
>> # gmirror status mymirror
>>              Name    Status  Components
>> mirror/mymirror  COMPLETE  ada3 (ACTIVE)
>>                              ada4 (ACTIVE)
>>
>> 2023-05-29 15:52:41 toor@vf1 ~
>> # dd if=/dev/ada3 of=/dev/ada3 bs=1m
>> dd: /dev/ada3: Operation not permitted
>>
>> 2023-05-29 15:53:45 toor@vf1 ~
>> # dd if=/dev/ada4 of=/dev/ada4 bs=1m
>> dd: /dev/ada4: Operation not permitted
>>
>> 2023-05-29 15:53:52 toor@vf1 ~
>> # dd if=/dev/mirror/mymirror of=/dev/mirror/mymirror bs=1m
>> 1023+1 records in
>> 1023+1 records out
>> 1073741312 bytes transferred in 3.299006 secs (325474224 bytes/sec)
>>
>>
>> This confirms that the kernel will not allow writes to mirror components
>> when they are active, as it should.  If a process could write to a
>> component of a mirror, that would bypass the mirror driver, defeat the
>> purpose of the mirror, allow race conditions, and result in data loss/
>> data corruption.
> 
> That makes sense. I wouldn't recommend running it on a live system anyway.
> Probably wiser to boot into a livecd and run it on a single disk. gmirror
> shouldn't notice a difference since the data isn't presently corrupted, just
> decaying (is my guess). 3TB is a lot of data to process.


I also prefer to do disk maintenance activities when the disks are 
off-line, typically by booting alternate media (such as a live USB stick).


I did the above testing on VirtualBox on Debian by creating two 1 GB 
virtual disks backed by files.  When I created the virtual disks, I 
chose "Dynamic" sizing -- i.e. the backing files start small and grow 
as data is added.


I have since noted that the size, mtime, and atime on the backing files 
have not changed since the files were created:

2023-05-30 15:56:36 dpchrist@taz ~/virtualbox/virtual-machines/vf1
$ stat vf1_?.vdi
   File: vf1_3.vdi
   Size: 2097152   	Blocks: 16         IO Block: 4096   regular file
Device: fe02h/65026d	Inode: 392462      Links: 1
Access: (0600/-rw-------)  Uid: (13250/dpchrist)   Gid: (13250/dpchrist)
Access: 2023-05-29 15:20:35.292781334 -0700
Modify: 2023-05-29 15:19:51.553228088 -0700
Change: 2023-05-29 15:19:51.553228088 -0700
  Birth: 2023-05-29 15:13:28.182411743 -0700
   File: vf1_4.vdi
   Size: 2097152   	Blocks: 16         IO Block: 4096   regular file
Device: fe02h/65026d	Inode: 392466      Links: 1
Access: (0600/-rw-------)  Uid: (13250/dpchrist)   Gid: (13250/dpchrist)
Access: 2023-05-29 15:20:35.292781334 -0700
Modify: 2023-05-29 15:19:51.553228088 -0700
Change: 2023-05-29 15:19:51.553228088 -0700
  Birth: 2023-05-29 15:13:44.630780217 -0700


If I do the dd(1) command again with O_DIRECT:

2023-05-30 15:59:06 toor@vf1 ~
# dd if=/dev/mirror/mymirror of=/dev/mirror/mymirror bs=1m oflag=direct
1023+1 records in
1023+1 records out
1073741312 bytes transferred in 3.465168 secs (309867017 bytes/sec)


The size, mtime, and atime still do not change:

2023-05-30 15:59:55 dpchrist@taz ~/virtualbox/virtual-machines/vf1
$ stat vf1_?.vdi
   File: vf1_3.vdi
   Size: 2097152   	Blocks: 16         IO Block: 4096   regular file
Device: fe02h/65026d	Inode: 392462      Links: 1
Access: (0600/-rw-------)  Uid: (13250/dpchrist)   Gid: (13250/dpchrist)
Access: 2023-05-29 15:20:35.292781334 -0700
Modify: 2023-05-29 15:19:51.553228088 -0700
Change: 2023-05-29 15:19:51.553228088 -0700
  Birth: 2023-05-29 15:13:28.182411743 -0700
   File: vf1_4.vdi
   Size: 2097152   	Blocks: 16         IO Block: 4096   regular file
Device: fe02h/65026d	Inode: 392466      Links: 1
Access: (0600/-rw-------)  Uid: (13250/dpchrist)   Gid: (13250/dpchrist)
Access: 2023-05-29 15:20:35.292781334 -0700
Modify: 2023-05-29 15:19:51.553228088 -0700
Change: 2023-05-29 15:19:51.553228088 -0700
  Birth: 2023-05-29 15:13:44.630780217 -0700


So, either FreeBSD or VirtualBox is optimizing away the write(2) calls, 
presumably because the write buffer matches what is already in a memory 
cache from the prior read(2) calls (?).


I would say the experiment should be repeated on real HDD's, but how do 
I detect whether identical data has actually been written to the 
platters?  The HDD controller also has a cache and could optimize away 
such writes.


One idea would be to read a block into a buffer, invert the bits in the 
buffer, write the buffer, then invert the bits again and write again -- 
so that each pass writes data that differs from what is on disk, and no 
cache layer can elide the write.
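The invert-and-rewrite idea above could be sketched roughly as follows.  
This is a hypothetical illustration, not a tested tool: the function 
name and the file-based demonstration are my own; pointing "path" at a 
raw device node (e.g. /dev/ada3) would be the real use case, and should 
only ever be done with the disk off-line and a verified backup in hand.

```python
import os

def rewrite_inverted_twice(path, bs=1024 * 1024):
    """Read each block, write its bitwise complement, then write the
    original data back, fsync()ing after each write so neither pass
    can be satisfied from a cache of identical data."""
    fd = os.open(path, os.O_RDWR)
    try:
        offset = 0
        while True:
            buf = os.pread(fd, bs, offset)
            if not buf:                              # end of file/device
                break
            inverted = bytes(b ^ 0xFF for b in buf)  # flip every bit
            os.pwrite(fd, inverted, offset)
            os.fsync(fd)                             # flush the complement
            os.pwrite(fd, buf, offset)               # restore original data
            os.fsync(fd)
            offset += len(buf)
    finally:
        os.close(fd)
```

Note that this still only guarantees different data reaches the drive's 
write path; whether the drive's own cache defers the physical write is 
beyond the host's control, which is the same objection as above.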


These are the kinds of issues that the disk manufacturer is supposed to 
solve -- hence my first response: "I would look for a manufacturer 
diagnostic tool".


David