Fwd: filesystem checksum problems on AWS EC2 instances

Kirk McKusick mckusick at mckusick.com
Tue Sep 22 00:01:14 UTC 2020


> Date: Fri, 18 Sep 2020 20:00:49 -0700
> From: Colin Percival <cperciva at tarsnap.com>
> Subject: Re: filesystem checksum problems on AWS EC2 instances
> To: ericr <erobison at gmail.com>, freebsd-cloud at freebsd.org,
>     Kirk McKusick <mckusick at FreeBSD.org>
> 
> [Adding Kirk since this seems like a UFS issue...]
> 
> On 2020-09-16 15:15, ericr wrote:
>> On Tue, Sep 15, 2020 at 6:24 PM Colin Percival <cperciva at tarsnap.com> wrote:
>>> On 2020-09-15 14:30, ericr wrote:
>>>> Sep  1 20:50:15 <kern.crit> freebsd kernel: UFS /dev/gpt/rootfs (/)
>>>> cylinder checksum failed: cg 0, cgp: 0x9c14700e != bp: 0x27bfa3d0
>>>> Sep  1 20:50:15 <kern.crit> freebsd syslogd: last message repeated 1 times
>>>> Sep  1 20:50:15 <kern.crit> freebsd kernel: UFS /dev/gpt/rootfs (/)
>>>> cylinder checksum failed: cg 7, cgp: 0x43ed3fa1 != bp: 0xe9b0182e
>>>>
>>>> and from there on, I get cylinder checksum errors pretty often.
>>>
>>> Do you get this if you launch from the non-Marketplace AMIs listed in the
>>> release announcement?
>>>   https://www.freebsd.org/releases/12.1R/announce.html
>> 
>> 
>> Yes.  I just tried both of these AMI's from the release notes:
>> us-east-1 region: ami-0de268ac2498ba33d
>> us-east-2 region: ami-0a44f10b2c6deb365
>> 
>> I got the same errors.
> 
> I've managed to reproduce this, with a filesystem which I've
> verified is clean (at least, which passes fsck) before resizing
> up to ~ 200 GB:
> 
>> root at freebsd:/usr/home/ec2-user # fsck_ufs /dev/nvd1p2 
>> ** /dev/nvd1p2
>> ** Last Mounted on /releng/12-amd64-GENERIC-release/usr/obj/usr/src/amd64.amd64/release/cw-ec2/new
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> 25701 files, 758977 used, 229774 free (9654 frags, 27515 blocks, 1.0% fragmentation)
>> 
>> ***** FILE SYSTEM IS CLEAN *****
>> root at freebsd:/usr/home/ec2-user # gpart recover /dev/nvd1
>> nvd1 recovered
>> root at freebsd:/usr/home/ec2-user # gpart resize -i 2 /dev/nvd1
>> nvd1p2 resized
>> root at freebsd:/usr/home/ec2-user # growfs -y /dev/nvd1p2
>> super-block backups (for fsck_ffs -b #) at:
>> [snip]
>> root at freebsd:/usr/home/ec2-user # fsck_ufs /dev/nvd1p2
>> ** /dev/nvd1p2
>> ** Last Mounted on 
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> CG 0: BAD CHECK-HASH 0x9c14700e vs 0xc9441f74
>> SUMMARY INFORMATION BAD
>> SALVAGE? [yn] n
>> 
>> CG 7: BAD CHECK-HASH 0xad168305 vs 0x74ba48a
>> 25701 files, 758977 used, 50019285 free (9661 frags, 6251203 blocks, 0.0% fragmentation)
>> 
>> ***** FILE SYSTEM MARKED DIRTY *****
>> 
>> ***** PLEASE RERUN FSCK *****
> 
> This seems like a bug in UFS and/or growfs, but I'm not familiar enough
> with either to say any more.
> 
> Kirk, are you aware of any issues on FreeBSD 12.1-RELEASE which can cause
> cylinder checksum errors after growfs?  (On amd64 if it matters.)  If it
> would help I can provide you with SSH access to an affected EC2 instance.
> 
> -- 
> Colin Percival
> Security Officer Emeritus, FreeBSD | The power to serve
> Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid

I have managed to reproduce a similar problem in one of my rather
ancient 12.0 bhyve images that I have lying around:

FreeBSD 12.0-STABLE (GENERIC) #5 r350458M: Sat Oct 26 21:18:51 UTC 2019

The following patch fixes it in that instance. Could you please try this
in the EC2 instance and see if it also resolves your problem?

	Kirk McKusick

=-=-=

Index: sbin/growfs/growfs.c
===================================================================
--- sbin/growfs/growfs.c	(revision 365971)
+++ sbin/growfs/growfs.c	(working copy)
@@ -572,6 +572,7 @@ updjcg(int cylno, time_t modtime, int fsi, int fso
 		if (sblock.fs_magic == FS_UFS1_MAGIC)
 			acg.cg_old_ncyl = sblock.fs_old_cpg;
 
+		cgckhash(&acg);
 		wtfs(fsbtodb(&sblock, cgtod(&sblock, cylno)),
 		    (size_t)sblock.fs_cgsize, (void *)&acg, fso, Nflag);
 		DBG_PRINT0("jcg written\n");
@@ -947,6 +948,7 @@ updcsloc(time_t modtime, int fsi, int fso, unsigne
 	 * Now write the former cylinder group containing the cylinder
 	 * summary back to disk.
 	 */
+	cgckhash(&acg);
 	wtfs(fsbtodb(&sblock, cgtod(&sblock, ocscg)),
 	    (size_t)sblock.fs_cgsize, (void *)&acg, fso, Nflag);
 	DBG_PRINT0("oscg written\n");
@@ -1039,6 +1041,7 @@ updcsloc(time_t modtime, int fsi, int fso, unsigne
 	 * Write the new cylinder group containing the cylinder summary
 	 * back to disk.
 	 */
+	cgckhash(&acg);
 	wtfs(fsbtodb(&sblock, cgtod(&sblock, ncscg)),
 	    (size_t)sblock.fs_cgsize, (void *)&acg, fso, Nflag);
 	DBG_PRINT0("nscg written\n");
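
For readers following along: judging by its name and by the "BAD CHECK-HASH"
output from fsck, cgckhash() appears to recompute the cylinder group's
check-hash, and the patch calls it before each write of a modified group.
Below is a minimal, self-contained sketch of that pattern. It is not the
growfs code: struct toy_cg, toy_crc32c(), toy_ckhash(), toy_verify() and
toy_demo() are hypothetical names invented for illustration, and the choice
of CRC-32C over the structure with its hash field zeroed is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a cylinder group; not the real struct cg. */
struct toy_cg {
	uint32_t ckhash;	/* check-hash over the whole structure */
	uint32_t nfree;		/* free-block count, changed when growing */
	uint8_t  map[56];	/* placeholder for the allocation bitmaps */
};

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t
toy_crc32c(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len-- > 0) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return (~crc);
}

/* Hash the structure as if its own hash field were zero. */
static uint32_t
toy_hash_of(const struct toy_cg *cg)
{
	struct toy_cg tmp = *cg;

	tmp.ckhash = 0;
	return (toy_crc32c(&tmp, sizeof(tmp)));
}

/* Refresh the stored hash; call this after the last modification. */
static void
toy_ckhash(struct toy_cg *cg)
{
	cg->ckhash = toy_hash_of(cg);
}

/* What a reader would do: recompute and compare. */
static int
toy_verify(const struct toy_cg *cg)
{
	return (cg->ckhash == toy_hash_of(cg));
}

int
toy_demo(void)
{
	struct toy_cg cg;

	memset(&cg, 0, sizeof(cg));
	cg.nfree = 229774;
	toy_ckhash(&cg);
	assert(toy_verify(&cg));	/* consistent before the resize */

	cg.nfree = 50019285;		/* contents updated, hash left alone */
	assert(!toy_verify(&cg));	/* stale hash: the reader's check fails */

	toy_ckhash(&cg);		/* refresh the hash before writing */
	assert(toy_verify(&cg));
	return (0);
}
```

In the sketch, changing nfree without refreshing the hash makes the check
fail, which is the same shape of failure as the kernel and fsck messages
quoted above; refreshing the hash just before the write clears it.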

