UFS not handling errors correctly

Kris Kennaway kris at FreeBSD.org
Sun Sep 9 10:27:50 PDT 2007


Johannes Totz wrote:
> Hi!
> 
> It seems that UFS does not handle disk/write errors properly: they cause
> silent corruption, which in turn causes a panic later during snapshot creation.
> 
>> #uname -a
>> FreeBSD alfred 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Jul 12 20:40:55 CEST 2007     root@alfred:/usr/obj/usr/src/sys/ALFRED  i386
> 
> One day an I/O error occurred on one of my disks:
> 
>> Aug 22 05:24:39 alfred kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=469004995
>> Aug 22 05:24:40 alfred kernel: ad0: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=469004995
>> Aug 22 05:24:40 alfred kernel: g_vfs_done():ufs/home[READ(offset=240130525184, length=2048)]error = 5
>> Aug 22 05:25:08 alfred kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=490974155
>> Aug 22 05:25:08 alfred kernel: ad0: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=490974155
>> Aug 22 05:25:08 alfred kernel: g_vfs_done():ufs/home[READ(offset=251378735104, length=2048)]error = 5
> 
> This had never happened before and has not happened again (yet). A long
> smartctl self-test reports "all fine", so let's attribute it to a
> cosmic ray (or neutrino, pick your favorite) hitting the controller.
> 
> The system continued to run fine afterwards.
> But the next morning, during automatic snapshot creation, it panicked with:
> 
>> Aug 23 06:38:14 alfred kernel: ffs_snapshot_mount: old format snapshot inode 8
>> Aug 23 06:38:14 alfred savecore: reboot after panic: snapacct_ufs2: bad block
> 
> So of course it restarted, tried to do a background fsck, and failed
> again... and again... and again...
> 
>> Aug 23 07:08:15 alfred kernel: ffs_snapshot_mount: old format snapshot inode 4
>> Aug 23 07:08:15 alfred savecore: reboot after panic: snapacct_ufs2: bad block
> 
> The reported inode varies, but "bad block" is always the same.
> This went on about 10 times until I had a chance to interrupt it (i.e.
> woke from slumber) and boot into single-user mode.
> Multiple runs of fsck fixed the problem. I deleted all old snapshot files
> and the system is fine, with no further problems. Maybe some files got lost;
> I can't tell, there are a few million on it.
> 
> Also see:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/114676
> 
> Unfortunately I don't have time to dig into this. But I wanted to report
> it. Maybe someone already fixed it...

Background fsck cannot fix arbitrary filesystem corruption, nor is it intended to.

Kris
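
For reference, the ad0 LBAs and the g_vfs_done() byte offsets in the quoted
log can be cross-checked with a little arithmetic. The following is only a
sketch: it assumes 512-byte sectors (the log does not state the sector size)
and assumes the reported LBA is the first sector of the failed request. The
function name provider_start_sector is made up for illustration.

```python
# Assumption: 512-byte sectors; the quoted log does not state the sector size.
SECTOR_SIZE = 512

# (LBA reported by ad0, byte offset reported by g_vfs_done for ufs/home),
# copied from the quoted kernel messages.
log_pairs = [
    (469004995, 240130525184),
    (490974155, 251378735104),
]


def provider_start_sector(disk_lba, fs_offset, sector_size=SECTOR_SIZE):
    """Return the sector on the disk at which the filesystem provider would
    have to start for this LBA and byte offset to describe the same read."""
    sector_in_provider, remainder = divmod(fs_offset, sector_size)
    if remainder:
        raise ValueError("offset is not a multiple of the sector size")
    return disk_lba - sector_in_provider


starts = [provider_start_sector(lba, off) for lba, off in log_pairs]
print(starts)  # [63, 63]
```

Both log entries give the same start sector, which suggests the two ad0
failures and the two ufs/home read errors describe the same requests, with
ufs/home beginning 63 sectors into ad0 under the assumptions above.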


