Disk block or sector to file mapping?
koitsu at FreeBSD.org
Thu Jun 14 04:08:48 UTC 2007
On Wed, Jun 13, 2007 at 11:14:03PM -0400, Matthew Hagerty wrote:
> I have a drive that failed and fsck and dump both report the failed sector
> or block (the term seems to be used interchangeably at times), but how can I
> find out what file(s) were using that block? I have a file-based backup and
> I could possibly replace the bad files if I know which ones were affected by
> the bad blocks.
There's apparently a way to work out which block on the disk is used
by a specific inode, using some math and numerous parameters taken
from the drive, the filesystem, and other such things. It might be
covered at the URL I've included below (for Linux, though, not BSD),
so I'd peek there.
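For what it's worth, the first step of that math is plain arithmetic. Below is a rough sketch, not taken from that URL. It assumes 512-byte sectors and that you already know two things: the absolute LBA where the filesystem starts (add up whatever slice/partition offsets apply to your layout) and the filesystem's fragment size and fragments-per-block (dumpfs(8) prints these as fsize and frag). lba_to_fragment and fragment_to_block are made-up names for illustration, not an existing tool:

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors, typical for PATA/SATA disks of this era


def lba_to_fragment(bad_lba: int, fs_start_lba: int, frag_size: int,
                    sector_size: int = SECTOR_SIZE) -> int:
    """Map an absolute LBA to a fragment number counted from the start of the filesystem.

    fs_start_lba: absolute LBA of the filesystem's first sector (hypothetical input;
    derive it from your slice/partition table).
    frag_size: fragment size in bytes (the fsize value dumpfs reports).
    """
    if bad_lba < fs_start_lba:
        raise ValueError("LBA lies before the start of this filesystem")
    byte_offset = (bad_lba - fs_start_lba) * sector_size
    return byte_offset // frag_size


def fragment_to_block(fragment: int, frags_per_block: int) -> int:
    """Return the filesystem block containing a fragment (frags_per_block = dumpfs 'frag')."""
    return fragment // frags_per_block


if __name__ == "__main__":
    # Hypothetical numbers: filesystem starts at LBA 63, 2048-byte fragments, 8 per block.
    frag = lba_to_fragment(1_000_063, 63, 2048)
    print(frag, fragment_to_block(frag, 8))
```

With those hypothetical numbers, LBA 1000063 lands in fragment 250000, i.e. block 31250. UFS addresses data in fragment-sized units, which is why the sketch stops at the fragment number; going from there to the owning inode is the harder part.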
Anyways, I'd do the following:
* Run the disk manufacturer's native disk analysis utility. Many of
them will do some extra magic (particularly for PATA/SATA disks; with
SCSI there's no magic, since you can do it yourself by manipulating the
grown defect list) to try to work around a full bad block/remapped
sector list. Besides, when you RMA the disk, the manufacturer will
usually ask whether you've run their analysis tool and what the result was.
* You might be able to use smartctl (ports/sysutils/smartmontools) to
run a selective LBA self-test (smartctl -t select,X-Y /dev/adN, where X
and Y are the starting and ending LBAs to check). Not all drives support
this, though. If selective tests aren't supported, you can try -t long,
which should work on most disks but scans the entire disk (and takes a
long time). Then use smartctl -a /dev/adN to see whether the last test
you ran completed successfully or hit an error, and, hopefully, at which LBA.
This document might also come in handy:
* There's also ports/sysutils/drivecheckd, which I've never used, but
which looks like it might provide more detailed info.
* The purpose of doing any of the above is to try to get the drive to
mark the block in question as bad, so it no longer gets accessed. It
may have already done that when the OS reported the issue. That
should (hopefully) cause fsck to notice inconsistencies in the filesystem
data and give you the filename that used the aforementioned block,
telling you the file is inaccessible, should be moved to lost+found, and
so on. (I'm sure someone will correct me on the last part :) )
* Now run fsck -f on each unmounted filesystem and see whether any
errors come up with filenames referenced.
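If you end up running the smartctl selective test from the list above more than once, a small wrapper can save some squinting. A minimal sketch in Python, assuming the self-test log layout smartmontools prints (Num, Test_Description, Status, Remaining, LifeTime(hours), LBA_of_first_error, with "-" when there's no error LBA); check it against the output of your version before trusting it. selective_test_args, parse_selftest_log, and SelfTestEntry are hypothetical names, not part of smartmontools:

```python
from __future__ import annotations

import re
import subprocess
from typing import NamedTuple, Optional


def selective_test_args(device: str, start_lba: int, end_lba: int) -> list[str]:
    """Build the argument list for 'smartctl -t select,START-END DEVICE'."""
    if start_lba > end_lba:
        raise ValueError("start LBA must not exceed end LBA")
    return ["smartctl", "-t", f"select,{start_lba}-{end_lba}", device]


def start_selective_test(device: str, start_lba: int, end_lba: int) -> None:
    """Kick off the test; the drive runs it in the background."""
    subprocess.run(selective_test_args(device, start_lba, end_lba), check=True)


class SelfTestEntry(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when smartctl prints "-"


# Assumed entry layout, e.g.
# "# 1  Selective offline   Completed: read failure       90%      1234         123456789"
_ENTRY = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def parse_selftest_log(text: str) -> list[SelfTestEntry]:
    """Parse self-test log entries (as shown by 'smartctl -a') into tuples."""
    entries = []
    for line in text.splitlines():
        m = _ENTRY.match(line.strip())
        if not m:
            continue  # header lines, blank lines, anything not shaped like an entry
        num, desc, status, remaining, hours, lba = m.groups()
        entries.append(SelfTestEntry(int(num), desc, status, int(remaining),
                                     int(hours), int(lba) if lba.isdigit() else None))
    return entries
```

Entry "# 1" is the most recent test, so parse_selftest_log(output)[0].first_error_lba is the number you'd want to feed into the LBA math above.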
Realistically, what we need on FreeBSD is a tool similar to Solaris's
format(8) "analyze" command, which does a raw disk scan (read, read/write,
and a couple of other operations). For those not familiar with it, I'll
include a sample session of a disk being analysed at the bottom of this
email.
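The read-only part of what analyze does is easy enough to approximate by hand. A rough sketch in Python, assuming you can open the raw device read-only (e.g. /dev/ad0 as root); read_scan is a made-up name for illustration. It only reports byte offsets of sectors that fail to read: it never writes, repairs, or remaps anything, so it's no substitute for analyze's r/w modes. For a raw device, pass size explicitly (diskinfo(8) reports the media size), since seeking to the end of a device to find its size may not work:

```python
import os
from typing import List, Optional


def read_scan(path: str, size: Optional[int] = None,
              chunk_size: int = 64 * 1024, sector_size: int = 512) -> List[int]:
    """Read-only surface scan; returns byte offsets of sectors that failed to read.

    Reads in chunk_size pieces (keep it a multiple of sector_size, since raw
    devices generally want sector-aligned I/O). When a chunk fails, it is
    re-read one sector at a time to narrow the failure down.
    """
    bad: List[int] = []
    fd = os.open(path, os.O_RDONLY)
    try:
        if size is None:
            size = os.lseek(fd, 0, os.SEEK_END)  # fine for regular files
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                os.pread(fd, length, offset)
            except OSError:
                # Chunk failed: retry each sector in it individually.
                end = offset + length
                for sector in range(offset, end, sector_size):
                    try:
                        os.pread(fd, min(sector_size, end - sector), sector)
                    except OSError:
                        bad.append(sector)
            offset += length
    finally:
        os.close(fd)
    return bad
```

Dividing a reported offset by the sector size gives the LBA relative to whatever you opened, which can then go through the LBA arithmetic mentioned earlier.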
Sorry if this is too verbose, but I quite often deal with disks going
bad during my day job.
 - If the OS is seeing bad blocks on a PATA/SATA disk, it usually means
that the drive's internal remapping table is full: there were other bad
blocks on the disk which it silently remapped for you to avoid pain, and
the spare space for those blocks has been exhausted. Sometimes you can
work around this as mentioned above, but most of the time you can't,
and you're stuck simply replacing the disk entirely. Bad blocks have a
tendency to spread, too...
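If you want a rough idea of how far along that process a PATA/SATA drive is, smartctl -A /dev/adN prints the SMART attribute table, which on many drives includes Reallocated_Sector_Ct (ID 5) and Current_Pending_Sector (ID 197). A minimal sketch in Python that pulls those raw counts out of the table; the attribute IDs are vendor-specific, so the set below is an assumption, and remap_counters is a hypothetical helper name. It assumes the usual ten-column layout ending in RAW_VALUE:

```python
from typing import Dict

# Assumption: attribute IDs commonly tied to sector remapping on ATA drives.
# Meanings are vendor-specific, so adjust for your drive.
WATCHED_IDS = {5, 196, 197, 198}


def remap_counters(attr_table: str) -> Dict[str, int]:
    """Return {attribute_name: raw_value} for remapping-related rows of 'smartctl -A' output."""
    counters: Dict[str, int] = {}
    for line in attr_table.splitlines():
        fields = line.split()
        # Rows look like: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) in WATCHED_IDS and fields[9].isdigit():
            counters[fields[1]] = int(fields[9])
    return counters
```

A non-zero reallocated count that keeps growing between runs suggests the drive has been quietly eating into its spares; pending sectors are ones it has had trouble reading but hasn't remapped yet.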
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t0d0 <DEFAULT cyl 4464 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2543@2/pci8086,1460@1d/pci9005,ffff@4/sd@0,0
Specify disk (enter its number): 0
Warning: Current Disk has mounted partitions.
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
fdisk - run the fdisk program
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
read - read only test (doesn't harm SunOS)
refresh - read then write (doesn't harm data)
test - pattern testing (doesn't harm data)
write - write then read (corrupts data)
compare - write, read, compare (corrupts data)
purge - write, read, write (corrupts data)
verify - write entire disk, then verify (corrupts data)
print - display data buffer
setup - set analysis parameters
config - show analysis parameters
!<cmd> - execute <cmd> , then return
Analyze entire disk[yes]?
Enter number of passes:
Repair defective blocks[yes]?
Stop after first error[no]? yes
Use random bit patterns[no]? yes
Enter number of blocks per transfer[126, 0/2/0]:
Verify media after formatting[yes]?
Enable extended messages[no]?
Restore defect list[yes]?
Restore disk label[yes]?
Ready to analyze (won't harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? y
Total of 0 defective blocks repaired.