Grepping through a disk

Mon Mar 4 10:15:58 UTC 2013

On 3/3/2013 6:36 PM, Polytropon wrote:
> Due to a fsck file system repair I lost the content of a file
> I consider important, but it hasn't been backed up yet. The
> file name is still present, but no blocks are associated
> (file size is zero). I hope the data blocks (which are now
> probably marked "unused") are still intact, so I thought
> I'd search for them because I can remember specific text
> that should have been in that file.
>
> As I don't need any fancy stuff like a progress bar, I
> decided to write a simple command, and I quickly got
> something up and running which I _assume_ will do what
> I need.
>
> This is the command I've been running interactively in bash:
>
> 	$ N=0; while true; do echo "${N}"; dd if=/dev/ad6 of=/dev/stdout bs=10240 count=1 skip=${N} 2>/dev/null | grep "<PATTERN>"; if [ $? -eq 0 ]; then break; fi; N=`expr ${N} + 1`; done
>
> To make it look a bit better and illustrate the simple
> logic behind my idea:
>
> 	N=0
> 	while true; do
> 		echo "${N}"
> 		dd if=/dev/ad6 of=/dev/stdout bs=10240 count=1 skip=${N} \
> 			2>/dev/null | grep "<PATTERN>"
> 		if [ $? -eq 0 ]; then
> 			break
> 		fi
> 		N=`expr ${N} + 1`
> 	done
>
> Here <PATTERN> refers to the text. It's only a small, but
> very distinctive portion. I'm searching in blocks of 10 kB
> so it's easier to continue in case something has been found.
> I plan to output the resulting "block" (it's not a real disk
> block, I know, it's simply a unit of 10 kB disk space) and
> maybe the previous and next one (in case the file, the _real_
> block containing the data, has been split across more than
> one of those units). I will then clean the "garbage" (maybe
> from other files) because I can easily determine the beginning
> and the end of the file.
>
> Needless to say, it's a _text_ file.
>
> I understand that grep operates on text files, but it will
> also happily return 0 if the text to search for appears
> in a binary file, and possibly return the whole file as a
> search result (in case there are no newlines in it).
>
> My questions:
>
> 1. Is this the proper way of stupidly searching a disk?
>
> 2. Is the block size (bs= parameter to dd) good, or should
>     I use a different value for better performance?
>
> 3. Is there a program known that already implements the
>     functionality I need in terms of data recovery?
>
> Results so far:
>
> The disk in question is a 1 TB SATA disk. The command has
> been running for more than 12 hours now and returned one
> false-positive result, so basically it seems to work, but
> maybe I can do better? I can always continue the search by
> adding 1 to ${N}, setting that as the start value, and
> re-running the command.
>
> Any suggestion is welcome!
>
>
>
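
For the extraction step described in the quoted message (dumping the matching 10 kB unit plus its neighbours), a minimal sketch could look like this. It assumes the same bs=10240 units and skip= numbering as the quoted loop, where the last N printed is the unit that matched; the function name extract_units is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print 10 KiB unit N of a device together with the
# unit before it (if any) and the unit after it. Units are numbered the
# same way as skip=${N} with bs=10240 in the quoted loop.
extract_units() {
    local dev=$1    # device or image file, only ever read from
    local n=$2      # unit number printed by the search loop
    local start
    if [ "$n" -gt 0 ]; then
        start=$((n - 1))    # begin one unit early to include the previous unit
    else
        start=0             # unit 0 has no predecessor
    fi
    # previous (if any) + matching + next unit; dd stops early at end of device
    dd if="$dev" bs=10240 skip="$start" count=$((n - start + 2)) 2>/dev/null
}

# Example (the output path is only an illustration; keep it off the damaged disk):
#   extract_units /dev/ad6 123456 > /mnt/otherdisk/recovered.txt
```

Writing the result to a different disk matters here: anything written to the damaged file system could land in the very blocks you are trying to recover.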

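I'd call bs= essential for speed.  With bs=10240 a 1 TB disk means
roughly 100 million separate dd and grep runs, so much of the time goes
into starting processes rather than reading the disk; any copying will
be faster with a much larger block size.  Also, there's the annoying
possibility that your search string straddles the boundary between two
reads, so neither read contains all of it.  I'd probably read 1M at a
time, but advance only about 950k each time so consecutive reads
overlap.  Make sure your reads start at offsets aligned to the disk's
sector size for maximum speed.  I know disk editors exist; I remember
using one on Mac OS 8.6 to find a lost file.  That was back on a 6 gig
hard drive.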
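
The overlapping reads suggested above can be done with plain dd by keeping bs= small and varying skip= and count=: in this sketch each run reads 20 chunks of 50 KiB (1000 KiB) and then advances 19 chunks (950 KiB), so consecutive reads share 50 KiB. It assumes bash and a grep that supports -a (GNU and BSD grep both do); the function name scan_overlapping and the sizes are illustrative, not tuned. Unlike the quoted loop, which keeps incrementing N forever once dd runs past the end of the device, it stops at the end, and it reports every match instead of breaking on the first one.

```shell
#!/bin/bash
# Sketch of an overlapping scan. Each dd run reads WINDOW chunks but the
# scan advances only STEP chunks, so consecutive reads overlap by
# (WINDOW - STEP) chunks and a pattern straddling a read boundary still
# appears whole in at least one read. Sizes are illustrative, not tuned.
scan_overlapping() {
    local dev=$1 pattern=$2
    local n=${3:-0}      # starting chunk; pass a later value to resume a scan
    local chunk=51200    # 50 KiB, a multiple of 512, so reads stay sector-aligned
    local window=20      # 20 chunks = 1000 KiB per dd run
    local step=19        # advance 19 chunks = 950 KiB, leaving 50 KiB of overlap
    local bytes
    while :; do
        echo "$n" >&2    # progress on stderr, like the echo in the quoted loop
        # Stop when nothing can be read at this chunk any more (end of device).
        bytes=$(dd if="$dev" bs="$chunk" skip="$n" count=1 2>/dev/null | wc -c)
        [ "$bytes" -gt 0 ] || break
        # -a: treat binary data as text; -F: fixed string; -q: stop at first hit.
        if dd if="$dev" bs="$chunk" skip="$n" count="$window" 2>/dev/null \
                | grep -a -q -F -- "$pattern"; then
            echo "match: chunk $n, bytes $((n * chunk)) to $(( (n + window) * chunk ))"
        fi
        n=$((n + step))
    done
}
```

Usage would be something like scan_overlapping /dev/ad6 '<PATTERN>' 2>progress.log, with the last chunk number from the log passed as a third argument to resume. Because grep runs with -q, a hit inside binary data prints only the one-line report rather than a whole block. The end-of-device check costs an extra 50 KiB read per step, which is a trade-off for simplicity rather than a necessity.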

Depending on the file size, you could open the disk in vi and just 
search from there, or just run strings on the disk and pipe it to vi.
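
The strings route can also report where on the disk each match sits, which makes it easy to cut the surrounding region out with dd afterwards. This is a sketch assuming a binutils-style strings(1) with -a (scan the whole input) and -t d (prefix each string with its decimal byte offset); the function names find_text_offsets and dump_around are hypothetical. strings only prints runs of at least four printable characters and ends a run at a newline, so each line of the lost text file comes out separately with its own offset.

```shell
#!/bin/bash
# Sketch: let strings(1) pull printable runs off the raw device, prefixed
# with their decimal byte offsets, and keep only the runs that contain the
# pattern. This is one sequential pass with no per-block processes.
find_text_offsets() {
    local dev=$1 pattern=$2
    # -a: scan the whole input; -t d: prefix each string with its offset
    strings -a -t d "$dev" | grep -F -- "$pattern"
}

# Print at least SPAN bytes on either side of a byte offset, reading whole
# 512-byte sectors. SPAN is assumed to be a multiple of 512.
dump_around() {
    local dev=$1 offset=$2 span=${3:-20480}
    local start=$(( (offset - span) / 512 ))    # first sector to read
    [ "$start" -ge 0 ] || start=0               # clamp near the start of the disk
    dd if="$dev" bs=512 skip="$start" count=$(( 2 * span / 512 + 1 )) 2>/dev/null
}
```

For example, find_text_offsets /dev/ad6 '<PATTERN>' prints each matching run after its offset; passing that offset to dump_around /dev/ad6 OFFSET > /mnt/otherdisk/region.txt (again an illustrative path on another disk) cuts out about 20 KiB on either side of it.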