Re: What is the best way to look for a lost file in the disk blocks

From: Ian Smith <smithi_at_nimnet.asn.au>
Date: Wed, 10 Aug 2022 18:05:10 UTC
On 10 August 2022 5:26:27 pm AEST, Matthias Apitz <guru@unixarea.de> wrote:
 > On Wednesday, August 10, 2022 at 07:18:03 a.m. +0200, Michael
 > Schuster wrote:
 > 
 > > On Wed, Aug 10, 2022 at 3:55 AM David Christensen
 > > <dpchrist@holgerdanske.com> wrote:
 > > >
 > > > On 8/9/22 05:23, Matthias Apitz wrote:
 > > > >
 > > > > Hello,
 > > > >
 > > > > Last night I damaged a plain UTF-8 HTML file (I copied by accident a
 > > > > JPEG file over it) and it turned out that the backup was done a month
 > > > > ago. I learned my lesson from this re/ doing backups more often of files
 > > > > I'm working on...
 > 
 > Thanks for the hints.
 > 
 > The file in question is my diary, written in Spanish and every day
 > is headed by a line like
 > 
 > <dt><b>Viernes, 29 de julio de 2022 </b>
 > 
 > So I wrote a 35-line C program that reads every 1024-byte block from the
 > device and terminates it with '\0' to make sure that a
 > 
 > char *p = strstr(block, " de 2022 </b>");
 > 
 > would not fail, and whenever p != NULL I printed the diary entry with
 > printf(p-16), along with the current block number to be used in dd(1) later.
 > It finds all the lines of this year, but not the missing ones between
 > July 10 and August 1 :-(
 > So the blocks have been lost. I was hoping that UFS puts them back to
 > free block chains for later use, but it seems that
 > the 'cp picture.jpg diary.html' directly overwrote the used blocks.
 > 
 > Lesson learned. I'm attaching the C-pgm, maybe someone can use it or
 > at least its idea.
 > 
 > 	matthias
 
"Necessity is the mother of Invention" alright.  A neat solution.

Could any other files written since have reused those blocks?  I'd be a little surprised if the cp itself did that ...
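
For anyone without the attachment, the block scan Matthias describes can be sketched roughly as follows. This is not his program, only a minimal illustration under a few assumptions: the names BLOCK_SIZE, find_bytes and scan_blocks are made up here, and the search compares raw bytes instead of calling strstr(), because strstr() stops at the first NUL byte inside a block and would miss anything after it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 1024  /* block size mentioned in the quoted message */

/* Return a pointer to the first occurrence of needle in hay, or NULL.
 * Unlike strstr(), this does not stop at embedded NUL bytes. */
static const unsigned char *find_bytes(const unsigned char *hay, size_t hay_len,
                                       const char *needle, size_t needle_len)
{
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return hay + i;
    return NULL;
}

/* Read fp block by block. The numbers of the blocks that contain needle are
 * stored in hits[] (at most max_hits of them); the return value is the total
 * number of matching blocks, which may exceed max_hits. */
static size_t scan_blocks(FILE *fp, const char *needle,
                          long *hits, size_t max_hits)
{
    unsigned char block[BLOCK_SIZE];
    size_t needle_len = strlen(needle);
    size_t found = 0, got;
    long blockno = 0;

    assert(needle_len > 0);  /* an empty needle would match every block */

    while ((got = fread(block, 1, sizeof block, fp)) > 0) {
        if (find_bytes(block, got, needle, needle_len) != NULL) {
            if (found < max_hits)
                hits[found] = blockno;
            found++;
        }
        blockno++;
    }
    return found;
}
```

Called with a FILE * opened on an image of the partition, it yields block numbers that could then be handed to dd(1) as skip= with bs=1024.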

FWIW, I was about to offer a different method that came from my own need: finding a small but rare string in the 12.3-RELEASE dvd1.iso that had to be replaced so that the 2+ GiB of included packages could be installed (after 3 patches to bsdconfig, but that's another story).  I'll share it, as it could just as well be run on each (say) 10 MiB block dd'd from a disk or partition.  play.iso is a copy of the 4.1 GiB dvd1.iso.

<code>
smithi@t430s:/home/dvds % strings -an7 -td play.iso | grep -i2 'pkg.txz'

2442269512 sod.J{++I
2442271727 %R:*lAS
2442277052      PKG.TXZ;1PX,
2442277146 pkg-1.17.2.txzNM
2442277165 pkg.txz
2442278912 version = 2;
2442278925 packing_format = "txz";
--
4377882256 Signature type %s is not supported for bootstrapping.
4377882310 %s/%s.pubkeysig.XXXXXX
4377882333 pkg.txz
4377882341 Invalid configuration format, ignoring the configuration fi
4377882420 Consider changing PACKAGESITE or installing it from ports:
4377882498 REPOS_DIR
4377882508 asprintf
4377882517 Path to pkg.txz required
4377882543 %s/trusted
4377882556 A pre-built version of pkg could not be found for your syst
--
4466242378 pistrings
4466242388 pkg.conf
4466242397 pkg.txz
4466242410 plasma_saver
4466242423 plasma_saver.ko
</code>

The numbers are byte offsets into the .iso file (-td prints them in decimal; -a scans the whole file).  -n7 sets the minimum string length to 7, the length of the string I was after; increase it if hunting a longer string.

Something to consider, in the general case though probably not yours, is that the desired string(s) might be split across adjacent blocks, so consecutive blocks need some overlap: at least one byte less than the length of the string sought, or simply a few KB.
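
The overlap idea can be sketched in C along these lines. It is only an illustration under stated assumptions: search_with_overlap and its parameters are hypothetical names, the input is read through stdio from an ordinary file or image, and the overlap is the minimum possible, i.e. the last (string length - 1) bytes of each chunk are carried over to the next read.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Search fp for needle, reading chunk_size bytes at a time and carrying the
 * last (needle length - 1) bytes of each chunk over to the next read, so a
 * match that straddles a chunk boundary is still seen. Up to max_hits
 * absolute byte offsets are stored in hits[]; the return value is the total
 * number of matches, or (size_t)-1 if the buffer cannot be allocated. */
static size_t search_with_overlap(FILE *fp, const char *needle,
                                  size_t chunk_size,
                                  long long *hits, size_t max_hits)
{
    size_t nlen = strlen(needle);
    assert(nlen > 0 && chunk_size >= nlen);

    size_t keep = nlen - 1;          /* size of the overlap */
    unsigned char *buf = malloc(keep + chunk_size);
    if (buf == NULL)
        return (size_t)-1;

    size_t carried = 0;              /* bytes kept from the previous chunk */
    long long base = 0;              /* file offset of buf[0] */
    size_t found = 0, got;

    while ((got = fread(buf + carried, 1, chunk_size, fp)) > 0) {
        size_t len = carried + got;

        for (size_t i = 0; i + nlen <= len; i++) {
            if (memcmp(buf + i, needle, nlen) == 0) {
                if (found < max_hits)
                    hits[found] = base + (long long)i;
                found++;
            }
        }

        /* Move the tail to the front; it is too short to hold a whole
         * match, so nothing is reported twice. */
        size_t tail = len < keep ? len : keep;
        memmove(buf, buf + len - tail, tail);
        base += (long long)(len - tail);
        carried = tail;
    }

    free(buf);
    return found;
}
```

With a chunk size of, say, 10 MiB this reports the same kind of byte offsets as the strings output above, for one fixed string.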

cheers, Ian