mkisofs,cd9660 and hard links

James Long list at museum.rain.com
Sun Mar 25 06:47:23 UTC 2007


> Date: Sat, 24 Mar 2007 20:15:50 +0100 (CET)
> From: Wojciech Puchar <wojtek at tensor.gdynia.pl>
> Subject: mkisofs,cd9660 and hard links
> To: freebsd-questions at freebsd.org
> Message-ID: <20070324201201.D6725 at chylonia.3miasto.net>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> 
> i did copy of small server (taking about 3GB space) to DVD with growisofs 
> -R and using --exclude to not copy /dev etc..
> 
> worked fine.
> 
> and recovered fine, but taking much more space, because all hardlinks are 
> now separate files.
> 
> it looks like cd9660 filesystem doesn't "see" hardlinked files as 
> hardlinked, but as separate ones.
> 
> is there any program to fix it like comparing all very similar files on 
> disk and hardlinking them?

My brief analysis is that there's only so much that can be done
programmatically.  Your DVD copy apparently does not record which
files were hardlinked to each other, and it may not let you determine
where symlinks used to exist, either.  There may also be files that
were simply two copies of the same content, and those should not be
construed as linked files.

That said, I have done similar tasks (like deleting duplicate copies 
of files stored on two machines) by writing a shell script to 
calculate a checksum of each file on disk, then sorting the output 
based on the checksum.  Where you find duplicate checksum values, you 
likely have files that could be hard-linked to each other.  It would 
require some manual vetting of the identified duplicates to determine
whether the files are supposed to be hardlinks, symlinks or simply two 
discrete files with the same content.

This can be time-consuming for large filesystems, but for 3 Gigs,
you can just start it and walk away until it's done.

This example is rather clumsy, and if someone can show me how to do 
this without having to pipe the output into sh, I'd be edified to know 
that.  On the other hand, I often like to construct xargs lines like
this so I can see and inspect the commands that will be executed,
before actually committing to piping it into the shell.

find / -type f -print0 | xargs -0 -Ixx -n1 echo echo \$\(sha256 -q \"xx\"\) \"xx\" | sh > sha256-list.out
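
One possible way to drop the pipe into sh, offered only as a sketch:
let find run the checksum program itself via -exec.  This assumes
FreeBSD's sha256 and its -r flag, which prints the digest first and
the file name second.  The function name checksum_tree and the
fallback to sha256sum are my own additions for illustration.

```shell
#!/bin/sh
# Sketch: print "<digest> <path>" for every regular file under a directory,
# without generating command lines and piping them into sh.
# checksum_tree is a hypothetical helper name, not an existing tool.
checksum_tree() {
    dir=${1:-.}
    if command -v sha256 >/dev/null 2>&1; then
        # FreeBSD: -r puts the digest first, followed by the file name.
        find "$dir" -type f -exec sha256 -r {} +
    else
        # Assumption: GNU coreutils sha256sum is available instead.
        find "$dir" -type f -exec sha256sum {} +
    fi
}
```

Something like "checksum_tree /restored/tree" with the output
redirected to a file would then give a list in the same digest-then-path
layout as the command above.  The trade-off is that you no longer get
to inspect the generated commands before they run.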

Then use awk/sort/uniq/grep to find duplicate checksums, and determine
which files have identical checksum values.  Manually examine those 
files to determine whether they should be hardlinks, symlinks, or
remain as separate files.
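
As a rough sketch of that step, sort plus a few lines of awk can print
only the entries whose checksum occurs more than once.  It assumes each
line of the list is a digest followed by a path, as produced above.
The function name list_dups is hypothetical.

```shell
#!/bin/sh
# Sketch: print only those lines of a checksum list whose digest occurs more
# than once, with a blank line between groups of identical digests.
# Assumes each line is "<digest> <path>"; list_dups is a hypothetical name.
list_dups() {
    sort "$1" | awk '
        # Same digest as the previous line: we are inside a duplicate group.
        $1 == prev {
            if (!open) {
                if (groups++) print ""   # separate groups with a blank line
                print held               # first member of the group
                open = 1
            }
            print
            next
        }
        # New digest: remember the line in case a duplicate follows.
        { prev = $1; held = $0; open = 0 }
    '
}
```

Run it with the checksum list file as its only argument.  Each group it
prints is only a candidate; identical content does not by itself say
whether the files were hardlinks.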

Note that this necessarily excludes directories, some of which may 
have been symlinks to other directories, such as /etc/namedb vs. 
/var/named/etc/namedb.
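
For a pair of files that the manual review confirms should be hard
links, the replacement itself could look something like the sketch
below.  The helper name relink is hypothetical.  It re-checks the
content with cmp rather than trusting the checksum list, and it relies
on the -ef test operator, which FreeBSD's sh supports but which strict
POSIX does not guarantee.

```shell
#!/bin/sh
# Sketch: replace DUP with a hard link to KEEP, but only when both are regular
# files with byte-identical content. relink is a hypothetical helper name.
relink() {
    keep=$1
    dup=$2
    [ -f "$keep" ] && [ -f "$dup" ] || return 1
    # Already the same file (same device and inode): nothing to do.
    [ "$keep" -ef "$dup" ] && return 0
    # Matching checksums are a strong hint; cmp confirms the bytes are equal.
    cmp -s "$keep" "$dup" || return 1
    # ln -f unlinks DUP and recreates it as a link to KEEP.
    # This fails if the two paths are on different filesystems.
    ln -f "$keep" "$dup"
}
```

Usage would be "relink /path/to/keep /path/to/duplicate".  Keep in
mind that the duplicate takes on the owner, mode and timestamps of the
file you keep, since both names then refer to one inode.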


Jim

