mkisofs,cd9660 and hard links
James Long
list at museum.rain.com
Sun Mar 25 06:47:23 UTC 2007
> Date: Sat, 24 Mar 2007 20:15:50 +0100 (CET)
> From: Wojciech Puchar <wojtek at tensor.gdynia.pl>
> Subject: mkisofs,cd9660 and hard links
> To: freebsd-questions at freebsd.org
> Message-ID: <20070324201201.D6725 at chylonia.3miasto.net>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> i did copy of small server (taking about 3GB space) to DVD with growisofs
> -R and using --exclude to not copy /dev etc..
>
> worked fine.
>
> and recovered fine, but taking much more space, because all hardlinks are
> now separate files.
>
> it looks like cd9660 filesystem doesn't "see" hardlinked files as
> hardlinked, but as separate ones.
>
> is there any program to fix it like comparing all very similar files on
> disk and hardlinking them?
My brief analysis of this is that there's only so much that can be
done, at least programmatically. Your DVD copy apparently does not
contain sufficient information to tell hard links apart from separate
copies, and may not allow you to determine where softlinks used to
exist, either. And then there may be some files that were simply two
copies of the same content, and should not be construed as linked
files.
That said, I have done similar tasks (like deleting duplicate copies
of files stored on two machines) by writing a shell script to
calculate a checksum of each file on disk, then sorting the output
based on the checksum. Where you find duplicate checksum values, you
likely have files that could be hard-linked to each other. It would
require some manual vetting of the identified duplicates to determine
whether the files are supposed to be hardlinks, symlinks or simply two
discrete files with the same content.
This can be time-consuming for large filesystems, but for 3 Gigs,
you can just start it and walk away until it's done.
This example is rather clumsy, and if someone can show me how to do
this without having to pipe the output into sh, I'd be edified to know
that. On the other hand, I often like to construct xargs lines like
this so I can see and inspect the commands that will be executed,
before actually committing to piping it into the shell.
find / -type f -print0 | xargs -0 -Ixx -n1 echo echo \$\(sha256 -q \"xx\"\) \"xx\" | sh > sha256-list.out
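One possible way to avoid the pipe into sh, offered as a sketch rather
than a drop-in replacement: let find run the checksum tool itself with
-exec ... +. This assumes either FreeBSD's sha256 with -r (reversed
output, checksum first) or, on other systems, GNU sha256sum. The
function name checksum_tree is only illustrative.

```shell
#!/bin/sh
# Pick a SHA-256 tool that prints "checksum filename" on each line.
# Assumption: either FreeBSD's sha256 (with -r) or GNU sha256sum is installed.
if command -v sha256 >/dev/null 2>&1; then
    hash_cmd="sha256 -r"
else
    hash_cmd="sha256sum"
fi

# checksum_tree DIR: print one "checksum path" line per regular file under DIR.
# find passes the file names straight to the tool, so no second shell is
# involved and quotes or dollar signs in names are never re-interpreted.
checksum_tree() {
    # $hash_cmd is deliberately unquoted so "sha256 -r" splits into two words.
    find "$1" -type f -exec $hash_cmd {} +
}
```

Calling checksum_tree on a directory and redirecting its output to a
file would stand in for the pipeline above. The trade-off is that you
no longer get to inspect the generated commands before they run.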
Then use awk/sort/uniq/grep to find duplicate checksums, and determine
which files have identical checksum values. Manually examine those
files to determine whether they should be hardlinks, symlinks, or
remain as separate files.
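As a rough sketch of that last step, assuming the list holds one
"checksum path" line per file (checksum first): the first function
prints every line whose checksum occurs more than once, and the second
replaces one vetted duplicate with a hard link to the other. Both
function names are made up for illustration.

```shell
#!/bin/sh
# list_duplicates FILE: print every line whose checksum (first field)
# appears more than once. Sorting puts equal checksums on adjacent lines.
# Assumption: no file name contains a newline.
list_duplicates() {
    sort "$1" | awk '
        { sum = $1 }
        sum == prev { if (!shown) print prevline; print; shown = 1 }
        sum != prev { shown = 0 }
        { prev = sum; prevline = $0 }
    '
}

# relink KEEP DUP: replace DUP with a hard link to KEEP, but only if the
# two files are byte-for-byte identical. A matching checksum is treated
# as a hint, so cmp does the final comparison.
relink() {
    cmp -s "$1" "$2" || { echo "files differ: $1 $2" >&2; return 1; }
    ln -f "$1" "$2"
}
```

Run list_duplicates on the checksum list, review the groups by hand,
and call relink only on pairs you have decided really were hard links.
Both paths must be on the same filesystem, and after relinking the two
names share one owner, mode and timestamp.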
Note that this necessarily excludes directories, which could be
symlinks of other directories, such as /etc/namedb vs.
/var/named/etc/namedb.
Jim