cksum entire dir??
Karl Vogel
vogelke+freebsd at pobox.com
Thu Sep 13 20:15:57 UTC 2012
Here's a simple, system-independent way to find duplicate files. All you
need is something to generate a digest you trust (MD5, SHA1, whatever) plus
normal Unix stuff: awk, expand, grep, join, sort, and uniq.
Generate the signatures:
me% cd ~/bin
me% find . -type f -print0 | xargs -0 md5 -r | sort > /tmp/sig1
me% cat /tmp/sig1
0287839688bd660676582266685b05bd ./mkrcs
0b97494883c76da546e3603d1b65e7b2 ./pwgen
ddbed53e795724e4a6683e7b0987284c ./authlog
ddbed53e795724e4a6683e7b0987284c ./cmdlog
fdff1fd84d47f76dbd4954c607d66714 ./dbrun
ff5e24efec5cf1e17cf32c58e9c4b317 ./tr0
Find duplicate signatures:
me% awk '{print $1}' /tmp/sig1 | uniq -c | expand | grep -v "^ *1 "
2 ddbed53e795724e4a6683e7b0987284c
me% awk '{print $1}' /tmp/sig1 | uniq -c | expand | grep -v "^ *1 " |
awk '{print $2}' > /tmp/sig2
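A possible shortcut for this step, offered as a sketch rather than as part
of the recipe above: POSIX uniq has a -d option that prints one copy of
each repeated line, which does the work of the count/expand/grep/awk
chain. The function name dup_digests is made up for illustration, and the
input is assumed to be a signature file already sorted by digest.

```shell
# dup_digests FILE: print each digest that appears more than once in FILE.
# FILE holds "digest filename" lines, sorted by digest.
# (dup_digests is a hypothetical helper name.)
dup_digests() {
    # Keep the digest column only; uniq -d prints one copy of each
    # line that is repeated on adjacent lines.
    awk '{print $1}' "$1" | uniq -d
}
```

Usage would look like "dup_digests /tmp/sig1 > /tmp/sig2".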
Associate the duplicates with files:
me% join /tmp/sig[12]
ddbed53e795724e4a6683e7b0987284c ./authlog
ddbed53e795724e4a6683e7b0987284c ./cmdlog
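The three steps above can be wrapped into one function. This is only a
sketch under a few assumptions: the name find_dupes is invented, the
temporary files are made with mktemp instead of fixed /tmp/sig1 and
/tmp/sig2 paths, and the digest command falls back to md5sum (GNU) when
"md5" is not installed.

```shell
#!/bin/sh
# find_dupes DIR: print "digest filename" for every file under DIR whose
# digest is shared with at least one other file.
# (find_dupes is a hypothetical name; the pipeline follows the steps above.)
find_dupes() {
    dir=${1:-.}

    # Assumption: use BSD "md5 -r" when present, otherwise GNU "md5sum".
    # Both print the digest first and the filename after it.
    if command -v md5 >/dev/null 2>&1; then
        digest="md5 -r"
    else
        digest="md5sum"
    fi

    sig1=$(mktemp /tmp/sig1.XXXXXX) || return 1
    sig2=$(mktemp /tmp/sig2.XXXXXX) || { rm -f "$sig1"; return 1; }

    # Generate the signatures, sorted so equal digests are adjacent.
    (cd "$dir" && find . -type f -print0 | xargs -0 $digest) | sort > "$sig1"

    # Keep only the digests that occur more than once.
    awk '{print $1}' "$sig1" | uniq -c | expand | grep -v "^ *1 " |
        awk '{print $2}' > "$sig2"

    # Associate the duplicate digests with files.
    join "$sig1" "$sig2"

    rm -f "$sig1" "$sig2"
}
```

Running "find_dupes ~/bin" should list the same pairs as the join step
above.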
If your filenames contain whitespace, join's field splitting can alter
them (tabs and runs of spaces come out as a single space); you can
URL-encode them, play some games with awk, or use perl.
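
One way to play those games with awk, as a sketch only: the digest never
contains whitespace, so awk can compare just the first field of adjacent
lines and print each matching line whole, leaving the filename untouched.
The name print_dupes is made up, and the input is assumed to be the sorted
signature file on stdin. Filenames containing newlines would still break
this.

```shell
# print_dupes: read sorted "digest filename" lines on stdin and print,
# unmodified, every line whose digest also appears on an adjacent line.
# (print_dupes is a hypothetical helper name.)
print_dupes() {
    awk '
        # Same digest as the previous line: print the first line of the
        # group once (held), then the current line as-is.
        $1 == prev { if (!shown) print held; print; shown = 1; next }
        # New digest: remember this line in case a duplicate follows.
        { prev = $1; held = $0; shown = 0 }
    '
}
```

Usage would look like "print_dupes < /tmp/sig1". Because whole lines are
printed, no second file and no join are needed.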
--
Karl Vogel I don't speak for the USAF or my company
This is really a lovely horse, I once rode her mother.
--Ted Walsh, Horse Racing Commentator