Re: Tool to compare directories and delete duplicate files from one directory
Date: Fri, 05 May 2023 02:30:14 UTC
On 5/5/23 03:08, Paul Procacci wrote:
> There are multiple reasons why it may not work. My guess is that it's
> because of characters that could be showing up within the
> filenames and whatnot.
>
> This can be solved with an interpreted language that's a bit more
> forgiving.
> Take the following perl script. It does the same thing as the shell
> script (almost). It renames the source file instead of making a copy
> of it.
>
> run as: ./test.pl /absolute/path/to/master_dir /absolute/path/to/dir_x
>
> ###################################################################################
>
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
>
> sub msgDie
> {
>     my ($ret) = shift;
>     my ($msg) = shift // "$0 dir_base dir\n";
>     print $msg;
>     exit($ret);
> }
>
> msgDie(1) unless(scalar @ARGV == 2);
>
> my $base = $ARGV[0];
> my $dir = $ARGV[1];
>
> msgDie(1, "base directory doesn't exist\n") unless -d $base;
> msgDie(1, "source directory doesn't exist\n") unless -d $dir;
>
> opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir\n");
> while(readdir $dh)
> {
>     next if($_ eq '.' || $_ eq '..');
>     # No regular file of this name in the base directory: move it there.
>     if( ! -f "$base/$_" ){
>         rename("$dir/$_", "$base/$_") or warn "rename $dir/$_: $!\n";
>         next;
>     }
>
>     # Present in both: delete the source copy if the sizes match.
>     my ($ref) = (stat("$base/$_"))[7];
>     my ($src) = (stat("$dir/$_"))[7];
>     if($ref == $src){
>         unlink("$dir/$_") or warn "unlink $dir/$_: $!\n";
>     }
> }
> ###################################################################################
>
> ~Paul
>
>
This didn't seem to work :-(
Here is exactly what happened:
I created a set of test directories in /tmp, so I have /tmp/test1 and
/tmp/test2. To mimic the structure of the directories I intend to run
this on, I did the following:
created a subdir called dupdir in both /tmp/test1 and /tmp/test2;
/tmp/test2/dupdir contains the files dup and dup1;
/tmp/test1/dupdir contains a modified 'dup' file but an identical copy
of 'dup1'.
Now things get interesting: 'dup' in test1 contains "1234567" while
'dup' in test2 contains "111" <- this is to simulate the file size
difference.
I then ran: ./test.pl /tmp/test1 /tmp/test2
The expected behavior is that 'dup' should be retained in test1, while
'dup1' should be removed from test2.
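A likely reason the script did nothing here: it only reads the top level of the source directory. In /tmp/test2 the only entry is the directory dupdir; -f "/tmp/test1/dupdir" is false for a directory, so the script calls rename("/tmp/test2/dupdir", "/tmp/test1/dupdir"), which fails because the target is a non-empty directory, and the unchecked return value hides the failure. Below is a minimal recursive sketch, assuming the goal is exactly the behavior described above. It walks the source tree with the core File::Find module and, as a deliberate change from the size-only check, also compares contents with File::Compare, since two different files can have the same size. The sub name dedupe is hypothetical, and symlinks to files are treated as files because -f follows them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find qw(find);
use File::Compare qw(compare);
use File::Copy qw(move);
use File::Path qw(make_path);
use File::Basename qw(dirname);
use File::Spec;

# Recursively walk $src. For each regular file, check the same relative
# path under $base:
#   missing in $base            -> move it into $base (parents created)
#   same size and same contents -> delete it from $src (true duplicate)
#   anything else               -> leave both copies alone
# Returns a list of "action relative/path" strings.
sub dedupe {
    my ($base, $src) = @_;
    my @log;
    find({
        no_chdir => 1,    # keep full paths in $File::Find::name
        wanted   => sub {
            my $path = $File::Find::name;
            return unless -f $path;    # directories are only descended into
            my $rel    = File::Spec->abs2rel($path, $src);
            my $target = File::Spec->catfile($base, $rel);
            if (!-e $target) {
                make_path(dirname($target));
                if (move($path, $target)) { push @log, "moved $rel" }
                else { warn "cannot move $path: $!\n" }
            }
            elsif (-f $target
                   && (stat $target)[7] == (stat $path)[7]
                   && compare($target, $path) == 0) {
                if (unlink $path) { push @log, "deleted $rel" }
                else { warn "cannot delete $path: $!\n" }
            }
            else {
                push @log, "kept $rel";
            }
        },
    }, $src);
    return @log;
}

# Only act when called with arguments; print what was done.
if (@ARGV) {
    die "usage: $0 base_dir source_dir\n" unless @ARGV == 2;
    print "$_\n" for dedupe(@ARGV);
}
```

Run it as perl dedupe.pl /tmp/test1 /tmp/test2 (the script name is hypothetical). Directories emptied by moves or deletions are left behind in the source tree, and it is worth trying it on a copy of the data first.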
In my actual file system I have many of these subdirs, so a fair test
would probably be something like creating:
/tmp/test1/dupdir1
/tmp/test2/dupdir1
/tmp/test1/dupdir2
/tmp/test2/dupdir2
and then putting the file dup into dupdir1 and dup1 into dupdir2.
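The more realistic layout just described can be recreated with a short helper, so the same test can be rerun after each attempt. This is a sketch: the helper name make_layout is hypothetical, and since the message does not give the contents of dup1, the string "same\n" is a placeholder used for both copies.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Basename qw(dirname);

# Hypothetical layout mirroring the test above: 'dup' differs in size
# between the trees, 'dup1' is identical (placeholder contents).
my %LAYOUT = (
    'test1/dupdir1/dup'  => "1234567",
    'test2/dupdir1/dup'  => "111",
    'test1/dupdir2/dup1' => "same\n",
    'test2/dupdir2/dup1' => "same\n",
);

# Write every relative path in %$layout under $root, creating parent dirs.
sub make_layout {
    my ($root, $layout) = @_;
    for my $rel (sort keys %$layout) {
        my $path = "$root/$rel";
        make_path(dirname($path));
        open(my $fh, '>', $path) or die "cannot write $path: $!\n";
        print {$fh} $layout->{$rel};
        close($fh) or die "cannot close $path: $!\n";
    }
}

# Only touch the filesystem when a root directory is given, e.g. /tmp.
make_layout($ARGV[0], \%LAYOUT) if @ARGV;
```

Running it as perl make_layout.pl /tmp creates /tmp/test1 and /tmp/test2 with this structure, overwriting any files already at those paths.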
I guess my issue is complex?? If only I had used the
--remove-source-files option during my initial rsync, I wouldn't have
had to worry about any of this: I also used the --ignore-existing
option, so that would have done the trick from the start. But I decided
to play it safe instead and have now ended up with a slight headache on
my hands.