Re: Tool to compare directories and delete duplicate files from one directory

From: Paul Procacci <pprocacci_at_gmail.com>
Date: Fri, 05 May 2023 03:01:54 UTC
On Thu, May 4, 2023 at 10:30 PM Kaya Saman <kayasaman@optiplex-networks.com>
wrote:

>
> On 5/5/23 03:08, Paul Procacci wrote:
>
> There are multiple reasons why it may not work.  My guess is that special
> characters are showing up within the filenames and whatnot.
>
> This can be solved with an interpreted language that's a bit more
> forgiving.
> Take the following perl script.  It does the same thing as the shell
> script (almost).  It renames the source file instead of making a copy of it.
>
> run as:  ./test.pl /absolute/path/to/master_dir /absolute/path/to/dir_x
>
> ###################################################################################
>
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
>
> sub msgDie
> {
>   my ($ret) = shift;
>   my ($msg) = shift // "$0 dir_base dir\n";
>   print $msg;
>   exit($ret);
> }
>
> msgDie(1) unless(scalar @ARGV == 2);
>
> my $base = $ARGV[0];
> my $dir  = $ARGV[1];
>
> msgDie(1, "base directory doesn't exist\n") unless -d $base;
> msgDie(1, "source directory doesn't exist\n") unless -d $dir;
>
> opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir\n");
> while(readdir $dh)
> {
>   next if($_ eq '.' || $_ eq '..');
>   if( ! -f "$base/$_" ){
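>     # Not a regular file in base yet: move it over, and say so on failure.
>     rename("$dir/$_", "$base/$_") or warn "rename $dir/$_ failed: $!\n";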
>     next;
>   }
>
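>   # Present in both: compare sizes (stat field 7) and delete the source
>   # copy only when they match.
>   my ($ref) = (stat("$base/$_"))[7];
>   my ($src) = (stat("$dir/$_"))[7];
>   unlink("$dir/$_") if($ref == $src);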
> }
>
> ###################################################################################
>
> ~Paul
>
>
>
> This didn't seem to work :-(
>
>
> What exactly happened is this:
>
>
> I created a set of test directories in /tmp
>
>
> So, I have /tmp/test1 and /tmp/test2
>
>
> To mimic the structure of the directories I intend to run this on, I did
> this:
>
>
> create a subdir called: dupdir in /tmp/test1 and /tmp/test2
>
>
> /tmp/test2/dupdir contains these files: dup and dup1
>
>
> /tmp/test1/dupdir contains a modified 'dup' file but an unmodified copy of
> the dup1 file.
>
>
> However, now things get interesting, as dup from test1 contains "1234567"
> and dup from test2 contains "111" <- this is to simulate the file size
> difference.
>
>
>
>
>
>
Worked for me!  Regardless.  Use rsync then.

rsync -a --ignore-existing --remove-source-files /src/ /dest/

This would at the very least move files that don't yet exist in the dest
over from the source AND remove those source files AFTER the transfer
happens.

You'll be halfway there doing that.  What you'll be left with are files
that exist in BOTH src AND dest.
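
For that leftover set, something along these lines might do as a rough
sketch. The function name prune_dups is made up for illustration, and it
compares file contents with cmp -s rather than sizes as the perl script
did, so treat that as a swapped-in technique.  It assumes dest is given
as an absolute path, and it descends into subdirectories (like the dupdir
in the test layout).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): delete files under src that
# have a byte-identical counterpart at the same relative path under dest.
# Assumption: dest is an absolute path, because we cd into src below.
prune_dups() {
    src=$1
    dest=$2
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$src" || exit 1
        # find prints paths relative to src; passing them as arguments to
        # sh -c keeps odd characters in filenames intact.
        find . -type f -exec sh -c '
            dest=$1; shift
            for f in "$@"; do
                # cmp -s is silent and succeeds only on identical contents.
                if [ -f "$dest/$f" ] && cmp -s "$f" "$dest/$f"; then
                    rm -- "$f"
                fi
            done
        ' sh "$dest" {} +
    )
}
```

Call it as:  prune_dups /src /dest
Files that differ stay put in src so you can look at them by hand.  Try it
on throwaway directories in /tmp first, and never point both arguments at
the same directory.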

~Paul