Re: Tool to compare directories and delete duplicate files from one directory

From: Kaya Saman <kayasaman_at_optiplex-networks.com>
Date: Fri, 05 May 2023 03:20:23 UTC
On 5/5/23 04:01, Paul Procacci wrote:
> On Thu, May 4, 2023 at 10:30 PM Kaya Saman 
> <kayasaman@optiplex-networks.com> wrote:
>
>
>     On 5/5/23 03:08, Paul Procacci wrote:
>>     There are multiple reasons why it may not work.  My guess is
>>     because the potential for characters that could be showing up
>>     within the filenames and whatnot.
>>
>>     This can be solved with an interpreted language that's a bit more
>>     forgiving.
>>     Take the following perl script.  It does the same thing as the
>>     shell script (almost).  It renames the source file instead of
>>     making a copy of it.
>>
>>     run as:  ./test.pl /absolute/path/to/master_dir
>>     /absolute_path_to_dir_x
>>
>>     ###################################################################################
>>
>>     #!/usr/bin/env perl
>>
>>     use strict;
>>     use warnings;
>>
>>     sub msgDie
>>     {
>>       my ($ret) = shift;
>>       my ($msg) = shift // "$0 dir_base dir\n";
>>       print $msg;
>>       exit($ret);
>>     }
>>
>>     msgDie(1) unless(scalar @ARGV == 2);
>>
>>     my $base = $ARGV[0];
>>     my $dir  = $ARGV[1];
>>
>>     msgDie(1, "base directory doesn't exist\n") unless -d $base;
>>     msgDie(1, "source directory doesn't exist\n") unless -d $dir;
>>
>>     opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir\n");
>>     while(readdir $dh)
>>     {
>>       next if($_ eq '.' || $_ eq '..');
>>       if( ! -f "$base/$_" ){
>>         rename("$dir/$_", "$base/$_");
>>         next;
>>       }
>>
>>       my ($ref) = (stat("$base/$_"))[7];
>>       my ($src) = (stat("$dir/$_"))[7];
>>       unlink("$dir/$_") if($ref == $src);
>>     }
>>     ###################################################################################
>>
>>     ~Paul
>>
>>
>
>     This didn't seem to work :-(
>
>
>     What exactly happened is this:
>
>
>     I created a set of test directories in /tmp
>
>
>     So, I have /tmp/test1 and /tmp/test2
>
>
>     To mimic the structure of the directories I intend to run this
>     on, I did this:
>
>
>     I created a subdir called dupdir in both /tmp/test1 and /tmp/test2
>
>
>     /tmp/test2/dupdir contains these files: dup and dup1
>
>
>     /tmp/test1/dupdir contains a modified 'dup' file and an unmodified copy of dup1.
>
>
>     However, now things get interesting, as dup from test1 contains
>     "1234567" and dup from test2 contains "111" <- this is to simulate
>     the file size difference.
>
>
>
>
>
>
> Worked for me!  Regardless.  Use rsync then.
>
> rsync --ignore-existing --remove-source-files  /src /dest
>
> This would at the very least move non-existent files from the source 
> over to the dest AND remove those source files AFTER the transfer 
> happens.
> You'll be 1/2 way there doing that. What you'll be left with are files 
> that exist in BOTH src AND DEST.
>
> ~Paul


Paul, I think we've got wires crossed....


I *have* already performed the rsync. Apologies if I wasn't clear!


The problem I am faced with is that the destination directory is already 
populated with the information from 3 source directories.


I need to remove the sync'ed files in the source directories and leave 
files that match in name but are of different sizes.


The problem is I can't use rsync again for this as there aren't any 
options to simply compare files based on size. I can't use the 
--existing option as the files exist in both directories....


This is the dilemma I am facing:


ls -l /merged_dir/folder/

234904506 - file 'a'


ls -l /source_dir/folder/

1080918146 - file 'a'


So in this case file 'a' is in both directories with the same name but 
a different size, and I need to keep both versions. However, *if* they 
were the same size, the file in source_dir should be removed.


That's all.. I don't need to transfer anything or copy anything at 
all... just compare and remove files of same name and size.
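
To make the request concrete, the comparison described above could be 
sketched in POSIX shell roughly like this. It is only a sketch under 
assumptions: the function name prune_same_size and the DRY_RUN variable 
are made up for illustration, sizes are read with wc -c because the 
stat flags differ between FreeBSD and Linux, and file names containing 
newlines are not handled.

```shell
#!/bin/sh
# Sketch only: remove a file under the source tree when a file with the
# same relative path AND the same byte size exists under the merged tree.
# prune_same_size and DRY_RUN are hypothetical names, not from the thread.

prune_same_size() {
    merged=${1%/}        # e.g. /merged_dir  (trailing slash stripped)
    source_dir=${2%/}    # e.g. /source_dir

    # Walk every regular file below the source tree, subdirectories included.
    # Limitation: file names containing newlines are not handled.
    find "$source_dir" -type f | while IFS= read -r src; do
        rel=${src#"$source_dir"/}   # path relative to the source tree
        ref=$merged/$rel            # counterpart in the merged tree

        # No counterpart in the merged tree: leave the source file alone.
        [ -f "$ref" ] || continue

        # wc -c reads the whole file, so it is slow but portable;
        # tr strips the padding that BSD wc prints around the number.
        src_size=$(wc -c < "$src" | tr -d ' ')
        ref_size=$(wc -c < "$ref" | tr -d ' ')

        # Same name, different size: keep both versions.
        [ "$src_size" -eq "$ref_size" ] || continue

        if [ -n "${DRY_RUN:-}" ]; then
            printf 'would remove %s\n' "$src"
        else
            rm -- "$src"
        fi
    done
}
```

Something like DRY_RUN=1 prune_same_size /merged_dir /source_dir would 
only list the candidates, which seems the safer first step. Equal size 
does not prove equal content, so a stricter variant could swap the size 
test for cmp -s "$src" "$ref".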


Hopefully I am explaining this better and things are clearer now? Again, 
I apologize for the confusion :-(