Re: Tool to compare directories and delete duplicate files from one directory
Date: Fri, 05 May 2023 03:20:23 UTC
On 5/5/23 04:01, Paul Procacci wrote:
> On Thu, May 4, 2023 at 10:30 PM Kaya Saman
> <kayasaman@optiplex-networks.com> wrote:
>
>
> On 5/5/23 03:08, Paul Procacci wrote:
>> There are multiple reasons why it may not work. My guess is that
>> it's down to unusual characters showing up within the filenames
>> and whatnot.
>>
>> This can be solved with an interpreted language that's a bit more
>> forgiving.
>> Take the following perl script. It does the same thing as the
>> shell script (almost). It renames the source file instead of
>> making a copy of it.
>>
>> run as: ./test.pl /absolute/path/to/master_dir
>> /absolute_path_to_dir_x
>>
>> ###################################################################################
>>
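>> #!/usr/bin/env perl
>>
>> use strict;
>> use warnings;
>>
>> # Print a message (the usage line by default) and exit with the given status.
>> sub msgDie
>> {
>>     my ($ret) = shift;
>>     my ($msg) = shift // "$0 dir_base dir\n";
>>     print $msg;
>>     exit($ret);
>> }
>>
>> msgDie(1) unless(scalar @ARGV == 2);
>>
>> my $base = $ARGV[0];
>> my $dir = $ARGV[1];
>>
>> msgDie(1, "base directory doesn't exist\n") unless -d $base;
>> msgDie(1, "source directory doesn't exist\n") unless -d $dir;
>>
>> # Only the top level of $dir is read; subdirectories are not descended into.
>> opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir\n");
>> while(defined(my $name = readdir $dh))
>> {
>>     next if($name eq '.' || $name eq '..');
>>     # No regular file of that name in base: move the entry over.
>>     if( ! -f "$base/$name" ){
>>         rename("$dir/$name", "$base/$name")
>>             or warn "rename $dir/$name failed: $!\n";
>>         next;
>>     }
>>
>>     # Element 7 of stat() is the size in bytes.
>>     my ($ref) = (stat("$base/$name"))[7];
>>     my ($src) = (stat("$dir/$name"))[7];
>>     # Same name and same size: drop the source copy.
>>     unlink("$dir/$name") if($ref == $src);
>> }
>> closedir($dh);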
>> ###################################################################################
>>
>> ~Paul
>>
>>
>
> This didn't seem to work :-(
>
>
> What exactly happened is this:
>
>
> I created a set of test directories in /tmp
>
>
> So, I have /tmp/test1 and /tmp/test2
>
>
> To mimic the structure of the directories I intend to run this
> on, I did this:
>
>
> create a subdir called: dupdir in /tmp/test1 and /tmp/test2
>
>
> /tmp/test2/dupdir contains these files: dup and dup1
>
>
> /tmp/test1/dupdir contains a modified 'dup' file but an unmodified copy of dup1.
>
>
> However, now things get interesting as dup from test1 contains
> "1234567" and dup from test2 contains "111" <- this is to simulate
> the file size difference.
>
>
>
>
>
>
> Worked for me! Regardless. Use rsync then.
>
> rsync --ignore-existing --remove-source-files /src /dest
> This would at the very least move non-existent files from the source
> over to the dest AND remove those source files AFTER the transfer
> happens.
> You'll be 1/2 way there doing that. What you'll be left with are files
> that exist in BOTH src AND DEST.
> ~Paul
Paul, I think we've got wires crossed....
I *have* already performed the rsync. Apologies if I wasn't clear!
The problem I am faced with is that the destination directory is already
populated with the information from 3 source directories.
I need to remove the sync'ed files in the source directories and leave
files that match in name but are of different sizes.
The problem is I can't use rsync again for this as there aren't any
options to simply compare files based on size. I can't use the
--existing option as the files exist in both directories....
This is the dilemma I am facing:
ls -l /merged_dir/folder/
234904506 - file 'a'
ls -l /source_dir/folder/
1080918146 - file 'a'
So in this case file 'a' is in both directories with the same name but
a different size. I need to keep both versions. However, *if* they were
the same size, then the file in the source_dir should be removed.
That's all.. I don't need to transfer anything or copy anything at
all... just compare and remove files of same name and size.
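In shell terms, the logic I'm after might look something like the rough sketch below. It is not a tested tool: `prune_same_size` is a made-up helper name, and the dry-run default is an assumption added for safety. It compares sizes only, so two different files that happen to have the same size would be treated as duplicates.

```sh
#!/bin/sh
# Sketch only: remove files under a source tree that have a same-named,
# same-sized twin at the same relative path under the merged tree.
# prune_same_size is a hypothetical helper, not an existing command.

prune_same_size() {
    # Resolve the merged tree to an absolute path, since we cd below.
    merged=$(cd "$1" 2>/dev/null && pwd) || return 1
    source_dir=$2
    dry_run=${3:-yes}    # pass "no" as third argument to really delete

    [ -d "$source_dir" ] || return 1

    # Subshell, so the cd does not leak into the caller's shell.
    (
        cd "$source_dir" || exit 1
        # "-exec ... {} +" passes filenames as arguments, so spaces and
        # other odd characters are never re-parsed by the shell.
        find . -type f -exec sh -c '
            merged=$1; dry_run=$2; shift 2
            for f in "$@"; do
                twin=$merged/$f
                [ -f "$twin" ] && [ -r "$twin" ] && [ -r "$f" ] || continue
                # wc -c gives the size in bytes; $(( )) strips any padding
                a=$(( $(wc -c < "$f") ))
                b=$(( $(wc -c < "$twin") ))
                [ "$a" -eq "$b" ] || continue
                if [ "$dry_run" = no ]; then
                    rm -- "$f"
                else
                    printf "would remove: %s\n" "$f"
                fi
            done
        ' sh "$merged" "$dry_run" {} +
    )
}
```

With the directories from the example above, `prune_same_size /merged_dir /source_dir` would only list the candidates, and `prune_same_size /merged_dir /source_dir no` would delete them. Swapping the size test for `cmp -s "$f" "$twin"` would compare contents instead, which is slower but safer.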
Hopefully I am explaining it better and things are clearer now? Again I
apologize for the confusion :-(