Re: Tool to compare directories and delete duplicate files from one directory

From: Paul Procacci <pprocacci_at_gmail.com>
Date: Fri, 05 May 2023 02:08:22 UTC
There are multiple reasons why it may not work.  My guess is that it's the
characters that can show up within the filenames (spaces and the like),
which the shell script doesn't protect against.
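
To illustrate that guess, here is a minimal sketch (not the script from this
thread) of what /bin/sh does with an unquoted variable when the filename
contains spaces.  The helper count_args and the sample filename are made up
for the illustration.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
  echo "$#"
}

file="my holiday photo.jpg"   # sample name, assumed for the illustration

# Unquoted: the shell splits the value on whitespace into separate words.
unquoted=$(count_args $file)

# Quoted: the value is passed through as a single argument.
quoted=$(count_args "$file")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

A test such as [ ! -f dir_base/$file ] receives those extra words in the
same way, so it is likely to fail for names like this, whereas
[ ! -f "dir_base/$file" ] would see a single path.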

This can be solved with an interpreted language that's a bit more forgiving.
Take the following perl script.  It does the same thing as the shell script
(almost).  It renames the source file instead of making a copy of it.

run as:  ./test.pl /absolute/path/to/master_dir /absolute/path/to/dir_x

###################################################################################

#!/usr/bin/env perl

use strict;
use warnings;

sub msgDie
{
  my ($ret) = shift;
  my ($msg) = shift // "$0 dir_base dir\n";
  print $msg;
  exit($ret);
}

msgDie(1) unless(scalar @ARGV == 2);

my $base = $ARGV[0];
my $dir  = $ARGV[1];

msgDie(1, "base directory doesn't exist\n") unless -d $base;
msgDie(1, "source directory doesn't exist\n") unless -d $dir;

opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir: $!\n");
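while(readdir $dh)
{
  next if($_ eq '.' || $_ eq '..');

  # Not in the base directory yet: move it there.
  if( ! -f "$base/$_" ){
    rename("$dir/$_", "$base/$_") or warn "Unable to rename $dir/$_: $!\n";
    next;
  }

  # Present in both places: compare sizes (element 7 of stat is the size in bytes).
  my ($ref) = (stat("$base/$_"))[7];
  my ($src) = (stat("$dir/$_"))[7];
  unlink("$dir/$_") if($ref == $src);
}
closedir($dh);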
###################################################################################
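
If you'd rather see what would happen before anything is renamed or
removed, the same decision logic can be sketched as a dry run in sh.  This
is only a sketch under a few assumptions: plan_dedupe is a made-up name,
wc -c stands in for stat -f '%z' to get the size, dotfiles are skipped by
the glob, and it does not recurse into sub-directories.

```shell
#!/bin/sh
# plan_dedupe (hypothetical name) prints the action that would be taken for
# each regular file in the second directory; it never changes anything.
plan_dedupe() {
  base=$1
  dir=$2
  for path in "$dir"/*; do
    [ -f "$path" ] || continue        # skip sub-directories and an unmatched glob
    name=${path##*/}                  # strip the leading directory part

    if [ ! -f "$base/$name" ]; then
      echo "move: $name"
      continue
    fi

    # Size in bytes; $(( )) strips the padding some wc implementations add.
    ref=$(( $(wc -c < "$base/$name") ))
    src=$(( $(wc -c < "$path") ))

    if [ "$ref" -eq "$src" ]; then
      echo "delete: $name"
    else
      echo "keep: $name"
    fi
  done
}

# Only run when called with two arguments: base directory, then source directory.
if [ "$#" -eq 2 ]; then
  plan_dedupe "$1" "$2"
fi
```

Every expansion is quoted, so names with spaces are handled as one path.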

~Paul

On Thu, May 4, 2023 at 9:32 PM Kaya Saman <kayasaman@optiplex-networks.com>
wrote:

>
> On 5/5/23 01:13, Paul Procacci wrote:
> > #!/bin/sh
> >
> > #
> > # dir_1, dir_2, and dir_3 are the directories I want to search through.
> > for i in dir_1 dir_2 dir_3;
> > do
> >   # Retrieve the filenames within each of those directories
> >   ls $i/ | while read file;
> >   do
> >     # If the file doesn't exist in the base dir, copy it and continue
> >     # with the top of the loop.
> >     [ ! -f dir_base/$file ] && cp $i/$file dir_base/ && continue
> >
> >     #
> >     # Getting to this point means the file exists in both locations.
> >     #
> >
> >     # Get the file size as it is in the dir_base
> >     ref=`stat -f '%z' dir_base/$file`
> >
> >     # Get the file size as it is in $i
> >     src=`stat -f '%z' $i/$file`
> >
> >     # If the sizes are the same, remove the file from the source
> >     # directory
> >     [ $ref -eq $src ] && rm -f $i/$file
> >
> >   done
> > done
>
>
> Thanks so much!
>
>
> just a quick question... you have dir_base written in the script. Do I
> need to define this or is this part of the shell language itself?
>
>
> Right now I have modified the script to make it non-destructive so that
> it doesn't do any copying or removing yet... call it a test instance if
> you like. I personally prefer doing things like this so I don't have any
> accidents and lose things in the meantime...
>
>
> So my initial modification is this:
>
>
> > #!/bin/sh
> >
> > #
> > # dir_1, dir_2, and dir_3 are the directories I want to search through.
> > for i in /dir_1 /dir_2 /dir_3;
> > do
> >   # Retrieve the filenames within each of those directories
> >   ls $i/ | while read file;
> >   do
> >     # If the file doesn't exist in the base dir, copy it and continue
> >     # with the top of the loop.
> >     [ ! -f dir_base/$file ] && ls $i/$file && continue
> >
> >     #
> >     # Getting to this point means the file exists in both locations.
> >     #
> >
> >     # Get the file size as it is in the dir_base
> >     ref=`stat -f '%z' dir_base/$file`
> >
> >     # Get the file size as it is in $i
> >     src=`stat -f '%z' $i/$file`
> >
> >     # If the sizes are the same, remove the file from the source
> >     # directory
> >     [ $ref -ne $src ] && ls $i/$file > /tmp/file
> >
> >   done
> > done
>
>
> If this works it should just output the different files into a file
> called "file" under /tmp
>
>
> Ok, this didn't work at all.... it just listed a whole bunch of top
> level folders and didn't recurse through them :-(
>
>
> I ran it on the assumption that I needed to run the script under /dir
> and that dir_base was a shell function which would essentially be /dir/.
>
>
> [EDIT]
>
>
> Currently, I managed to get it partly running by modifying ls to use ls
> -R *but* I think that the 'stat' statements don't allow for recursion?
>
>
> The script is running as I type this but it's most likely just
> outputting a whole bunch of ls information... as I see many 'stat'
> errors in the shell output.
>
>
>

-- 
__________________

:(){ :|:& };: