Re: Tool to compare directories and delete duplicate files from one directory

From: Paul Procacci <pprocacci_at_gmail.com>
Date: Thu, 04 May 2023 22:32:04 UTC
On Thu, May 4, 2023 at 5:47 PM Kaya Saman <kayasaman@optiplex-networks.com>
wrote:

>
> On 5/4/23 17:29, Paul Procacci wrote:
>
>
>
> On Thu, May 4, 2023 at 11:53 AM Kaya Saman <
> kayasaman@optiplex-networks.com> wrote:
>
>> Hi,
>>
>>
>> I'm wondering if anyone knows of a tool like diff or so that can also
>> delete files based on name and size from either left/right or
>> source/destination directory?
>>
>>
>> Basically what I have done is performed an rsync without using the
>> --remove-source-files option onto a newly bought and created disk pool
>> (yes zpool) that i am trying to consolidate my data - as it's currently
>> spread out over multiple pools with the same folder name.
>>
>>
>> The main issue I am facing is that if I perform another rsync and use the
>> --remove-source-files option, rsync will delete files based on name,
>> while there are some files that have the same name but not the same size,
>> and I would like to retain these files.
>>
>>
>> Right now I have looked at many different options in both rsync and
>> other tools but found nothing suitable. I even tested using a few test
>> dirs and files that I put into /tmp and whatever I tried, the files of
>> different size either got transferred or deleted.
>>
>>
>> How would be a good way to approach this problem?
>>
>>
>> Even if I create some kind of shell script and use diff, I think it will
>> only compare names and not file sizes.
>>
>>
>> I'm really lost here....
>>
>>
>> Regards,
>>
>>
>> Kaya
>>
>>
>>
>>
> It sounds like you want fdupes.  It's in the ports tree.
>
> ~Paul
>
> --
> __________________
>
> :(){ :|:& };:
>
>
>
> I tried fdupes and installed it a while back. For me it felt like it only
> works on a single directory.
>
>
> My dir structure is that I have:
>
>
> /dir <- main directory where everything has now been rsync'ed to
>
> /dir_1 <- old directory with partial content
>
> /dir_2 <- more partial content
>
> /dir_3 <- more partial content
>
>
> The key thing here is that I need to compare:
>
>
> /dir_(x) with /dir
>
>
> if the files are different sizes in /dir_(x) then leave them, otherwise
> delete if both name and file size are the same.
>

Then a tiny shell script does the job, assuming your file names contain no
spaces or other unusual characters:

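#!/bin/sh

# a is the main directory; b, c and d are the old directories.
for i in b c d;
do
  ls "$i"/ | while read -r file;
  do
    # Not in a yet: copy it over and move on to the next file.
    [ ! -f "a/$file" ] && cp "$i/$file" "a/$file" && continue

    # Same name in both places: compare sizes in bytes (FreeBSD stat).
    ref=`stat -f '%z' "a/$file"`
    src=`stat -f '%z' "$i/$file"`
    # Same size too: remove the copy from the old directory.
    [ "$ref" -eq "$src" ] && rm -f "$i/$file"

  done
done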

Change paths accordingly and backup your stuff. ;)
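
If your file names do contain spaces, a variant along these lines may be
safer. Treat it as a rough sketch rather than a tested tool: prune_dupes and
DRY_RUN are names made up for this example, it only looks at the top level of
each directory, and unlike the script above it never copies anything into the
main directory. It reads sizes with wc -c so it does not depend on stat flags;
that reads each file in full, so on large files you would probably want to
swap stat -f '%z' back in.

```shell
#!/bin/sh
# Sketch only: prune_dupes and DRY_RUN are hypothetical names for this example.
# Usage: prune_dupes MAIN_DIR OLD_DIR [OLD_DIR ...]
# Set DRY_RUN=1 to print what would be removed without deleting anything.

prune_dupes() {
  main=$1
  shift
  for dir in "$@"; do
    for path in "$dir"/*; do
      [ -f "$path" ] || continue          # skip subdirectories and empty globs
      name=${path##*/}                    # strip the directory part
      [ -f "$main/$name" ] || continue    # no counterpart in main: keep it

      # Sizes in bytes; the arithmetic expansion strips any padding from wc.
      ref=$(($(wc -c < "$main/$name")))
      src=$(($(wc -c < "$path")))

      if [ "$ref" -eq "$src" ]; then
        if [ "${DRY_RUN:-0}" = 1 ]; then
          echo "would remove: $path"
        else
          rm -f -- "$path"
        fi
      fi
    done
  done
}
```

Something like DRY_RUN=1 prune_dupes /dir /dir_1 /dir_2 /dir_3 should list the
candidates first; run it again without DRY_RUN once the list looks right.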

~Paul

-- 
__________________

:(){ :|:& };: