Re: Tool to compare directories and delete duplicate files from one directory

From: Kaya Saman <kayasaman_at_optiplex-networks.com>
Date: Thu, 04 May 2023 21:47:44 UTC
On 5/4/23 17:29, Paul Procacci wrote:
>
>
> On Thu, May 4, 2023 at 11:53 AM Kaya Saman 
> <kayasaman@optiplex-networks.com> wrote:
>
>     Hi,
>
>
>     I'm wondering if anyone knows of a tool like diff or so that can also
>     delete files based on name and size from either left/right or
>     source/destination directory?
>
>
>     Basically what I have done is performed an rsync without using the
>     --remove-source-files option onto a newly bought and created disk
>     pool
>     (yes zpool) that i am trying to consolidate my data - as it's
>     currently
>     spread out over multiple pools with the same folder name.
>
>
>     The issue I am facing mainly is that I perform another rsync and
>     use the
>     --remove-source-files option, rsync will delete files based on name
>     while there are some files that have the same name but not same
>     size and
>     I would like to retain these files.
>
>
>     Right now I have looked at many different options in both rsync and
>     other tools but found nothing suitable. I even tested using a few
>     test
>     dirs and files that I put into /tmp and whatever I tried, the
>     files of
>     different size either got transferred or deleted.
>
>
>     How would be a good way to approach this problem?
>
>
>     Even if I create some kind of shell script and use diff, I think
>     it will
>     only compare names and not file sizes.
>
>
>     I'm really lost here....
>
>
>     Regards,
>
>
>     Kaya
>
>
>
>
> It sounds like you want fdupes.  It's in the ports tree.
>
> ~Paul
>
> -- 
> __________________
>
> :(){ :|:& };:



I installed fdupes a while back and tried it. To me it seemed to 
work only within a single directory.


My directory structure is as follows:


/dir <- main directory where everything has now been rsync'ed to

/dir_1 <- old directory with partial content

/dir_2 <- more partial content

/dir_3 <- more partial content


The key thing here is that I need to compare:


/dir_(x) with /dir


If a file in /dir_(x) has the same name as a file in /dir but a 
different size, leave it. If both the name and the size match, delete 
it from /dir_(x).
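

To make the rule concrete, here is a minimal sketch of it, assuming 
Python 3 is available and that "same name" means the same path relative 
to the top of each tree. The function name prune_duplicates and its 
dry_run parameter are made up for illustration. It compares only name 
and size, never content, so two different files of identical size would 
be treated as duplicates.

```python
import os


def prune_duplicates(old_root, main_root, dry_run=True):
    """Find files under old_root that exist under main_root with the same
    relative path and the same size. Delete them from old_root unless
    dry_run is true. Returns the list of matching paths in old_root."""
    matched = []
    for dirpath, _dirnames, filenames in os.walk(old_root):
        # Path of the current directory relative to the old tree.
        rel_dir = os.path.relpath(dirpath, old_root)
        for name in filenames:
            old_path = os.path.join(dirpath, name)
            main_path = os.path.normpath(os.path.join(main_root, rel_dir, name))
            # Skip symlinks, and skip files with no counterpart in main_root.
            if os.path.islink(old_path) or not os.path.isfile(main_path):
                continue
            # Same name and same size: treat as a duplicate.
            if os.path.getsize(old_path) == os.path.getsize(main_path):
                matched.append(old_path)
                if not dry_run:
                    os.remove(old_path)
    return matched
```

Calling prune_duplicates("/dir_1", "/dir") only reports what would be 
removed. Passing dry_run=False performs the deletion, so it seems safer 
to review the dry-run list first. Files whose sizes differ, and files 
that exist only in /dir_1, are left alone.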