Re: Tool to compare directories and delete duplicate files from one directory

From: Kaya Saman <kayasaman_at_optiplex-networks.com>
Date: Thu, 18 May 2023 09:53:01 UTC
On 5/18/23 01:35, David Christensen wrote:
> On 5/17/23 00:55, Kaya Saman wrote:
>>
>> On 5/15/23 23:26, Sysadmin Lists wrote:
>>>> ----------------------------------------
>>>> From: David Christensen <dpchrist@holgerdanske.com>
>>>> Date: May 15, 2023, 1:43:38 AM
>>>> To: <questions@freebsd.org>
>>>> Subject: Re: Tool to compare directories and delete duplicate files 
>>>> from one directory
>>>>
>>>>
>>>> It looks like your script only finds duplicates when the subpath is
>>>> identical (?):
>>>>
>>> Yeah. Wasn't that the original problem description? I went off the example
>>> given by Paul earlier in this thread, and it looked like only files with
>>> matching subpaths were being considered (because the OP accidentally rsync'd
>>> files from a source to a bunch of destination dirs).
>>>
>>
>> Glad to see this thread has turned into an interesting discussion....
>>
>>
>> Just as the OP :-) I will clarify....
>>
>> There was no accidental rsync in place.
>>
>>
>> Due to lack of storage my files were basically all over the place on 
>> different zpools. The problem is that most of those were on iSCSI 
>> drives (all running FreeBSD), so I needed to get them in a single 
>> place. Of course, as the files were all over, things became a mess.
>>
>> I bought a few new drives and created a new zpool just for this case. 
>> So effectively I had to sync the multiple directories to a single 
>> destination, *but* of course I didn't use the --remove-source-files 
>> option as I didn't want things to be destructive.
>>
>>
>> But then I needed the extra space too and that's where this post came 
>> from.
>>
>>
>> Regards,
>>
>>
>> Kaya
>
>
> I seem to recall that you decided to run a Perl script posted by a 
> reader.  How has that worked out?


Very well.


>
>
> My first response presupposed that you wanted to delete /dir1, /dir2, 
> and /dir3.  Further messages indicated that you wanted to keep those 
> directories and any unique files they contain.  Please clarify your 
> plans for those directories and their contents.


Nope..... I wanted to delete the duplicate files within /dir1/path... 
/dir2/path... and /dir3/path.... while keeping any files that differ.


>
>
> How do you plan to validate the consolidation process when it is 
> complete?


The consolidation process is already finished. Rsync already took care 
of that. I used: rsync -avvc --progress --ignore-existing src dst


The script I was given then simply deleted the duplicates from the 
source directories. In fact this is really specific to me, as I just 
wanted to make my life easier when looking for the files that have the 
same names but different sizes.
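
For anyone reading this in the archive, the duplicate-removal step can 
be sketched roughly as below. This is NOT the Perl script from the 
thread, only a minimal Python illustration of the same idea, assuming 
duplicates are files with an identical relative path and identical 
contents. The function name delete_duplicates and its dry_run 
parameter are made up for the example.

```python
import filecmp
import os


def delete_duplicates(src_root, dst_root, dry_run=True):
    """Remove files under src_root that have a byte-identical copy at the
    same relative path under dst_root.

    Returns the list of removed (or, with dry_run=True, would-be-removed)
    source paths. Hypothetical helper, not the script used in the thread.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            src_file = os.path.join(dirpath, name)
            dst_file = os.path.normpath(os.path.join(dst_root, rel_dir, name))
            # Leave symlinks and files without a destination counterpart alone.
            if os.path.islink(src_file) or not os.path.isfile(dst_file):
                continue
            # Never delete when both paths point at the very same file.
            if os.path.samefile(src_file, dst_file):
                continue
            # shallow=False forces a content comparison, not just a stat check.
            if filecmp.cmp(src_file, dst_file, shallow=False):
                removed.append(src_file)
                if not dry_run:
                    os.remove(src_file)
    return removed
```

Running it with the default dry_run=True first only lists what would 
be deleted. Whatever remains in the source afterwards is either unique 
or differs from the copy in the destination.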


Now that I have only the different files left, I can merge them by 
changing the directory name and adding a .1 or so to the end and then 
simply rsync those directories over in addition.
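
The renaming step could look something like the following. Again this 
is only a sketch under my own assumptions: the helper name 
rename_with_suffix is invented, and it assumes the suffix should be 
the smallest number that does not clash with an existing name in 
either the source parent or the destination.

```python
import os


def rename_with_suffix(directory, dst_parent):
    """Rename `directory` to '<name>.N' and return the new path.

    N is the smallest integer >= 1 for which neither the directory's own
    parent nor dst_parent already contains an entry with that name.
    Hypothetical helper for illustration only.
    """
    directory = os.path.normpath(directory)
    parent, name = os.path.split(directory)
    n = 1
    while True:
        candidate = f"{name}.{n}"
        taken_locally = os.path.exists(os.path.join(parent, candidate))
        taken_in_dst = os.path.exists(os.path.join(dst_parent, candidate))
        if not taken_locally and not taken_in_dst:
            break
        n += 1
    new_path = os.path.join(parent, candidate)
    os.rename(directory, new_path)
    return new_path
```

The renamed directory can then be rsync'd to the destination without 
colliding with the files that are already there.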


Again, this is just a really specific use case for my particular merge 
at the moment.


>
>
> David
>
>

Regards,


Kaya