Re: Tool to compare directories and delete duplicate files from one directory

From: Kaya Saman <kayasaman_at_optiplex-networks.com>
Date: Sun, 07 May 2023 20:25:18 UTC

On 5/6/23 21:33, David Christensen wrote:
> I thought I sent this, but it never hit the list (?) -- David
>
>
> On 5/4/23 21:06, Kaya Saman wrote:
>
>> To start with this is the directory structure:
>>
>>
>>   ls -lhR /tmp/test1
>> total 1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:57 dupdir1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:57 dupdir2
>>
>> /tmp/test1/dupdir1:
>> total 1
>> -rw-r--r--  1 root  wheel     8B Apr 30 03:17 dup
>>
>> /tmp/test1/dupdir2:
>> total 1
>> -rw-r--r--  1 root  wheel     7B May  5 03:23 dup1
>>
>>
>> ls -lhR /tmp/test2
>> total 1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:56 dupdir1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:56 dupdir2
>>
>> /tmp/test2/dupdir1:
>> total 1
>> -rw-r--r--  1 root  wheel     4B Apr 30 02:53 dup
>>
>> /tmp/test2/dupdir2:
>> total 1
>> -rw-r--r--  1 root  wheel     7B Apr 30 02:47 dup1
>>
>>
>> So what I want is for the script to recurse from the top-level
>> directories test1 and test2; the expected behavior is to remove
>> file dup1 and leave dup alone, as dup differs between the directories.
>
>
> My previous post missed the mark, but I have been watching this thread 
> with interest (trepidation?).
>
>
> I think Tim already identified a tool that will safely get you close 
> to your goal, if not all the way:
>
> On 5/4/23 09:28, Tim Daneliuk wrote:
>> I've never used it, but there is a port of fdupes in the ports tree.
>> Not sure if it does exactly what you want though.
>
>
> fdupes(1) is also available as a package:
>
> 2023-05-04 21:25:31 toor@vf1 ~
> # freebsd-version; uname -a
> 12.4-RELEASE-p2
> FreeBSD vf1.tracy.holgerdanske.com 12.4-RELEASE-p1 FreeBSD 
> 12.4-RELEASE-p1 GENERIC  amd64
>
> 2023-05-04 21:25:40 toor@vf1 ~
> # pkg search fdupes
> fdupes-2.2.1,1                 Program for identifying or deleting 
> duplicate files
>
>
> Looking at the man page:
>
> https://man.freebsd.org/cgi/man.cgi?query=fdupes&sektion=1&manpath=FreeBSD+13.2-RELEASE+and+Ports 
>
>
>
> I am fairly certain that you will want to give the destination 
> directory as the first argument and the source directories after that:
>
> $ fdupes --recurse /dir /dir_1 /dir_2 /dir_3
>
>
> The above will provide you with information, but not delete anything.
>
>
> Practicing under /tmp to gain familiarity with fdupes(1) is a good idea.
>
>
> As you are using ZFS, I assume you know how to take snapshots and do 
> rollbacks (?).  These could serve as backup and restore operations if 
> things go badly.
>
>
> Given 12+ TB of data, you may want the --noprompt option when you do
> give the --delete option and actual arguments.
>
>
> David
>

Thanks David!


I tried using fdupes like this but got no output, probably because it
took so long to run that it never completed. It does have a -d flag
that deletes files, but in my testing it deletes all duplicates and
doesn't let you choose which directory to delete them from, unless I
misunderstood the man page.


At present, the Perl script from Paul in its last iteration solved my
problem and was pretty fast at the same time.


Of course, I first tested it on my test dirs in /tmp, then took ZFS
snapshots of the actual working dirs, and finally ran the script. It
worked flawlessly.
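
For anyone who finds this thread in the archives: the behaviour I was
after can be sketched roughly as below. This is not Paul's Perl script,
just a minimal Python illustration under assumptions of my own: files
are paired by relative path, a file counts as a duplicate only if its
contents are byte-identical to its counterpart, and the function name
remove_duplicates and its parameters are made up for the example.

```python
import filecmp
import os


def remove_duplicates(keep_root, delete_root, delete=False):
    """Return files under delete_root that are byte-identical to the file
    at the same relative path under keep_root.

    Hypothetical helper for illustration only. Nothing is removed unless
    delete=True, so the default call is a dry run.
    """
    # Refuse to compare a tree with itself: every file would match its
    # own path and be deleted.
    if os.path.realpath(keep_root) == os.path.realpath(delete_root):
        raise ValueError("keep_root and delete_root are the same directory")

    matches = []
    for dirpath, _dirnames, filenames in os.walk(delete_root):
        rel_dir = os.path.relpath(dirpath, delete_root)
        for name in sorted(filenames):
            candidate = os.path.join(dirpath, name)
            twin = os.path.normpath(os.path.join(keep_root, rel_dir, name))
            # Assumption: only regular files are considered; symlinks on
            # either side are left alone.
            if os.path.islink(candidate) or os.path.islink(twin):
                continue
            if not os.path.isfile(twin):
                continue
            # shallow=False compares file contents rather than trusting
            # matching size and mtime.
            if filecmp.cmp(candidate, twin, shallow=False):
                matches.append(candidate)

    if delete:
        for path in matches:
            os.remove(path)
    return matches
```

Calling remove_duplicates("/tmp/test1", "/tmp/test2") only lists the
candidates under /tmp/test2; nothing is removed until delete=True is
passed, and files under the first directory are never touched. Taking a
snapshot before the real run still applies.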


Regards,


Kaya