Re: Tool to compare directories and delete duplicate files from one directory

From: David Christensen <dpchrist_at_holgerdanske.com>
Date: Sat, 06 May 2023 20:33:14 UTC
I thought I sent this, but it never hit the list (?) -- David


On 5/4/23 21:06, Kaya Saman wrote:

> To start with this is the directory structure:
> 
> 
>   ls -lhR /tmp/test1
> total 1
> drwxr-xr-x  2 root  wheel     3B May  5 04:57 dupdir1
> drwxr-xr-x  2 root  wheel     3B May  5 04:57 dupdir2
> 
> /tmp/test1/dupdir1:
> total 1
> -rw-r--r--  1 root  wheel     8B Apr 30 03:17 dup
> 
> /tmp/test1/dupdir2:
> total 1
> -rw-r--r--  1 root  wheel     7B May  5 03:23 dup1
> 
> 
> ls -lhR /tmp/test2
> total 1
> drwxr-xr-x  2 root  wheel     3B May  5 04:56 dupdir1
> drwxr-xr-x  2 root  wheel     3B May  5 04:56 dupdir2
> 
> /tmp/test2/dupdir1:
> total 1
> -rw-r--r--  1 root  wheel     4B Apr 30 02:53 dup
> 
> /tmp/test2/dupdir2:
> total 1
> -rw-r--r--  1 root  wheel     7B Apr 30 02:47 dup1
> 
> 
> So what I want to happen is the script to recurse from the top level 
> directories test1 and test2 then expected behavior should be to remove 
> file dup1 as dup is different between directories.


My previous post missed the mark, but I have been watching this thread 
with interest (trepidation?).


I think Tim already identified a tool that will safely get you close to 
your goal, if not all the way:

On 5/4/23 09:28, Tim Daneliuk wrote:
> I've never used it, but there is a port of fdupes in the ports tree.
> Not sure if it does exactly what you want though.


fdupes(1) is also available as a package:

2023-05-04 21:25:31 toor@vf1 ~
# freebsd-version; uname -a
12.4-RELEASE-p2
FreeBSD vf1.tracy.holgerdanske.com 12.4-RELEASE-p1 FreeBSD 12.4-RELEASE-p1 GENERIC  amd64

2023-05-04 21:25:40 toor@vf1 ~
# pkg search fdupes
fdupes-2.2.1,1                 Program for identifying or deleting duplicate files


Looking at the man page:

https://man.freebsd.org/cgi/man.cgi?query=fdupes&sektion=1&manpath=FreeBSD+13.2-RELEASE+and+Ports


I am fairly certain that you will want to give the destination directory 
as the first argument and the source directories after that:

$ fdupes --recurse /dir /dir_1 /dir_2 /dir_3


The above will provide you with information, but not delete anything.


Practicing under /tmp to gain familiarity with fdupes(1) is a good idea.
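
A minimal sketch of such a practice run, assuming a scratch directory created with mktemp(1).  The directory and file names follow Kaya's listing; the file contents are made up (only the sizes match), and the cmp(1) loop is just an independent cross-check that compares files at the same relative path, not a substitute for fdupes(1):

```shell
#!/bin/sh
# Hypothetical practice tree; names and sizes follow the listing above,
# contents are invented for illustration.
base=$(mktemp -d /tmp/fdupes-practice.XXXXXX)
mkdir -p "$base/test1/dupdir1" "$base/test1/dupdir2" \
         "$base/test2/dupdir1" "$base/test2/dupdir2"

printf 'aaaaaaa\n' > "$base/test1/dupdir1/dup"    # 8 bytes
printf 'bbb\n'     > "$base/test2/dupdir1/dup"    # 4 bytes, differs
printf 'cccccc\n'  > "$base/test1/dupdir2/dup1"   # 7 bytes
printf 'cccccc\n'  > "$base/test2/dupdir2/dup1"   # 7 bytes, identical

# Report only; without --delete nothing is removed.
if command -v fdupes >/dev/null 2>&1; then
    fdupes --recurse "$base/test1" "$base/test2"
else
    echo "fdupes not installed; practice tree is at $base"
fi

# Cross-check: files under test1 that are byte-identical to the file
# at the same relative path under test2.
dups=$(cd "$base/test1" && find . -type f | while read -r f; do
    cmp -s "$f" "$base/test2/$f" && echo "$f"
done)
echo "identical at same path: $dups"
```

With this tree, only dupdir2/dup1 should be reported as a duplicate, which matches the expected behavior described above.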


As you are using ZFS, I assume you know how to take snapshots and do 
rollbacks (?).  These could serve as backup and restore operations if 
things go badly.
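
For reference, a rough sketch of that snapshot and rollback sequence.  The dataset name pool/data and the snapshot name are placeholders, not taken from this thread, and the run() wrapper is only there so the commands are printed rather than executed unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Placeholder names; substitute the dataset that holds the directories.
dataset="pool/data"
snap="$dataset@before-fdupes"

# Print each command by default; execute only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Before deleting anything: take a snapshot of the dataset.
run zfs snapshot "$snap"

# ... run fdupes here and inspect the result ...

# Only if things went badly: return the dataset to the snapshot.
# This discards every change made after the snapshot was taken.
run zfs rollback "$snap"
```

Note that a rollback undoes all changes to the dataset since the snapshot, not just the deletions, so it is best done before any other writes land there.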


Given 12+ TB of data, you may want the --noprompt option when you do 
give the --delete option and actual arguments.


David