Re: Tool to compare directories and delete duplicate files from one directory
- In reply to: David Christensen : "Re: Tool to compare directories and delete duplicate files from one directory"
Date: Mon, 15 May 2023 08:43:38 UTC
On 5/15/23 01:29, David Christensen wrote:
> On 5/14/23 15:48, Sysadmin Lists wrote:
>> #!/bin/sh -e
>> # remove or report duplicate files: $0 [-n] dir[1] dir[2] ... dir[n]
>> if [ "X$1" = "X-n" ]; then n=1; shift; fi
>>
>> echo "Building files list from: ${@}"
>>
>> find "${@}" -xdev -type f |
>> awk -v n=$n 'BEGIN { cmd = "stat -f %z "
>> for (x = 1; x < ARGC; x++) args = args ? args "|" ARGV[x] : ARGV[x];
>> ARGC = 0 }
>> { files[$0] = match($0, "(" args ")/?") + RLENGTH }
>> END { for (i in ARGV) sub("/*$", "/", ARGV[i])
>> print "Comparing files ..."
>> for (i = 1; i < x; i++) for (file in files) if (file ~ "^" ARGV[i]) {
>> for (j = i +1; j < x; j++)
>> if (ARGV[j] substr(file, files[file]) in files) {
>> dup = ARGV[j] substr(file, files[file])
>> cmd "\"" file "\"" | getline fil_s; close(cmd "\"" file "\"")
>> cmd "\"" dup "\"" | getline dup_s; close(cmd "\"" dup "\"")
>> if (dup_s == fil_s) act("dup")
>> else act("diff") }
>> delete files[file]
>> } }
>> function act(message) {
>> print ((message == "dup") ? "duplicates:" : "difference:"), dup,
>> file
>> if (!n) system("rm -vi \"" dup "\" </dev/tty")
>> }' "${@}"
> Your script does not appear to do anything (?):
>
> 2023-05-15 01:19:00 dpchrist@vf1 /vf1zpool1/dpchrist
> $ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo
> Building files list from: foo
> Comparing files ...
>
> 2023-05-15 01:19:33 dpchrist@vf1 /vf1zpool1/dpchrist
> $ ls -R1 foo | wc
> 26 24 82
>
> 2023-05-15 01:19:35 dpchrist@vf1 /vf1zpool1/dpchrist
> $ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh foo
> Building files list from: foo
> Comparing files ...
>
> 2023-05-15 01:19:48 dpchrist@vf1 /vf1zpool1/dpchrist
> $ ls -R1 foo | wc
> 26 24 82
It looks like your script only finds duplicates when the subpath is
identical (?):
2023-05-15 01:38:20 dpchrist@vf1 /vf1zpool1/dpchrist
$ cp -Ra foo bar
2023-05-15 01:39:18 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
duplicates: bar/1/i-j foo/1/i-j
duplicates: bar/1/2/e foo/1/2/e
duplicates: bar/1/a-b foo/1/a-b
duplicates: bar/1/g foo/1/g
duplicates: bar/1/2/i foo/1/2/i
duplicates: bar/q-r foo/q-r
duplicates: bar/m-n foo/m-n
duplicates: bar/1/2/m foo/1/2/m
duplicates: bar/c foo/c
duplicates: bar/e-f foo/e-f
duplicates: bar/1/s foo/1/s
duplicates: bar/k foo/k
duplicates: bar/o foo/o
duplicates: bar/q foo/q
duplicates: bar/1/c-d foo/1/c-d
duplicates: bar/1/2/s-t foo/1/2/s-t
duplicates: bar/1/2/o-p foo/1/2/o-p
duplicates: bar/1/2/k-l foo/1/2/k-l
duplicates: bar/g-h foo/g-h
2023-05-15 01:39:41 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 foo | wc
26 24 82
2023-05-15 01:39:44 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 bar | wc
26 24 82
2023-05-15 01:40:10 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
duplicates: bar/1/i-j foo/1/i-j
duplicates: bar/1/2/e foo/1/2/e
duplicates: bar/1/a-b foo/1/a-b
duplicates: bar/1/g foo/1/g
duplicates: bar/1/2/i foo/1/2/i
duplicates: bar/q-r foo/q-r
duplicates: bar/m-n foo/m-n
duplicates: bar/1/2/m foo/1/2/m
duplicates: bar/c foo/c
duplicates: bar/e-f foo/e-f
duplicates: bar/1/s foo/1/s
duplicates: bar/k foo/k
duplicates: bar/o foo/o
duplicates: bar/q foo/q
duplicates: bar/1/c-d foo/1/c-d
duplicates: bar/1/2/s-t foo/1/2/s-t
duplicates: bar/1/2/o-p foo/1/2/o-p
duplicates: bar/1/2/k-l foo/1/2/k-l
duplicates: bar/g-h foo/g-h
2023-05-15 01:40:22 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 foo | wc
26 24 82
2023-05-15 01:40:29 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 bar | wc
26 24 82
2023-05-15 01:40:34 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
remove bar/1/2/a? n
duplicates: bar/1/i-j foo/1/i-j
remove bar/1/i-j? n
duplicates: bar/1/2/e foo/1/2/e
remove bar/1/2/e? n
duplicates: bar/1/a-b foo/1/a-b
remove bar/1/a-b? n
duplicates: bar/1/g foo/1/g
remove bar/1/g? n
duplicates: bar/1/2/i foo/1/2/i
remove bar/1/2/i? n
duplicates: bar/q-r foo/q-r
remove bar/q-r? n
duplicates: bar/m-n foo/m-n
remove bar/m-n? n
duplicates: bar/1/2/m foo/1/2/m
remove bar/1/2/m? n
duplicates: bar/c foo/c
remove bar/c? n
duplicates: bar/e-f foo/e-f
remove bar/e-f? n
duplicates: bar/1/s foo/1/s
remove bar/1/s? n
duplicates: bar/k foo/k
remove bar/k? n
duplicates: bar/o foo/o
remove bar/o? n
duplicates: bar/q foo/q
remove bar/q? n
duplicates: bar/1/c-d foo/1/c-d
remove bar/1/c-d? n
duplicates: bar/1/2/s-t foo/1/2/s-t
remove bar/1/2/s-t? n
duplicates: bar/1/2/o-p foo/1/2/o-p
remove bar/1/2/o-p? n
duplicates: bar/1/2/k-l foo/1/2/k-l
remove bar/1/2/k-l? n
duplicates: bar/g-h foo/g-h
remove bar/g-h? n
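
The runs above suggest the matching rule is "same relative path under
each directory argument". A minimal sketch of that rule for two
directories follows. It is not the quoted script: the function name
same_subpath_dups is hypothetical, it only reports and never removes,
and it compares contents with cmp -s where the quoted awk code compares
sizes from stat. It also assumes file names contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper (not part of the quoted script): print each file
# under directory $2 that has the same relative path and identical
# content as a file under directory $1.
same_subpath_dups() {
    a=$1
    b=$2
    # List files relative to $a so the same subpath can be looked up in $b.
    ( cd "$a" && find . -type f ) | while IFS= read -r rel; do
        rel=${rel#./}
        # Only the identical subpath in the second tree is examined;
        # a copy stored under a different subpath is never considered.
        if [ -f "$b/$rel" ] && cmp -s "$a/$rel" "$b/$rel"; then
            printf '%s\n' "$b/$rel"
        fi
    done
}
```

With the trees from the transcript, same_subpath_dups foo bar should
list the same bar/... paths reported as "duplicates:" above, though not
necessarily in the same order. A file copied to a different subpath in
bar would not be listed, which is the limitation in question.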
David