How to recursively delete files from one folder when they already exist in another?

Question

Summary:

Folder A contains many files, all of them Excellent.
Folder B contains many subfolders with a mix of Excellent, Good and Bad files.

How can I delete files from the Folder B subfolders only if they also exist in Folder A?

In other words, how do I check whether each file in Folder A also exists somewhere in the Folder B subfolders, and then delete it from Folder B?

Idea for a solution: maybe a command which

1. checks one part of the alphabet, e.g. all files whose names start with A,
2. deletes the matching files found in the Folder B subfolders,
3. repeats step 1 with the next letter of the alphabet.
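A minimal bash sketch of that idea, working one file of Folder A at a time instead of one letter at a time, so deletions start immediately instead of after a full scan. It assumes a file counts as a duplicate when a file with the same name exists anywhere under Folder B and has byte-for-byte identical content (checked with cmp); the function name delete_matches_from_b is hypothetical.

```shell
# Hypothetical helper: for every file in A, find files with the same name
# anywhere under B and delete them from B, but only if cmp says the content
# is identical. Files in A are never touched.
# Note: -name treats the file name as a glob, so names containing * ? [
# may match more than intended.
delete_matches_from_b() {
  local a_dir=$1 b_dir=$2
  find "$a_dir" -type f -print0 |
    while IFS= read -r -d '' a_file; do
      name=${a_file##*/}   # file name without its directory
      # Candidates in B with the same name, in any subfolder.
      find "$b_dir" -type f -name "$name" -print0 |
        while IFS= read -r -d '' b_file; do
          if cmp -s -- "$a_file" "$b_file"; then
            echo "deleting: $b_file"
            rm -f -- "$b_file"
          fi
        done
    done
}
```

For example, delete_matches_from_b ~/A ~/B. Because it searches all of B once per file in A, it is slow on large trees; the trade-off is that each duplicate is deleted as soon as it is found.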

Why other duplicate-finding programs didn't work for me:

It takes a long time until the first deletion, because they only delete after the whole scan has finished.
There is no way to choose to delete only from Folder B. They only let me keep the newest copy (or apply some other rule), not choose which folder's copy to keep.

Background (not essential): the files in Folder B were recovered with Recuva, copied there and partly sorted, but many of them are bad. At first I thought of comparing Folder B against files recovered a second time, but I have now recovered only the Excellent files into Folder A with Recuva, so most of the Excellent files will be in Folder A.

Example file tree:

.
├── A
│   ├── 1.png
│   ├── 2.png
│   └── Excellent
│       ├── e1.png
│       └── e2.png
└── B
    ├── 1.png
    ├── 2.png
    ├── Bad
    │   ├── 1.png
    │   ├── 2.png
    │   ├── e1.png
    │   └── e2.png
    └── Excellent
        ├── e1.png
        └── e2.png

unutbu · Accepted Answer · 2013-01-13T11:46:25.773

Below are two solutions, depending on how we define "duplicate":

Files with the same relative path, or
Files with the same content but not necessarily the same name

If by "duplicate" we mean two files which share the same relative path, then you could use find and xargs to remove the duplicates. For example, suppose you have

~/tmp% tree A
A
└── Excellent
    ├── bar
    ├── baz
    └── foo
~/tmp% tree B
B
├── Bad
│   └── quux
├── Excellent
│   ├── bar
│   ├── baz
│   └── foo
└── Good

Then

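# Delete B/<path> for every file A/<path>; -f avoids errors for paths missing from B
find /home/unutbu/tmp/A -type f -print0 | xargs -0 -I{} bash -c 'rm -f -- "/home/unutbu/tmp/B${1#/home/unutbu/tmp/A}"' _ {}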

results in

~/tmp% tree B
B
├── Bad
│   └── quux
├── Excellent
└── Good
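A more cautious variant of the find/xargs approach, as a minimal bash sketch: by default it only reports what it would delete, and it deletes a file from B only when the file at the same relative path in A has identical content (checked with cmp). The function name remove_same_path and its third "dry run" argument are hypothetical.

```shell
# Hypothetical helper: same relative path AND same content, with a dry run.
# Usage: remove_same_path DIR_A DIR_B [DRY_RUN]   (DRY_RUN defaults to 1)
remove_same_path() {
  local a_dir=$1 b_dir=$2 dry_run=${3:-1}
  # Run find from inside A so it prints paths relative to A, e.g. ./Excellent/bar
  ( cd -- "$a_dir" && find . -type f -print0 ) |
    while IFS= read -r -d '' rel; do
      rel=${rel#./}
      b_file="$b_dir/$rel"
      # Act only if B has this path and its content matches the file in A.
      if [ -f "$b_file" ] && cmp -s -- "$a_dir/$rel" "$b_file"; then
        if [ "$dry_run" = 1 ]; then
          echo "would delete: $b_file"
        else
          rm -f -- "$b_file"
        fi
      fi
    done
}
```

For example, remove_same_path ~/tmp/A ~/tmp/B prints the files it would remove, and remove_same_path ~/tmp/A ~/tmp/B 0 actually removes them.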

Or, if by "duplicate" we mean two files that share the same content, though perhaps not the same filename, then you could use rdfind:

sudo apt-get install rdfind

If we have this directory structure:

~/tmp% tree A
A
└── Excellent
    ├── bar
    ├── baz
    └── foo

1 directory, 3 files
~/tmp% tree B
B
├── Bad
│   └── quux
├── Excellent
│   ├── barbar
│   ├── bazbaz
│   └── foofoo
└── Good

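where barbar has the same content as bar, and similarly for bazbaz and foofoo, then (listing A first, because rdfind keeps the copy found under the earliest command-line argument and deletes the later ones)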

rdfind -deleteduplicates true A B

results in

~/tmp% tree B
B
├── Bad
│   └── quux
├── Excellent
└── Good

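Alternative solution, in case your version of Ubuntu does not include rdfind:

You could instead use fdupes. Note that with --delete --noprompt it keeps the first file of each duplicate set and deletes the rest, where "first" follows fdupes's own sort order rather than which directory the file is in, so check beforehand with fdupes --recurse A B (without --delete it only lists the duplicate sets):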

sudo apt-get install fdupes
fdupes --recurse --delete --noprompt A B
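If you need a guarantee that only files in B are ever deleted, a minimal bash 4+ sketch is to hash every file in A with sha256sum and then delete each file in B whose hash is in that set. The function name delete_b_copies_of_a is hypothetical, and files with equal SHA-256 hashes are assumed to be identical.

```shell
# Hypothetical content-based helper that only ever deletes from B.
# Requires bash 4+ for the associative array.
delete_b_copies_of_a() {
  local a_dir=$1 b_dir=$2 digest file
  local -A in_a=()   # sha256 digest -> 1, for every file in A
  # Hash via stdin so sha256sum prints "<digest>  -" regardless of file name.
  while IFS= read -r -d '' file; do
    digest=$(sha256sum < "$file")
    in_a[${digest%% *}]=1
  done < <(find "$a_dir" -type f -print0)

  # Delete every file in B whose digest was seen in A.
  while IFS= read -r -d '' file; do
    digest=$(sha256sum < "$file")
    if [[ -n ${in_a[${digest%% *}]:-} ]]; then
      echo "deleting: $file"
      rm -f -- "$file"
    fi
  done < <(find "$b_dir" -type f -print0)
}
```

For example, delete_b_copies_of_a ~/tmp/A ~/tmp/B. Unlike fdupes --noprompt, which copy survives does not depend on any sort order: A is only read, never modified.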
