
I know that rdfind can find duplicate files in two directories, but I need a similar utility that finds duplicate folders (folders that have the same name and the same path relative to the main directories) in two main directories. Is there any utility that does this simple task?

**Example:**
$ tree
.
├── maindir1
│   ├── dir space
│   │   ├── dir1
│   │   └── dir2
│   ├── dir1
│   ├── dir2
│   │   └── new\012line
│   ├── dir3
│   │   └── dir5
│   └── dir4
│       └── dir6
├── maindir2
│   ├── dir space
│   │   └── dir2
│   ├── dir1
│   ├── dir2
│   │   └── new\012line
│   ├── dir5
│   │   └── dir6
│   ├── dir6
│   └── new\012line
├── file
└── new\012line

NOTE: In the above example, the only duplicate folders at the first level (depth 1) are:

maindir1/dir space/ & maindir2/dir space/
maindir1/dir1/ & maindir2/dir1/
maindir1/dir2/ & maindir2/dir2/

At the second level (depth 2), the only duplicate folders are:

maindir1/dir space/dir2/ & maindir2/dir space/dir2/
maindir1/dir2/new\012line/ & maindir2/dir2/new\012line/

Please note that maindir1/dir3/dir5/ and maindir2/dir5/ are not duplicates, and neither are maindir1/dir4/dir6/ and maindir2/dir5/dir6/, because their paths relative to the main directories differ.

PHP Learner

1 Answer


I don't know of any utility that is specific to directories (although tools like fslint or fdupes should also list directories), but it's easy enough to script:

#!/usr/bin/env bash

## Declare $dirs and $count as associative arrays
declare -A dirs
declare -A count

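find_dirs(){
    ## Make ** recurse into subdirectories
    shopt -s globstar
    ## Strip any trailing slash from the top directory
    local top="${1%/}" d dd
    for d in "$top"/**
    do
        ## Remove the top directory from the dir's path. Stripping "$top"/
        ## (rather than everything up to the first slash) also works when
        ## the argument itself contains slashes, e.g. ./maindir1
        dd="${d#"$top"/}"
        ## If this is a directory, and is not the top directory
        if [[ -d "$d" && "$dd" != "" ]]
        then
            ## Count the number of times it's been seen
            count["$dd"]=$(( ${count["$dd"]:-0} + 1 ))
            ## Add it to the list of paths with that name.
            ## I am using the `&` to separate directory entries
            dirs["$dd"]="${dirs[$dd]} & $d"
        fi
    done
}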

## Iterate over the list of paths given as arguments
for target in "$@"
do
    ## Run the find_dirs function on each of them
    find_dirs "$target"
done

## For each directory found by find_dirs
for d in "${!dirs[@]}"
do
    ## If this name has been seen more than once
    ## (-gt compares numerically; > inside [[ ]] compares strings)
    if [[ ${count[$d]} -gt 1 ]]
    then
        ## Print the name with pretty colors
        printf '\033[01;31m+++ NAME: "%s" +++\033[00m\n' "$d"
        ## Print the paths with that name, minus the leading separator
        printf '%s\n' "${dirs[$d]# & }"
    fi
done

The script above can deal with arbitrary directory names (including those with spaces or even newlines in their names) and will recurse into any number of subdirectories. For example, on this directory structure:

$ tree
.
├── maindir1
│   ├── dir1
│   ├── dir2
│   │   └── new\012line
│   ├── dir3
│   │   └── dir5
│   ├── dir4
│   │   └── dir6
│   └── dir space
│       ├── dir1
│       └── dir2
└── maindir2
    ├── dir1
    ├── dir2
    │   └── new\012line
    ├── dir5
    │   └── dir6
    ├── dir6
    ├── dir space
    │   └── dir2
    └── new\012line

It will return this:

Screenshot showing the script's output
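
If you only ever compare exactly two main directories, a shorter variant is possible. The following is a minimal sketch, not part of the script above: `common_dirs` is a hypothetical helper name, and it assumes bash 4 or later for `globstar`. It walks the first tree and prints each relative path that also exists as a directory in the second tree, NUL-separated so that names containing newlines stay unambiguous.

```bash
#!/usr/bin/env bash

## Hypothetical helper: print (NUL-separated) the relative paths of
## directories that exist under both $1 and $2.
common_dirs(){
    local a="${1%/}" b="${2%/}" d rel
    ## globstar makes ** recurse; nullglob drops the pattern if nothing matches.
    ## Hidden directories are skipped unless dotglob is also set.
    shopt -s globstar nullglob
    ## The trailing slash restricts the glob to directories
    for d in "$a"/**/
    do
        ## Path relative to the first tree, without the trailing slash
        rel="${d#"$a"/}"
        rel="${rel%/}"
        ## Skip the top directory itself; keep paths that are also
        ## directories in the second tree
        if [[ -n "$rel" && -d "$b/$rel" ]]
        then
            printf '%s\0' "$rel"
        fi
    done
}
```

With the function defined in the current shell, `common_dirs maindir1 maindir2 | tr '\0' '\n'` lists one relative path per line (a name that contains a newline will span two lines in that view).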

terdon