
I have two file lists, backup.txt and backup2.txt. Some of the entries aren't exact matches, which makes it difficult to find the duplicates with diff or uniq.

Example:

:::backup.txt:::
auser_backup
auser_backup2
buser_backup
cuser_backup

:::backup2.txt:::
auser.backup
auser.backup.2
buser
cuser

Is there a way to compare these vaguely similar file lists, so that auser_backup and auser.backup, as well as auser_backup2 and auser.backup.2, are counted as duplicates?

Maybe there's another step to rename all the entries so that the formats are correct? I'm kind of at a loss.

Fabby
mktoaster

1 Answer


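You're going to have to pre-process the files to "fix" the irregularities. The sed script below first puts a dot before a trailing number (so backup2 becomes backup.2), then turns every dot into an underscore, so both lists end up in the same format. comm -12 then prints only the lines common to both sorted lists. (This needs a shell with process substitution, such as bash; -r is GNU sed's spelling of -E.)

fixfile() { sed -r 's/([[:alpha:]])([[:digit:]]+)$/\1.\2/; s/\./_/g' "$1"; }
comm -12 <(fixfile backup.txt | sort) <(fixfile backup2.txt | sort)

With the example files, this prints:

auser_backup
auser_backup_2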
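
The command above reports the normalized names, not the original spellings. As a possible variation (not part of the original answer), the sketch below keeps each original name next to its normalized key and uses join to show which entries from the two lists were paired. The function names normalize, keyed and show_pairs are made up for this illustration, and it assumes the file names contain no tab characters.

```shell
# Normalize names: put a dot before a trailing number, then turn dots into underscores.
# Same substitutions as the answer's sed script, written with the portable -E flag.
normalize() {
    sed -E 's/([[:alpha:]])([[:digit:]]+)$/\1.\2/; s/\./_/g' "$1"
}

# Emit "normalized<TAB>original" lines, sorted on the normalized key.
# (Hypothetical helper; assumes no tabs in the input lines.)
keyed() {
    normalize "$1" | paste - "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

# Print "key<TAB>name in list 1<TAB>name in list 2" for every key found in both lists.
show_pairs() {
    tab=$(printf '\t')
    tmp=$(mktemp) || return 1
    keyed "$2" > "$tmp"                              # second list goes to a temp file
    keyed "$1" | LC_ALL=C join -t "$tab" - "$tmp"    # first list is read from stdin
    rm -f "$tmp"
}
```

Running show_pairs backup.txt backup2.txt on the example lists should print two tab-separated lines: auser_backup, auser_backup, auser.backup and auser_backup_2, auser_backup2, auser.backup.2. LC_ALL=C is set for both sort and join so that they agree on the ordering of the keys.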
glenn jackman