
How can I compare data in 2 files to identify common and unique data? I can't do it line by line, because file 1 contains, say, 100 IDs/codes/number sets and I want to compare file 2 against file 1.

The thing is that file 2 contains a subset of data in file 1 and also data unique to file 2, for example:

file 1      file 2
1            1
2            a
3            2
4            b
5            3 
6            c

How can I compare both files to identify the data that is common to both and the data unique to each file? diff can't seem to do the job.
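To try the answers below, the sample data from the question can be recreated with printf. This is a minimal sketch: it assumes one entry per line and overwrites any existing file1 and file2 in the current directory.

```shell
# Recreate the sample data from the question, one entry per line.
printf '%s\n' 1 2 3 4 5 6 > file1
printf '%s\n' 1 a 2 b 3 c > file2
```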

steeldriver

3 Answers


Whether or not file1 and file2 are sorted, you can use awk as follows:

unique data in file1:

awk 'NR==FNR{a[$0];next}!($0 in a)' file2 file1
4
5
6

unique data in file2:

awk 'NR==FNR{a[$0];next}!($0 in a)' file1 file2
a
b
c

common data:

awk 'NR==FNR{a[$0];next} ($0 in a)' file1 file2
1
2
3

Explanation:

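NR==FNR    - True while awk reads the 1st file argument (NR, the overall line count, equals FNR, the per-file line count), so the next block runs for that file only
a[$0]      - Reference the element of array `a` keyed by '$0' (the whole line); this creates the key without storing any value
next       - Skip the remaining rules and move on to the next input line
($0 in a)  - For each line of the 2nd file argument, test whether it is a key in array `a`; with no action given, awk prints the lines for which the test is true:
             the lines common to both files: "($0 in a)' file1 file2"
             or the lines unique to file1: "!($0 in a)' file2 file1"
             or the lines unique to file2: "!($0 in a)' file1 file2"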
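One caveat with NR==FNR: if the first file given to awk is empty, NR and FNR stay equal while awk reads the second file, so that file's lines are stored in the array and nothing is printed. A common workaround is to test FILENAME==ARGV[1] instead, which matches only records from the first file argument (this assumes the two file names differ). The sketch below demonstrates both forms, using a hypothetical empty file named empty.txt.

```shell
printf '%s\n' 1 2 3 4 5 6 > file1
: > empty.txt   # hypothetical empty file, created only for this demo

# NR==FNR is still true while awk reads file1 (empty.txt added no records),
# so file1's lines go into the array and nothing is printed.
awk 'NR==FNR{a[$0];next}!($0 in a)' empty.txt file1

# FILENAME==ARGV[1] matches only records from the first file argument,
# so every line of file1 is reported as unique.
awk 'FILENAME==ARGV[1]{a[$0];next}!($0 in a)' empty.txt file1
```

With the question's file2 in place of empty.txt, the FILENAME==ARGV[1] form prints the same 4, 5, 6 as the original command.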
αғsнιη

This is what comm is for:

$ comm <(sort file1) <(sort file2)
        1
        2
        3
4
5
6
    a
    b
    c

The first column contains lines that appear only in file 1
The second column contains lines that appear only in file 2
The third column contains lines common to both files

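comm requires its input files to be sorted, which is why each file is passed through sort using <(...) process substitution (a bash/ksh/zsh feature).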

To suppress a column, pass its number as an option. For example, to see only the lines in common, use comm -12 ...; to see only the lines unique to file2, use comm -13 ...
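A sketch of the three single-column views, assuming the question's file1 and file2. It writes sorted copies (file1.sorted and file2.sorted are hypothetical names) instead of using <(...), so it also runs under a plain POSIX sh, and it sets LC_ALL=C so that sort and comm agree on the collation order.

```shell
# Sample data from the question.
printf '%s\n' 1 2 3 4 5 6 > file1
printf '%s\n' 1 a 2 b 3 c > file2

# comm needs sorted input; sort and compare under the same locale.
LC_ALL=C sort file1 > file1.sorted
LC_ALL=C sort file2 > file2.sorted

echo "common:";     LC_ALL=C comm -12 file1.sorted file2.sorted
echo "only file1:"; LC_ALL=C comm -23 file1.sorted file2.sorted
echo "only file2:"; LC_ALL=C comm -13 file1.sorted file2.sorted
```

Here -23 keeps only column 1 (lines only in file1) and -13 keeps only column 2 (lines only in file2).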

glenn jackman

xxdiff is unmatched if you just need to graphically see the changes between two files (or directories!):


As with regular diff and comm, sort your input files first:

sort file1.txt > file1.txt.sorted
sort file2.txt > file2.txt.sorted
xxdiff file1.txt.sorted file2.txt.sorted
αғsнιη