
I have two files, each containing a list of all file paths from one of two hard drives. The drives are supposed to be exactly the same, but I think one of them has missing files. Both lists contain the file path and size, but they are not in the same order (see the example below).

Is there a command that can compare the two files and write the differences to a new file?

Example:

file1:

/docs/red
/docs/blue
/docs/yellow
/docs/green

file_2:

/docs/blue
/docs/green
/docs/red

Difference_File:

/docs/yellow
αғsнιη
SD_NZ

5 Answers


Use grep; there is no need to sort the files first:

grep -Fxvf file2 file1 > diff_file

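This returns the lines that are in file1 but not in file2 (that is, the lines missing from file2). The options are: -F treats each pattern as a fixed string rather than a regular expression, -x matches whole lines only, -v inverts the match, and -f file2 reads the patterns from file2.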
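To see the command on the sample data from the question, here is a minimal sketch. The temporary directory and the way the sample files are created are my own assumptions for illustration; only the grep line comes from the answer.

```shell
# Work in a throwaway directory so no real files are touched (illustrative setup).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Sample lists taken from the question.
printf '%s\n' /docs/red /docs/blue /docs/yellow /docs/green > file1
printf '%s\n' /docs/blue /docs/green /docs/red > file2

# Print every line of file1 that does not appear as a whole line in file2.
grep -Fxvf file2 file1 > diff_file

cat diff_file   # prints /docs/yellow
```

Because the comparison is on whole lines, an entry whose path matches but whose size differs would also be reported.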

αғsнιη

I would try using sort and diff:

$ diff <(sort csv1.txt) <(sort csv2.txt)
4d3
< 
8d6
< /docs/yellow
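If you only want the missing lines rather than the full diff output, one option is to filter on the `< ` marker. This is a rough sketch, not part of the original answer: it sorts into temporary files instead of using process substitution so it also runs in plain sh, and the sample data and the output file name missing_from_csv2.txt are made up for illustration.

```shell
# Throwaway directory with sample data from the question (illustrative setup).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' /docs/red /docs/blue /docs/yellow /docs/green > csv1.txt
printf '%s\n' /docs/blue /docs/green /docs/red > csv2.txt

# Sort both lists so that diff compares them line by line in the same order.
sort csv1.txt > csv1-sorted.txt
sort csv2.txt > csv2-sorted.txt

# In diff's default output, lines prefixed with "< " exist only in the first file.
# sed keeps just those lines and strips the prefix.
# Note: diff itself exits with status 1 when the files differ, which is expected here.
diff csv1-sorted.txt csv2-sorted.txt | sed -n 's/^< //p' > missing_from_csv2.txt

cat missing_from_csv2.txt   # prints /docs/yellow
```

Swapping the prefix to `> ` in the sed expression would list the lines that exist only in the second file.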
David Foerster

I generally use meld (which is a very useful visual diff tool) for such comparisons.

Install meld:

sudo apt-get install meld

Sort, and then compare:

sort csv1.txt > csv1-sorted.txt
sort csv2.txt > csv2-sorted.txt
meld csv1-sorted.txt csv2-sorted.txt 

The comm command is designed to answer this sort of question. It takes two sorted files as input and outputs three columns of text: lines unique to the first file, lines unique to the second file, and lines common to both files. You can suppress any of these three columns.

In your case, you would want something like:

comm <(sort file1) <(sort file_2) -3 --output-delimiter=''

This compares file1 and file_2 and writes the lines that differ to standard output. Use -23 (suppress columns 2 and 3) if you only want the lines unique to file1, or -13 (suppress columns 1 and 3) if you only want the lines unique to file_2.
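As a small sketch of the -23 and -13 variants on the question's sample data: the temporary directory, the intermediate *.sorted files, and the output file names below are assumptions for illustration. In bash, `<(sort file1)` does the same job without the intermediate files.

```shell
# Throwaway directory with sample data from the question (illustrative setup).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' /docs/red /docs/blue /docs/yellow /docs/green > file1
printf '%s\n' /docs/blue /docs/green /docs/red > file_2

# comm expects both inputs to be sorted.
sort file1 > file1.sorted
sort file_2 > file_2.sorted

# -23 suppresses columns 2 and 3, leaving the lines unique to file1.
comm -23 file1.sorted file_2.sorted > only_in_file1

# -13 suppresses columns 1 and 3, leaving the lines unique to file_2.
comm -13 file1.sorted file_2.sorted > only_in_file_2

cat only_in_file1    # prints /docs/yellow
cat only_in_file_2   # prints nothing: file_2 has no extra lines
```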

Tacroy

If your real question is how to compare two mounted file systems, I would use rsync.

See: Rsync compare directories? on Unix & Linux

You can use -n (--dry-run) so that no files are actually copied; the output then shows the differences. By default, this also shows whether one file is newer than another, i.e. whether the contents have changed. I am fairly confident that it can be configured to ignore file contents.

pa4080
Zak