Yesterday I asked this question and got awesome answers; it's really a joy to ask questions on this site.
Today I have a slightly different question.
Say I have csv1:
1,2,3,4
5,6,7,8 --
9,10,11,12
13,14,15 --
and csv2 has:
1,2,3,4,5 --
20,21,22,23,24
24,25,26,27,28
9,10,11,12,30 --
45,46,47,48,60
How can I print only those rows whose first four fields are present in only one of the two files? In other words, discard every line from each file whose first four fields also appear in a line of the other file.
1,2,3,4
9,10,11,12
20,21,22,23,24
24,25,26,27,28
45,46,47,48,60
Note that -- doesn't exist in the actual files; I added it to help you notice the difference.
So far, I'm loading everything into NumPy arrays and comparing each element:
if a[i] == b[i] and ...
But I want to know if there's a better way to do it using Linux tools.
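For reference, the rule as worded above (compare only the first four fields, keep a line if its key is missing from the other file) can be sketched with sets in plain Python instead of an element-by-element comparison. This is only a rough sketch, not a Linux-tools solution: the names `key_of` and `rows_unique_by_prefix` are made up for illustration, and it assumes the fields are separated by plain commas with no quoting.

```python
def key_of(line, n=4):
    """Return the first n comma-separated fields of a line as a tuple."""
    return tuple(field.strip() for field in line.strip().split(",")[:n])


def rows_unique_by_prefix(lines_a, lines_b, n=4):
    """Keep lines whose first n fields do not occur in the other input."""
    # Drop blank lines and trailing newlines so file objects work as input.
    a = [ln.strip() for ln in lines_a if ln.strip()]
    b = [ln.strip() for ln in lines_b if ln.strip()]

    # Sets give constant-time membership checks on the key.
    keys_a = {key_of(ln, n) for ln in a}
    keys_b = {key_of(ln, n) for ln in b}

    # Lines from the first input come first, then lines from the second.
    kept = [ln for ln in a if key_of(ln, n) not in keys_b]
    kept += [ln for ln in b if key_of(ln, n) not in keys_a]
    return kept
```

Both arguments can be any iterable of lines, for example two open file objects. If only the leftover rows of the first file are wanted, the second list comprehension can be dropped.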
Edit
Every line in csv2 has a corresponding line in csv1, and there are no duplicate lines within the same file. Basically, I'm trying to remove csv2 from csv1 and output the rest of csv1.