I have 2 files that cannot be sorted. Each contains a list of words, one per line. I am trying to compare both files and create a new file that leaves out every line found in both of them. In other words, if a line in file A is also found in file B, it must not appear in the output at all.
Many questions and sites have titles like "Deleting Duplicates" when what they actually do is "Merging Duplicates & Showing A Unique One". These are very different things: the second does not delete duplicated lines, it merges them so that one copy of each remains.
In my case I really do need to DELETE them: if a line is found in both files, no copy of it should appear in the result.
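A minimal Python sketch of the difference between the two operations, assuming both files have already been read into lists of lines (the function names and sample addresses are hypothetical, for illustration only):

```python
def merge_duplicates(a, b):
    """What many 'remove duplicates' answers do: keep one copy of every line."""
    seen = set()
    result = []
    for line in a + b:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


def delete_duplicates(a, b):
    """What this question asks for: drop every line that occurs in both inputs."""
    set_a, set_b = set(a), set(b)
    return [x for x in a if x not in set_b] + [x for x in b if x not in set_a]


file_a = ["alice@example.com", "bob@example.com", "carol@example.com"]
file_b = ["dave@example.com", "bob@example.com"]

print(merge_duplicates(file_a, file_b))
# ['alice@example.com', 'bob@example.com', 'carol@example.com', 'dave@example.com']
print(delete_duplicates(file_a, file_b))
# ['alice@example.com', 'carol@example.com', 'dave@example.com']
```

Merging keeps one copy of `bob@example.com`; deleting removes it entirely because it occurs in both lists.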
I have already tried comm (which expects sorted input), and it fails. I have also tried several other awk and grep approaches I have seen. The rules for both files are the following:
- They have different sizes (they do not have the same number of lines).
- A line counts as a duplicate only if the whole line matches some line anywhere in the other file.
- The files cannot be sorted.
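A minimal Python sketch that follows these rules, assuming both files fit in memory and are UTF-8 text (the script name `unique_lines.py` and the file names are hypothetical). Set lookups replace the sorting step, whole lines are compared, and each file's original order is kept in the output:

```python
import sys


def read_lines(path):
    # Each line is compared as a whole; only the trailing newline is removed.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def unique_to_either(path_a, path_b, out_path):
    lines_a = read_lines(path_a)
    lines_b = read_lines(path_b)
    # Sets give fast membership tests, so neither file has to be sorted.
    set_a, set_b = set(lines_a), set(lines_b)
    with open(out_path, "w", encoding="utf-8") as out:
        # Lines of A that never occur in B, in A's original order...
        for line in lines_a:
            if line not in set_b:
                out.write(line + "\n")
        # ...followed by lines of B that never occur in A, in B's original order.
        for line in lines_b:
            if line not in set_a:
                out.write(line + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        unique_to_either(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print("usage: unique_lines.py FILE_A FILE_B OUTPUT", file=sys.stderr)
```

Hypothetical usage: `python unique_lines.py fileA.txt fileB.txt result.txt`. The different file sizes do not matter, since every line is checked against the complete set of the other file.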
Some more information about the files: they contain lists of email addresses, one per line. Because the files differ in size, they do not contain the same set of addresses, and within each file every address is unique. Some addresses, however, may appear in both files, and those addresses must not appear in the output.
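Because every address is unique within its own file, another option is to count occurrences across both files and keep only lines seen exactly once. A minimal Python sketch, assuming UTF-8 input that fits in memory (the function name is hypothetical):

```python
from collections import Counter


def lines_in_exactly_one_file(path_a, path_b):
    # Assumes each address is unique within its own file, so a total
    # count of 2 can only mean "present in both files".
    lines = []
    for path in (path_a, path_b):
        with open(path, encoding="utf-8") as f:
            lines.extend(line.rstrip("\n") for line in f)
    counts = Counter(lines)
    # Keep the original order (file A first, then file B), dropping shared lines.
    return [line for line in lines if counts[line] == 1]
```

This relies on the uniqueness guarantee: if a file ever contained the same address twice, this version would drop it even when the other file lacks it, whereas a set-based comparison between the two files would not.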