How to find the same lines and merge the values?

Question

I have the following tab-separated table:

NM_000057   0
NM_000059   0
NM_000060   0
NM_000061   0
NM_000062   0
NM_000063   0
NM_000063   0
NM_000063   3
NM_000063   2
NM_000063   0
NM_000063   0
NM_000063   0
NM_000064   0
NM_000065   0
NM_000066   0
NM_000067   0
NM_000068   0
NM_000069   0
NM_000070   0

I want to group the rows by the first column: when the same key appears more than once, I want to merge those rows into a single row and sum their values from the second column. For the example above, the result should be:

NM_000057   0
NM_000059   0
NM_000060   0
NM_000061   0
NM_000062   0
**NM_000063 5**
NM_000064   0
NM_000065   0
NM_000066   0
NM_000067   0
NM_000068   0
NM_000069   0
NM_000070   0

Thank you!

αғsнιη · Answer 1 · 2016-09-26T10:20:33.107

2

Use awk:

awk '{seen[$1]+=$2} END{for (x in seen) print x, seen[x]}' infile > outfile

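In the awk command above, the seen[$1]+=$2 part does the main work: the first column ($1) is used as the key into the seen array, and the second column ($2) is added to the running total stored under that key every time the key appears.

At the end, we loop over the seen array with x as the loop variable and print each key followed by its sum, seen[x]. Note that for (x in seen) visits the keys in an unspecified order and print joins the two fields with a space, so the output may keep neither the input order nor the tab separator.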
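
If the original row order and the tab separator matter, the same idea can be extended slightly. The following is only a sketch, not part of the original answer: merge_counts is a hypothetical helper name, and it assumes the input is strictly tab-separated with the key in column 1 and a number in column 2.

```shell
# merge_counts: hypothetical helper that sums column 2 per key in column 1,
# printing keys in the order they first appear, tab-separated.
merge_counts() {
    awk -F'\t' -v OFS='\t' '
        !($1 in sum) { order[++n] = $1 }   # remember each key the first time it is seen
        { sum[$1] += $2 }                  # add column 2 to the running total for that key
        END {
            for (i = 1; i <= n; i++)       # walk the keys in first-seen order
                print order[i], sum[order[i]]
        }
    ' "$@"
}
```

It can be used the same way as the one-liner, for example merge_counts infile > outfile. The extra order array is what preserves the input order, at the cost of storing each key twice.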

edited Sep 26 '16 at 10:20

answered Sep 26 '16 at 10:05


steeldriver · Answer 2 · 2016-09-26T11:38:38.003

1

Having recently discovered GNU Datamash, I'm going to throw in

datamash groupby 1 sum 2 < input

If your data is not already sorted, you may need to add the -s option; if the fields are separated by other whitespace (instead of tabs), add -W.
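
As a sketch of both options combined (assuming GNU Datamash is installed; sum_by_key is a hypothetical wrapper name, not part of Datamash), the command for unsorted, whitespace-separated data might look like this:

```shell
# sum_by_key: hypothetical wrapper around datamash; reads the table on stdin.
#   -s  sort the input first, so equal keys become adjacent before grouping
#   -W  treat any run of whitespace (spaces or tabs) as the field separator
sum_by_key() {
    datamash -s -W groupby 1 sum 2
}
```

It reads standard input, so it would be called as sum_by_key < input. Because of -s, the result comes out sorted by key rather than in the original input order.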

edited Sep 26 '16 at 11:38

answered Sep 26 '16 at 10:18

