4

I have the following tabular separated table:

NM_000057   0
NM_000059   0
NM_000060   0
NM_000061   0
NM_000062   0
NM_000063   0
NM_000063   0
NM_000063   3
NM_000063   2
NM_000063   0
NM_000063   0
NM_000063   0
NM_000064   0
NM_000065   0
NM_000066   0
NM_000067   0
NM_000068   0
NM_000069   0
NM_000070   0

I want to look for the first value, if there are more than one equal, I want to merge it and add the values from the second column. In the example:

NM_000057   0
NM_000059   0
NM_000060   0
NM_000061   0
NM_000062   0
**NM_000063 5**
NM_000064   0
NM_000065   0
NM_000066   0
NM_000067   0
NM_000068   0
NM_000069   0
NM_000070   0

Thank you!

αғsнιη
  • 36,350

2 Answers2

2

Use 'awk',

awk '{seen[$1]+=$2} END{for (x in seen) print x, seen[x]}' infile > outfile

In above awk command, main this 'seen[$1]+=$2' part do the job, the variable $1 as the key feild suming the value of second column when matched key seen.

And at the end, we are looping over seen array with x as variable and print the keys seen in first column then the sum result of each key by seen[x].

αғsнιη
  • 36,350
1

Having recently discovered GNU Datamash, I'm going to throw in

datamash groupby 1 sum 2 < input

If your data is not already sorted you may need to add the -s option, and if it is separated by other whitespace (instead of tabs), add -W

steeldriver
  • 142,475