
Why does this line do nothing? I'm trying to run a "sed" command in parallel and it outputs nothing to "standard.txt"

$Filetemp = the file I'm stream editing

standard.txt = the file I'm outputting to

cat $Filetemp | parallel --pipe sed -e "s/[[:space:]]\+/ /g" > standard.txt

This is the original code that works just fine but takes way too long:

sed -e "s/[[:space:]]\+/ /g" $Filetmp > standard.txt

GNU Parallel Version: 20130922

Lubuntu 14.04

Roboman1723

2 Answers


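When I ran that on a random file, I got a syntax error. Without outer quotes, your shell strips the double quotes before parallel sees them, so the sed expression gets split at the space. You need to quote the whole command: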

cat $Filetemp | parallel --pipe 'sed -e "s/[[:space:]]\+/ /g"' > standard.txt
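To see the effect of the quoting without needing parallel at all, here is a minimal sketch. The `run_joined` function is a hypothetical stand-in, not part of GNU Parallel; it assumes parallel joins its arguments with spaces and hands the resulting string to a shell, and it assumes GNU sed (for `\+`).

```shell
#!/bin/sh
# Hypothetical stand-in for parallel (an assumption, not GNU Parallel itself):
# join all arguments with spaces and run the resulting string through a shell.
run_joined() {
  sh -c "$*"
}

# Unquoted: the outer shell removes the double quotes, so the inner shell sees
#   sed -e s/[[:space:]]\+/ /g
# and splits the expression at the space. sed rejects the truncated expression,
# so nothing reaches stdout (the error message is discarded here).
unquoted_out=$(printf 'a   b\n' | run_joined sed -e "s/[[:space:]]\+/ /g" 2>/dev/null) || true

# Quoted: the single quotes keep the command as one string, double quotes
# included, so the inner shell passes the sed expression through intact.
quoted_out=$(printf 'a   b\n' | run_joined 'sed -e "s/[[:space:]]\+/ /g"')

printf 'unquoted: [%s]\n' "$unquoted_out"
printf 'quoted:   [%s]\n' "$quoted_out"
```

In this sketch the unquoted call produces no output and the quoted call collapses the run of spaces, which is why the quoted form of the command above is the one to use.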

If that still doesn't work for you, run echo $Filetemp to check that the variable is set (or pick a file manually), or remove the redirection so you can see the output as it happens (just in case there's some sort of weird overwriting issue).
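Those checks can be sketched as a small script. The `check_input` helper is hypothetical, and the five-line sample size is an arbitrary choice. Note that the question spells the variable both $Filetemp and $Filetmp, so a misspelled name is worth ruling out first.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the argument names an existing,
# non-empty file; otherwise prints a reason to stderr and returns 1.
check_input() {
  if [ -z "$1" ]; then
    echo "file name is empty (is the variable set and spelled correctly?)" >&2
    return 1
  fi
  if [ ! -s "$1" ]; then
    echo "missing or empty file: $1" >&2
    return 1
  fi
}

# ${Filetemp:-} expands to an empty string if the variable is unset.
if check_input "${Filetemp:-}"; then
  # No redirection: the result for the first few lines goes to the terminal.
  head -n 5 "$Filetemp" | sed -e "s/[[:space:]]\+/ /g"
fi
```

If the check passes and the sample looks right, the problem is more likely in the parallel invocation than in the input file.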

Oli

Since version 20140422 GNU Parallel has had --pipepart which is highly efficient:

parallel -a $Filetemp --pipepart 'sed -e "s/[[:space:]]\+/ /g"' > standard.txt

And from version 20161222 you can use --block -1 which will chop $Filetemp into one block per jobslot:

parallel -a $Filetemp --block -1 --pipepart 'sed -e "s/[[:space:]]\+/ /g"' > standard.txt
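Both options depend on the installed version, and the 20130922 release mentioned in the question is older than either cutoff. As a rough sketch, a hypothetical `parallel_mode` helper (not part of GNU Parallel) could choose the flags from a version number, which is assumed to be passed in as a plain YYYYMMDD integer, for example as read from the first line of `parallel --version`.

```shell
#!/bin/sh
# Hypothetical helper: print the option set usable with a given GNU Parallel
# version, based only on the version cutoffs stated in this answer.
parallel_mode() {
  version=$1
  if [ "$version" -ge 20161222 ]; then
    echo '--pipepart --block -1'
  elif [ "$version" -ge 20140422 ]; then
    echo '--pipepart'
  else
    echo '--pipe'
  fi
}

parallel_mode 20130922   # the version from the question: prints --pipe
parallel_mode 20140422   # prints --pipepart
parallel_mode 20161222   # prints --pipepart --block -1
```

With the version from the question, only the --pipe form is available until parallel is upgraded.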

This can deliver more than 1 GB/s for each core, which means you are likely limited by I/O. You can see if this is the case by looking at:

iostat -dkx 1

If the utilization (the %util column) is at or near 100%, then the disk is the bottleneck.

Ole Tange