Remove certain fields from a line

Question

I have the following lines in a file:

Modified folders: html/project1/old/dev/vendor/symfony/yaml/Tests/bla.yml
Modified folders: html/port5/.DS_Store
Modified folders: html/trap/dev8/.DS_Store
Modified folders: html/bla3/test/appl/.DS_Store
Modified folders: html/bla4/pro1/app/bla/Api2.php
Modified folders: html/bla10/dev/appl/language/.DS_Store
Modified folders: html/bla11/dev/appl/language/abc.txt

This is basically the output of rsync. I would like to truncate every line of the file to at most three directory levels, like this:

Modified folders: html/project1/old
Modified folders: html/port5
Modified folders: html/trap/dev8
Modified folders: html/bla3/test
Modified folders: html/bla4/pro1
Modified folders: html/bla10/dev
Modified folders: html/bla11/dev

Can anyone please provide me any command or shell script to do the same?

Zanna · Accepted Answer · 2018-01-22T20:36:34.220

Maybe like this:

$ sed -r 's|/[^/]*$||' file | sed -r 's|([^/]*/?[^/]*/?[^/]*).*|\1|'
Modified folders: html/project1/old
Modified folders: html/port5
Modified folders: html/trap/dev8
Modified folders: html/bla3/test
Modified folders: html/bla4/pro1
Modified folders: html/bla10/dev
Modified folders: html/bla11/dev

Or you can do the second part with cut:

sed -r 's|/[^/]*$||' file | cut -d '/' -f 1,2,3

Notes

-r use ERE (extended regular expressions; -E is the more portable spelling)
s|old|new| replace old with new
[^/]* any number of characters that are not /
$ end of line
/? zero or one /
(pattern) save pattern to reference later with \1
.* any number of any characters
| (unquoted) shell pipe - passes output of left hand side command to right hand side command
cut -d '/' use / as delimiter
-f 1,2,3 print the first three fields
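To make the two stages easier to follow, here is a minimal sketch that runs them separately on one sample line from the question. The variable names (line, stage1, stage2) are only illustrative, and -E is used in place of -r on the assumption that your sed accepts it (GNU and BSD sed both do).

```shell
# One sample line from the question.
line='Modified folders: html/project1/old/dev/vendor/symfony/yaml/Tests/bla.yml'

# Stage 1: remove the last path component (the file name).
stage1=$(printf '%s\n' "$line" | sed -E 's|/[^/]*$||')

# Stage 2: keep only the first three /-separated fields.
stage2=$(printf '%s\n' "$stage1" | cut -d '/' -f 1,2,3)

printf '%s\n' "$stage1" "$stage2"
```

The first stage matters for short paths such as html/port5/.DS_Store: without it, cut would keep the file name as the third field.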

score 3 · Answer 2 · edited Jun 12 '20 at 14:37

The following script will (almost) do as you ask.

#!/usr/bin/env perl
use strict;
use warnings;
while(<DATA>) {
    s!^(Modified\s+folders:\s+)((?:[^/]+/){1,3}).*?$!$1$2!;
    print;
}
__DATA__
Modified folders: html/project1/old/dev/vendor/symfony/yaml/Tests/bla.yml
Modified folders: html/port5/.DS_Store
Modified folders: html/trap/dev8/.DS_Store
Modified folders: html/bla3/test/appl/.DS_Store
Modified folders: html/bla4/pro1/app/bla/Api2.php
Modified folders: html/bla10/dev/appl/language/.DS_Store
Modified folders: html/bla11/dev/appl/language/abc.txt

It reads every input line, picks some values from it (by means of a regex), replaces the line with the picked values, and finally prints the modified line (to STDOUT).

Output

Modified folders: html/project1/old/
Modified folders: html/port5/
Modified folders: html/trap/dev8/
Modified folders: html/bla3/test/
Modified folders: html/bla4/pro1/
Modified folders: html/bla10/dev/
Modified folders: html/bla11/dev/

If we write the regex in one single line:

s!^(Modified\s+folders:\s+)((?:[^/]+/){1,3}).*?$!$1$2!;

then it looks a bit scary but it is actually quite simple. The basic operator is the substitution operator s/// from Perl.

s/foo/bar/;

will replace the first occurrence of foo with bar (add the g modifier, s/foo/bar/g, to replace every occurrence). s allows us to change the delimiter from / to something different. I used a ! here, so we could also write

s!foo!bar!;

The ! does not mean "not" here; it is just an arbitrary delimiter character. s LfooLbarL; (with a space after the s, because L is a word character) would work as well. We change the delimiter because with the standard / we would need to escape every / inside the pattern and the replacement (which leads to what is known as leaning toothpick syndrome). Suppose we want to replace the path /old/path with /new/path. Now compare:

s/\/old\/path/\/new\/path/; # escaping of / needed
s!/old/path!/new/path!;     # no escaping of / needed (but of ! if we had one in the text)

We can also apply the x modifier to the s///. It allows arbitrary whitespace (even newlines and comments) in the pattern (the left hand side) to improve readability. Now the loop can be written as:

while(<DATA>) {
    s!^                         # match beginning of line
      (Modified\s+folders:\s+)  # the word "Modified", followed by 1 or more
                                # whitespace \s+,
                                # the literal "folders:", also followed by 1 or 
                                # more whitespace.
                                # We capture that match in $1 (that's why we have 
                                # parens around it).
      (                         # begin of 2nd capture group (in $2)
        (?:                     #   begin a group that is NOT captured (because of the "?:")
         [^/]+/                 #   one or more characters that are not a slash followed by a slash
        )                       #   end of group
        {1,3}                   #   this group should appear one to three times
      )                         # close capture group $2, i.e. remember the 1-3x slash thing
      .*?$                      # followed by arbitrary characters up to the end of line
     !$1$2!x;                   # Replace the line with the two found captures $1 and $2, i.e.
                                # with the text "Modified folders:" and the 1-3x slash thing.
    print;
}

The complete "script" can also be written as a one-liner:

perl -pe 's!^(Modified\s+folders:\s+)((?:[^/]+/){1,3}).*?$!$1$2!x;' file

Update

I just realized that the Modified folders: string can be seen as a component of the path as well. So the pattern can be simplified to

perl -pe 's!^((?:[^/]+/){1,3}).*?$!$1!;' file

score 3 · Answer 3 · 2018-01-23T08:20:18.460


grep -oP '^.*?(/.*?){0,2}(?=/)' file

A brief explanation of the rather dark regexp used:

^ the beginning of the line
.*? a sequence of characters (just the necessary amount) to match the pre-path
(/.*?){0,2} 0, 1 or 2 directories
(?=/) lookahead expression: followed by a / that is not included in the match
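A small sketch to try the command on sample data, assuming GNU grep built with PCRE support (needed for -P). The temporary file and variable names are only for illustration.

```shell
# Write two sample lines from the question to a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  'Modified folders: html/port5/.DS_Store' \
  'Modified folders: html/bla4/pro1/app/bla/Api2.php' > "$tmp"

# -o prints only the matched part, -P enables Perl-compatible regexes.
result=$(grep -oP '^.*?(/.*?){0,2}(?=/)' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the lookahead requires a following /, the final path component (the file name) is never part of the match, even for short paths.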

edited Jan 23 '18 at 08:20

answered Jan 22 '18 at 14:57
