I have many files whose names follow the format YYYYMMDD.txt (4-digit year, 2-digit month, 2-digit day).

Sample filenames:

  • 20150101.txt
  • 20150102.txt

Content of a sample file:

00:00:13 -> 001528

I want to extract the date from the filename and insert it at the start of each line in the file.

Desired output:

2015-01-01T00:00:13 001528

or

2015-01-01 00:00:13 001528

I tried the code below:

for files in *txt; do
awk -F "->" 'BEGIN{OFS=""} {print FILENAME" ",$1, $2}' <$files > $files.edited
mv $files.edited $files
done

Please guide.

chess_freak

2 Answers

If you have GNU awk (gawk) then you could use its built-in Time Functions to convert pieces of the file name and contents into an epoch time, and then convert it according to a chosen format.
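To isolate the two time functions before applying them to a file, here is a minimal sketch. It assumes GNU awk is installed under the name gawk; the function name iso_stamp is hypothetical and used only for illustration.

```shell
# Assumption: GNU awk is available as "gawk"; mktime() and strftime()
# are gawk extensions and are not part of POSIX awk.
iso_stamp() {
    # $1 is a datespec in the layout mktime() expects: "YYYY MM DD HH MM SS".
    # mktime() turns it into seconds since the epoch (local time zone),
    # and strftime() formats that value back as an ISO-style timestamp.
    gawk -v ds="$1" 'BEGIN { print strftime("%FT%T", mktime(ds)) }'
}
```

Calling iso_stamp "2015 01 01 00 00 13" should print 2015-01-01T00:00:13. Changing the format string to "%F %T" gives the space-separated variant from the question.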

For example, given

$ cat 20150101.txt 
00:00:13 -> 001528

Then

$ awk -F ' -> ' '
    split($1,a,/:/) {
      ds = sprintf("%04d %02d %02d %02d %02d %02d", substr(FILENAME,1,4), substr(FILENAME,5,2), substr(FILENAME,7,2), a[1], a[2], a[3]); 
      $1 = strftime("%FT%T", mktime(ds))
    } 
    1
  ' 20150101.txt 
2015-01-01T00:00:13 001528
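If gawk is not available, the same result can be built with plain string slicing, since the file name already holds the date digits in the right order. The following is only a sketch of that alternative, not the method above: the function name stamp_file is hypothetical, and the file is passed as an argument rather than via < redirection because awk only knows the file name when it opens the file itself.

```shell
# Hypothetical helper: print each line of a YYYYMMDD.txt file prefixed
# with the date taken from its name. Uses only POSIX awk features.
stamp_file() {
    awk -F ' -> ' '
        FNR == 1 {
            # Drop any leading directory so substr() sees YYYYMMDD.txt
            name = FILENAME
            sub(/.*\//, "", name)
            date = substr(name, 1, 4) "-" substr(name, 5, 2) "-" substr(name, 7, 2)
        }
        # $1 is the time, $2 the value; the comma inserts a single space
        { print date "T" $1, $2 }
    ' "$1"
}
```

To write the result back, redirect the output to a temporary file and move it over the original, as in the loop from the question.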
steeldriver

This will give you the desired output using sed:

for files in *.txt; do
sed -e "s/^./$files&/;s/./&-/4;s/./&-/7;s/.txt/T/;s/ -> / /" "$files"
done

To write the result back into each file instead of printing it, you do not need the redirection and mv from your loop. You can simply use sed's -i (edit in place) option instead of -e.

  • the s (substitute) command uses the following syntax: s/regexp/replacement/flags
  • . matches any character and ^. matches the first character of a line
  • & back-references the whole matched portion of the pattern space
  • s/^./$files&/ replaces the first character of the line with the filename followed by that same character, which prepends the filename to the line
  • s/./&-/4 uses the number flag 4 to act on the 4th match of . (the 4th character), replacing it with itself followed by -
  • s/./&-/7 does the same for the 7th character (note that what was originally the 6th character is now the 7th, because a - was inserted after the 4th character).

And of course,

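  • s/.txt/T/ substitutes .txt with T (the unescaped . matches any character, which is harmless here because the first match is the literal .txt) and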
  • s/ -> / / substitutes -> with a single blank space.
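The substitutions described above can be traced one at a time. The sketch below applies them in sequence to the sample line; the variable names line and step1 to step5 are illustrative only, and files holds the sample filename from the question.

```shell
# Sample data from the question
files='20150101.txt'
line='00:00:13 -> 001528'

# Prepend the filename to the line
step1=$(printf '%s\n' "$line"  | sed -e "s/^./$files&/")
# Insert - after the 4th character (end of the year)
step2=$(printf '%s\n' "$step1" | sed -e 's/./&-/4')
# Insert - after the 7th character (end of the month)
step3=$(printf '%s\n' "$step2" | sed -e 's/./&-/7')
# Replace the .txt extension with the T separator
step4=$(printf '%s\n' "$step3" | sed -e 's/.txt/T/')
# Collapse " -> " into a single space
step5=$(printf '%s\n' "$step4" | sed -e 's/ -> / /')

printf '%s\n' "$step1" "$step2" "$step3" "$step4" "$step5"
# 20150101.txt00:00:13 -> 001528
# 2015-0101.txt00:00:13 -> 001528
# 2015-01-01.txt00:00:13 -> 001528
# 2015-01-01T00:00:13 -> 001528
# 2015-01-01T00:00:13 001528
```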

This is the output:

2015-01-01T00:00:13 001528
2015-01-02T00:00:13 001528
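For the in-place variant mentioned earlier in this answer, a minimal sketch might look like the following. It assumes GNU sed (the -i option without a suffix argument), and the function name stamp_in_place is hypothetical. The only change to the sed script is that the directory part of the path is stripped first, so a / in the path cannot break the s command.

```shell
# Hypothetical helper: rewrite each given YYYYMMDD.txt file in place.
# Assumption: GNU sed, where -i takes no mandatory suffix argument.
stamp_in_place() {
    for f in "$@"; do
        # Keep only the file name, e.g. 20150101.txt
        base=${f##*/}
        sed -i "s/^./$base&/;s/./&-/4;s/./&-/7;s/.txt/T/;s/ -> / /" "$f"
    done
}
```

Running it as stamp_in_place *.txt edits every matching file. Because the files are modified directly, running it a second time would prepend the date again, so it may be worth testing on copies first.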
mchid