
So, I have a lot of files (167k), and they are now in proper order thanks to Serg's script here: https://askubuntu.com/a/686829/462277 .

Now I need to find gaps between the filenames, that is, places where the difference between consecutive numbers is 15 or more.

Aaaa.bb - 000002 tag tag_tag 9tag  
Aaaa.bb - 000125 tag tag_tag 9tag  
Aaaa.bb - 000130 tag tag_tag 9tag  

They all start the same and have different endings.
Everything is on an external HDD.

Ceslovas

2 Answers


A version in Python (Python 3, to be precise).

Save the program below under the name diff_filename.py, make it executable, and use it in the following way:

$ ./diff_filename.py the/directory/containing/the/files

The program assumes that the numbers you want to compare are always at the same position in the filename (indices 10:16).
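
To illustrate the fixed-position assumption, here is a small sketch using one of the filenames from the question. The helper names number_by_slice and number_by_regex are made up for this example, and the regex variant is only a possible fallback (assuming the number is always the first run of six digits), not part of the program below.

```python
import re


def number_by_slice(filename):
    # Fixed-position extraction: the characters at indices 10 to 15.
    return int(filename[10:16])


def number_by_regex(filename):
    # Hypothetical alternative: take the first run of six digits,
    # wherever it sits in the name. Returns None if there is none.
    match = re.search(r"\d{6}", filename)
    return int(match.group()) if match else None


name = "Aaaa.bb - 000125 tag tag_tag 9tag"
print(number_by_slice(name))  # 125
print(number_by_regex(name))  # 125
```

The slice is the simplest option as long as every name starts with the same ten characters; the regex variant only matters if the prefix length can vary.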

As it is now, it's pretty verbose: it prints each pair of filenames that passes the check, along with the difference. As soon as it hits a pair whose difference is below the minimum, it reports that pair and stops.

Here's the source code:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

'''
usage: ./diff_filename.py the/directory/containing/the/files
'''

import os
import sys

# minimal allowed difference between consecutive file numbers
MIN_DIFF = 15

the_dir = sys.argv[1]
# plain string sort; this works because the numbers are zero-padded
sorted_files = sorted(os.listdir(the_dir))

last_number = None
last_file = None
for current_file in sorted_files:
    # the number sits at a fixed position in the filename
    current_number = int(current_file[10:16])
    if last_number is None:
        # first file: nothing to compare against yet
        last_number = current_number
        last_file = current_file
        continue
    diff = current_number - last_number
    if diff < MIN_DIFF:
        # report the first pair that is closer than MIN_DIFF and stop
        print('fail! "{}" and "{}" do not respect MIN_DIFF={}'.format(
            last_file, current_file, MIN_DIFF))
        break
    else:
        print('ok! "{}" and "{}" diff={}'.format(last_file, current_file, diff))

    last_number = current_number
    last_file = current_file
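
If the goal is to list every gap of 15 or more, as the question puts it, rather than stop at the first pair below the minimum, a variant along these lines might be closer to what is wanted. This is only a sketch: find_gaps and MIN_GAP are names invented here, and it keeps the same assumption that the number sits at indices 10:16.

```python
MIN_GAP = 15  # report neighbours whose numbers differ by this much or more


def find_gaps(filenames, min_gap=MIN_GAP):
    """Return (previous_file, current_file, diff) for every large gap."""
    gaps = []
    previous = None
    for current in sorted(filenames):
        if previous is not None:
            # same fixed-position assumption as the program above
            diff = int(current[10:16]) - int(previous[10:16])
            if diff >= min_gap:
                gaps.append((previous, current, diff))
        previous = current
    return gaps


# the three example names from the question
names = [
    "Aaaa.bb - 000002 tag tag_tag 9tag",
    "Aaaa.bb - 000125 tag tag_tag 9tag",
    "Aaaa.bb - 000130 tag tag_tag 9tag",
]
for prev_file, cur_file, diff in find_gaps(names):
    print('gap of {}: "{}" -> "{}"'.format(diff, prev_file, cur_file))
```

To run it on a real directory, pass os.listdir(the_dir) instead of the sample list.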
1
find . -maxdepth 1 -type f -regextype posix-awk -iregex ".*[[:digit:]].*" | sort | awk 'NR > 1 && ($3 - previous) >= 15 { print previous "**" $3 } { previous = $3 }'

The command above uses find to match every file in the current directory whose name contains a digit, sorts the names, and pipes them to awk. awk goes through the list, stores the number from field 3 in the variable previous, and on the next line compares previous with the current number, printing both when the gap is large enough.
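
For readers less familiar with awk, the comparison step can be sketched in Python roughly as follows. The function name large_gaps is invented for this illustration, the sample lines assume find prints names with a leading "./", and the threshold uses >= 15 to match the question's "15 and more".

```python
def large_gaps(lines, threshold=15):
    """Mimic the awk step: compare field 3 of each line with the previous one."""
    previous = None
    pairs = []
    for line in sorted(lines):
        fields = line.split()      # awk splits on whitespace by default
        current = int(fields[2])   # awk's $3 is index 2 here
        if previous is not None and current - previous >= threshold:
            pairs.append((previous, current))
        previous = current
    return pairs


sample = [
    "./Aaaa.bb - 000002 tag tag_tag 9tag",
    "./Aaaa.bb - 000125 tag tag_tag 9tag",
    "./Aaaa.bb - 000130 tag tag_tag 9tag",
]
print(large_gaps(sample))  # [(2, 125)]
```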