4

I was wondering whether it is possible to search the contents of documents

  • possibly of various types: PDF, DjVu, HTML, text files, source code, ...
  • possibly spread across various directories, where the documents are mixed together and possibly sit alongside non-document files?

Is grep capable of doing this kind of thing?

Thanks and regards!

Tim

3 Answers

3

I use Recoll. It's in the repositories. It also searches PDF metadata. You can choose which folders are indexed. It's very fast.

Install:

sudo apt-get install recoll
bdr529
2

Yes. Have a look at FindingFiles in the Ubuntu community documentation. The one I used for a while was Tracker, which can index most document types and, because it keeps an index updated in the background, was amazingly fast when searching.

DrSAR
1

You could run a command on the files returned by the find command.

For example, the following command finds the files under the current directory (those whose names match '*.*', i.e. contain a dot) and runs grep on them to search for the string 'getURI':

find . -name '*.*' -exec grep --color 'getURI' {} +

This works perfectly on my Ubuntu 12.04.
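A slightly more defensive variant of the same idea can be sketched as a small shell function. This is only a sketch, not the command from the answer: the function name search_text_files is made up for illustration, and it assumes GNU find and GNU grep. It uses -type f so that files without a dot in their name are included and directories are skipped, -I so that binary files are treated as non-matching, and -l so that only the names of matching files are printed.

```shell
# Hypothetical helper (name chosen for illustration): list the files under
# a directory whose text content contains a fixed string.
search_text_files() {
    pattern=$1
    dir=${2:-.}    # default to the current directory
    # -type f : regular files only, whatever their name looks like
    # -I      : treat binary files as if they contained no match (GNU grep)
    # -l      : print only the names of matching files
    # -F      : take the pattern as a fixed string, not a regular expression
    find "$dir" -type f -exec grep -I -l -F -- "$pattern" {} +
}
```

Calling search_text_files getURI . would then print one matching file name per line, starting from the current directory.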

That said, I do not think the grep command can search inside binary document formats such as PDF.
Also, running the command above on a large directory tree could take a long time.
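One possible workaround for PDFs is to convert each file to plain text first and search that text instead. The sketch below assumes the pdftotext tool (shipped in the poppler-utils package on Ubuntu) is installed; the function name search_pdfs is hypothetical. It only helps for PDFs that contain a text layer, and it does not handle file names containing newlines.

```shell
# Hypothetical helper (name chosen for illustration): print the names of
# PDF files under a directory whose extracted text contains a fixed string.
# Assumes pdftotext from poppler-utils is available on the PATH.
search_pdfs() {
    pattern=$1
    dir=${2:-.}    # default to the current directory
    find "$dir" -type f -name '*.pdf' | while IFS= read -r pdf; do
        # "pdftotext FILE -" writes the extracted text to standard output
        if pdftotext "$pdf" - 2>/dev/null | grep -q -F -- "$pattern"; then
            printf '%s\n' "$pdf"
        fi
    done
}
```

Because every PDF is converted on each run, this is likely to be slow on large collections, which is the case where an indexing tool is the better fit.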

Another point to consider is that you cannot search for a string in a raster (image-only) PDF; in such a scenario, a document management system such as LogicalDoc could help.

user175667