
I'm working on a project to find all the .tar installation files on my system using the command:

time find / -type f \( -name "*.tar" -o -name "*.tar.*" \) 2>/dev/null | wc

The first time it runs I get:

real    1m10.767s

The second time it runs I get:

real    0m9.847s

I would like to always get the second run's performance of under 10 seconds and skip the initial run time of 1 minute 10 seconds. What is the best way to avoid the one-minute penalty the first time find is used?


Notes

  • Your initial find may be faster because I have one Ubuntu 16.04 installation plus two Windows 10 installations for a total of 2 million files.
  • OTOH your initial find may be slower as I have Ubuntu 16.04 and one of the Windows 10 installations on a Samsung Pro 960 NVMe SSD rated at 3,000 MBps whereas hard drives are rated at 140 MBps and good SSDs are rated at 400 MBps.
  • If you want to replicate tests but have no .tar files on your system, replace tar with bashrc in the section: -name "*.tar" -o -name "*.tar.*".
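To see what the two -name patterns actually match before running them across /, here is a minimal sketch that runs the same expression against a throwaway directory. The file names are invented for the demonstration and are not from my system.

```shell
#!/bin/bash
# Sketch: exercise the -name predicates on a throwaway directory.
# The file names below are made up for the demonstration.
demo_dir=$(mktemp -d)
touch "$demo_dir/backup.tar" "$demo_dir/source.tar.gz" \
      "$demo_dir/notes.txt" "$demo_dir/tarball"

# Same expression as the question, pointed at the demo directory instead of /.
# tr strips the padding some wc implementations add.
matches=$(find "$demo_dir" -type f \( -name "*.tar" -o -name "*.tar.*" \) | wc -l | tr -d ' ')
echo "Matched files: $matches"

rm -rf "$demo_dir"
```

Only backup.tar and source.tar.gz satisfy the expression, so the count printed is 2. The escaped parentheses group the two -name tests so that -type f applies to both.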

TL;DR

Drop RAM caches that speed up find disk access

You can repeat first/second performance tests by calling this little script before the first find:

#!/bin/bash
if [[ $(id -u) -ne 0 ]] ; then echo "Please run as root" ; exit 1 ; fi
sync; echo 1 > /proc/sys/vm/drop_caches
sync; echo 2 > /proc/sys/vm/drop_caches
sync; echo 3 > /proc/sys/vm/drop_caches

GIF showing how much RAM disk caching consumes

The find command run across / will consume about 500 MB of cache buffers as the .gif below shows when they are dropped:

drop_caches.gif

^^^--- Notice that the memory line immediately below the terminal window drops from 4.74 GiB to 4.24 GiB. It actually drops to 4.11 GiB after the Peek screen recorder saves the file and closes. On my system, find disk caching uses about 5% of RAM.
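If you would rather read the numbers than watch a GIF, the same drop can be measured from /proc/meminfo. This is a rough sketch: cache_mib is a hypothetical helper name, and it assumes that summing Buffers, Cached and SReclaimable approximates the buff/cache column that free prints.

```shell
#!/bin/bash
# Sketch: report buff/cache in MiB from a meminfo-style file.
# cache_mib is a hypothetical helper; it defaults to /proc/meminfo.
# Assumption: Buffers + Cached + SReclaimable approximates free's buff/cache.
cache_mib() {
    awk '/^(Buffers|Cached|SReclaimable):/ { kb += $2 }
         END { printf "%d\n", kb / 1024 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    before=$(cache_mib)
    # ... run the find command, or drop the caches as root, here ...
    after=$(cache_mib)
    echo "buff/cache changed by $((after - before)) MiB"
fi
```

Take one reading before the find and one after it (or before and after dropping caches) to see how much RAM the cached directory data occupies on your system.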

1 Answer


Challenging project

The sections further down describe approaches that should work but don't. In the end, the only "sure-fire" way of making this work was with this bash script:

#!/bin/bash
# NAME: find-cache
# DESC: cache find command search files to RAM
# NOTE: Written for: https://askubuntu.com/questions/1027186/improve-initial-use-of-find-performance-time?noredirect=1#comment1669639_1027186

for i in {1..10}; do
    echo "========================" >> /tmp/find-cache.log
    printf "find-cache.log # $i: " >> /tmp/find-cache.log
    date >> /tmp/find-cache.log
    echo "Free RAM at start:" >> /tmp/find-cache.log
    free -h | head -n2 >> /tmp/find-cache.log
    printf "Count of all files: " >> /tmp/find-cache.log
    SECONDS=0 # Environment variable
    time find /* 2>/dev/null|wc -l >> /tmp/find-cache.log
    duration=$SECONDS # Set elapsed seconds
    echo "$(($duration / 60)) minutes and $(($duration % 60)) seconds for find." >> /tmp/find-cache.log
    echo "Free RAM after find:" >> /tmp/find-cache.log
    free -h | head -n2 >> /tmp/find-cache.log
    echo "Sleeping 15 seconds..." >> /tmp/find-cache.log
    sleep 15
done
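The timing in the script relies on bash's SECONDS variable, which counts whole seconds from whatever value it was last assigned. Here is a minimal sketch of that pattern on its own. format_duration is a hypothetical helper I added for illustration, and sleep stands in for the long-running find.

```shell
#!/bin/bash
# Sketch: time a command with bash's SECONDS variable.
# format_duration is a hypothetical helper, not part of the original script.
format_duration() {
    echo "$(($1 / 60)) minutes and $(($1 % 60)) seconds"
}

SECONDS=0                # bash counts up from this value once per second
sleep 1                  # stand-in for the long-running find
duration=$SECONDS        # elapsed whole seconds since the reset
echo "$(format_duration "$duration") for the command."
```

Because SECONDS has one-second resolution, it suits a scan that takes a minute but not a sub-second command. For finer timing the time keyword is the better tool.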

Copy the text above to a script file named find-cache. Put the script name in Startup Applications: follow the instructions in the "Use Startup Applications" section below, but substitute /<path-to-script>/find-cache for the command given there.

Don't forget to mark the script as executable using:

chmod a+x /<path-to-script>/find-cache

<path-to-script> should be a directory in your $PATH environment variable, such as /usr/local/bin or preferably /home/<your-user-name>/bin. To double-check, use echo $PATH to display the variable.
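Rather than reading through the echo $PATH output by eye, you could test a directory directly. This is only a sketch: in_path is a hypothetical helper name, and $HOME/bin is used as the example directory.

```shell
#!/bin/bash
# Sketch: check whether a directory is listed in $PATH.
# in_path is a hypothetical helper name.
in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if in_path "$HOME/bin"; then
    echo "$HOME/bin is on PATH"
else
    echo "$HOME/bin is not on PATH"
fi
```

Wrapping both the variable and the candidate in colons avoids false matches on partial directory names.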

Every time I log in I usually start up conky and firefox. You probably do other things. To fine-tune settings for your system, check the log file:

$ cat /tmp/find-cache.log
========================
find-cache.log # 1: Sun Apr 22 09:48:40 MDT 2018
Free RAM at start:
              total        used        free      shared  buff/cache   available
Mem:           7.4G        431M        5.9G        628M        1.1G        6.1G
Count of all files: 1906881
0 minutes and 59 seconds for find.
Free RAM after find:
              total        used        free      shared  buff/cache   available
Mem:           7.4G        1.1G        3.0G        599M        3.3G        5.3G
Sleeping 15 seconds...
========================
find-cache.log # 2: Sun Apr 22 09:49:54 MDT 2018
Free RAM at start:
              total        used        free      shared  buff/cache   available
Mem:           7.4G        1.2G        2.9G        599M        3.3G        5.3G
Count of all files: 1903097
0 minutes and 9 seconds for find.
Free RAM after find:
              total        used        free      shared  buff/cache   available
Mem:           7.4G        1.1G        3.0G        599M        3.3G        5.3G
Sleeping 15 seconds...
(... SNIP ...)

Note: between the 1st and 2nd iterations free RAM drops by 3 GB, but firefox is restoring 12 tabs at the same time.
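When the log holds all ten iterations it gets long, so a one-line-per-run summary is easier to scan. Here is a minimal sketch, assuming the log keeps exactly the line formats shown above. summarize_log is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print one line per iteration from a find-cache style log.
# summarize_log is a hypothetical helper; the log path is its argument.
summarize_log() {
    awk '/^find-cache\.log # / { run = $3; sub(/:$/, "", run) }
         / seconds for find\.$/ {
             secs = $1 * 60 + $4      # "M minutes and S seconds for find."
             print "run " run ": " secs " s"
         }' "$1"
}

log=/tmp/find-cache.log
if [ -r "$log" ]; then
    summarize_log "$log"
fi
```

For the two iterations shown above, this would print "run 1: 59 s" and "run 2: 9 s", which makes it easy to spot the point where the cache starts paying off.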

What's going on? For whatever reason, when find is run just once in a startup bash job or a cron reboot bash job, the Linux kernel thinks: "They probably don't want to keep the page cache so I'll empty it to save RAM". However, when the find command is run 10 times as in this script, the Linux kernel thinks: "Whoa, they really like this stuff in the page cache, I better not clear it out".

At least that is my best guess. Regardless of the reason, this approach works as tested many times.


What should work but doesn't work

Below are two attempts at making this project work. I've left them here so others don't waste time repeating them. If you think you can fix them by all means refine them, post an answer and I'll gleefully up-vote.

Use Startup Applications

Tap and release the Windows / Super key (it has the icon: Winkey1 or Winkey2 or Winkey3) to bring up dash.

In the search field type startup and you'll see the Startup Applications icon appear. Click the icon. When the window opens click Add on the right. Fill in the new Startup Program fields as follows:

  • Fill in the name as Cache Find to RAM.
  • Fill in the command as sleep 30 && find /* 2>/dev/null | wc.
  • Add a comment such as "Initial run of Find command to cache disk to ram".
  • Click the Add button on the bottom.

Now reboot and check performance of find command.

Credits: Windows Key icons copied from Super User post.


Cron at reboot

You can use cron to call the find command at boot time to cache the slow disk to fast RAM. Run the command crontab -e and add the following line at the bottom:

@reboot /bin/sleep 30 && /usr/bin/find /* 2>/dev/null | wc -l
  • @reboot tells cron to run this command at every boot / reboot.
  • /bin/sleep 30 makes the find command wait 30 seconds before running so the boot finishes as fast as possible. Increase this to 45 or 60 depending on your boot speed, the time you take to log in and how long your startup applications take to run.
  • /usr/bin/find /* 2>/dev/null | wc -l calls the find command searching all files (/*). Any error messages are hidden by 2>/dev/null. The number of files is counted using | wc -l. On my system it is about 2 million due to one Ubuntu installation and two Windows 10 installations.
  • After adding the line use Ctrl+O followed by Enter to save the file.
  • After saving the file use Ctrl+X to exit the nano editor used by cron. If you chose a different editor than nano use the appropriate commands to save and exit.
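Cron jobs run with a minimal environment, so it may be worth confirming that every absolute path in the crontab line points at a real executable before you reboot. The sketch below is one way to check. check_cmd is a hypothetical helper name, and the program list is just the three commands used in the line above.

```shell
#!/bin/bash
# Sketch: confirm that each absolute path used in a crontab line is executable.
# check_cmd is a hypothetical helper name.
check_cmd() {
    if [ -x "$1" ]; then
        echo "OK       $1"
    else
        echo "MISSING  $1"
    fi
}

# command -v reports where the shell finds each program on this system.
for name in sleep find wc; do
    check_cmd "$(command -v "$name")"
done
```

You can also pass a path exactly as you typed it in the crontab, for example check_cmd /usr/bin/find, to catch a mistyped directory. Since the line chains the commands with &&, a missing first command means find never runs.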

As always the acronym YMMV (Your Mileage May Vary) applies.

After reboot I did these tests to prove it does not work:

rick@alien:~$ time find / -type f \( -name "*.tar" -o -name "*.tar.*" \) 2>/dev/null | wc
     26      26    1278

real    1m10.022s
user    0m7.246s
sys     0m12.840s
───────────────────────────────────────────────────────────────────────────────────────────
rick@alien:~$ time find / -type f \( -name "*.tar" -o -name "*.tar.*" \) 2>/dev/null | wc
     26      26    1278

real    0m8.954s
user    0m2.476s
sys     0m3.709s