We're deploying a combined Active Directory domain controller and file server at a small NGO, running Samba on Ubuntu Server with a conventional HDD for budgetary reasons.

I'm concerned about the poor random 4K performance of spinning drives when Windows roaming profiles are downloaded and uploaded, so I'm thinking about getting a very small SSD for caching. The network is 1 Gbit/s, so anything beyond ~100 MB/s is wasted; larger files, which tend to be accessed sequentially, should therefore be excluded from the SSD cache.
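
As a rough sanity check of these numbers, here is a back-of-the-envelope calculation in Python. The figure of about 100 random 4 KiB operations per second is only an assumed ballpark for a 7200 rpm drive, not a measurement of our hardware, and the link figure ignores protocol overhead.

```python
# Theoretical ceiling of a 1 Gbit/s link, ignoring Ethernet/TCP/SMB overhead.
link_bits_per_s = 1_000_000_000
link_mb_per_s = link_bits_per_s / 8 / 1_000_000

# ASSUMPTION: roughly 100 random 4 KiB operations per second on a 7200 rpm HDD.
hdd_random_iops = 100
block_bytes = 4096
hdd_random_mb_per_s = hdd_random_iops * block_bytes / 1_000_000

print(f"link ceiling:  {link_mb_per_s:.1f} MB/s")
print(f"HDD random 4K: {hdd_random_mb_per_s:.2f} MB/s")
```

Under these assumptions the disk is the bottleneck by a wide margin for small random I/O, while for sequential transfers the network is likely to saturate first.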

Can dm-cache be configured to cache only files below a certain size on the SSD?

1 Answer

So, to answer my own question: I found the required information in the documentation of dm-cache, more precisely in cache-policies.txt.

According to the documentation, dm-cache differentiates between random and sequential transfers right out of the box. To cite cache-policies.txt:

Message and constructor argument pairs are:

  • 'sequential_threshold <#nr_sequential_ios>'
  • 'random_threshold <#nr_random_ios>'
  • 'read_promote_adjustment <value>'
  • 'write_promote_adjustment <value>'
  • 'discard_promote_adjustment <value>'

The sequential threshold indicates the number of contiguous I/Os required before a stream is treated as sequential. The random threshold is the number of intervening non-contiguous I/Os that must be seen before the stream is treated as random again.

The sequential and random thresholds default to 512 and 4 respectively.

Large, sequential ios are probably better left on the origin device since spindles tend to have good bandwidth. The io_tracker counts contiguous I/Os to try to spot when the io is in one of these sequential modes.
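
To make the two thresholds concrete, here is a minimal Python sketch of such a tracker. It is only a toy model of the behaviour described in the quote, not the kernel's code: the class name `IoTracker`, the `observe` method and the sector-based bookkeeping are my own illustrative assumptions, and the demo uses tiny thresholds instead of the defaults of 512 and 4.

```python
from dataclasses import dataclass


@dataclass
class IoTracker:
    """Toy model of a sequential/random detector (hypothetical, not kernel code)."""
    sequential_threshold: int = 512  # contiguous I/Os needed to call a stream sequential
    random_threshold: int = 4        # non-contiguous I/Os needed to call it random again
    sequential: bool = False         # start out treating the stream as random
    nr_seq: int = 0
    nr_rand: int = 0
    next_sector: int = -1            # sector right after the previous I/O

    def observe(self, start: int, length: int) -> bool:
        """Record one I/O; return True if the stream now counts as sequential."""
        if start == self.next_sector:
            self.nr_seq += 1
        else:
            # A non-contiguous I/O breaks the current contiguous run.
            if self.nr_seq:
                self.nr_seq = 0
                self.nr_rand = 0
            self.nr_rand += 1
        self.next_sector = start + length

        if self.sequential and self.nr_rand >= self.random_threshold:
            self.sequential = False
            self.nr_seq = self.nr_rand = 0
        elif not self.sequential and self.nr_seq >= self.sequential_threshold:
            self.sequential = True
            self.nr_seq = self.nr_rand = 0
        return self.sequential


# Demo with small thresholds: five back-to-back I/Os, then two scattered ones.
tracker = IoTracker(sequential_threshold=4, random_threshold=2)
stream = [(0, 8), (8, 8), (16, 8), (24, 8), (32, 8), (1000, 8), (5000, 8)]
states = [tracker.observe(start, length) for start, length in stream]
print(states)  # [False, False, False, False, True, True, False]
```

In this model the stream flips to sequential once four contiguous I/Os have followed the first one, and flips back to random after two scattered I/Os in a row. A cache that skips promotion while the tracker reports a sequential stream would leave large streaming transfers on the HDD.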

This is not the file-level differentiation the original question asked for: randomly read blocks from large, fragmented files will presumably be cached as well. It is nevertheless a feasible (and indeed much more robust) solution to the problem. The original question wrongly assumed that the cache works with files; in fact, dm-cache works at the block-device level, below the file system, and knows nothing about files or their sizes.