
I need to create a list of checksums of the files that are inside a directory, including any subdirectories.

The command that I try to execute is the following:

 sha256sum -b * 

Usage:


 -b = Read in Binary.

 * = Selects all files in the current directory (a shell glob, not a file-extension filter).

With the command I get the following output:

sha256sum: test0: Is a directory
e3d748fdf10adca15c96d77a38aa0447fa87af9c297cb0b75e314cc313367daf *test1.txt
db0c7a354881fe2dd1b45642a68f6a971c7421e8fdffe56ffa7c740111e07274 *test2.txt

Instead of reporting that test0 is a directory, I want it to also generate checksums for the files inside it.

Do you recommend always using -b for every type of file? In what cases should -t be used?

Is it possible to filter out the types of files I want to omit from the verification, without having to list all the files I want to include? What command should I run?

I looked for help but could not find anything related.

MarianoM

7 Answers


You can use find to find all files in the directory tree, and let it run sha256sum. The following command line will create checksums for the files in the current directory and its subdirectories.

find . -type f -exec sha256sum {} \;

I don't use the options -b and -t, but if you wish, you can use -b for all files. The only difference that I notice is the asterisk in front of each file name.
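For comparison, the same recursive listing can be sketched in Python. This is a minimal sketch with hypothetical function names, not a drop-in replacement for sha256sum: it reads every file in binary mode (the equivalent of -b), mimics find -type f by keeping only regular files (symlinks are skipped), and prints lines in the "checksum *name" form that sha256sum -b produces. It does not reproduce sha256sum's escaping of unusual filenames.

```python
import hashlib
import os


def sha256_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of one file, read in binary chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_lines(root="."):
    """Yield "<hex> *<path>" lines for every regular file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like find -type f: regular files only, symlinks are not followed.
            if os.path.isfile(path) and not os.path.islink(path):
                yield f"{sha256_file(path)} *{path}"
```

Running `for line in checksum_lines("."): print(line)` prints one line per file and descends into subdirectories such as test0 instead of reporting them as directories.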

sudodus

TL;DR

cd /path/to/working/directory
sha256sum <(find . -type f -exec sha256sum {} \; | sort)

Intro

A more complete answer than the one above, which also fixes the problem of find "finding" files in different orders on different systems.

Piping output to file, compare with diff

Firstly, you probably want to redirect the output to a file so that you can compare it with diff. For this you would use

find . -type f -exec sha256sum {} \; > file1.lst

Then on your other system

find . -type f -exec sha256sum {} \; > file2.lst
rsync file2.lst user@host:/home/user/file2.lst
ssh user@host
diff file1.lst file2.lst # might not match due to order

Fixing order of files found with find by piping to sort

Here I am assuming you are doing something similar to what I needed this for: copying files from one system to another over a network and verifying the integrity of those files.

What I found was that the order in which find finds files can vary between two systems, even when the OS is "Debian" in both cases.

Therefore, one needs to sort the output in the text files.

sort file1.lst > file1sorted.lst
sort file2.lst > file2sorted.lst
diff file1.lst file2.lst # bad
diff file1sorted.lst file2sorted.lst # ok

You can do the find and sort all in one line, while redirecting the output to a file.

find . -type f -exec sha256sum {} \; | sort > file1.lst
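The effect of sorting before diffing can be sketched in Python with the standard difflib module. This is an illustrative sketch with a hypothetical function name: it sorts both listings and returns a unified diff, which is empty when the two lists contain the same lines in any order. Python sorts strings by code point, which for UTF-8 text matches the byte order of LC_ALL=C sort; plain sort follows the current locale, so if the two systems use different locale settings, running LC_ALL=C sort on both is the safer choice.

```python
import difflib


def diff_checksum_lists(lines1, lines2):
    """Unified diff of two checksum listings after sorting both.

    Sorting first removes any dependence on the order in which find
    visited the files; an empty result means the listings match.
    """
    return list(difflib.unified_diff(sorted(lines1), sorted(lines2),
                                     "file1sorted.lst", "file2sorted.lst",
                                     lineterm=""))
```

To use it, read file1.lst and file2.lst with `.read().splitlines()` and print each returned line.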

Other sha/md5 sums

If you want a stronger hash, you can use the 512-bit version instead:

find . -type f -exec sha512sum {} \; | sort > file1.lst

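Alternatively, if 256 bits is overkill for what you are doing, use md5sum. MD5 is no longer collision resistant, so it is only suitable for detecting accidental corruption, not deliberate tampering: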

find . -type f -exec md5sum {} \; | sort > file1.lst
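In Python, the choice between these algorithms can be sketched as a single parameter: hashlib.new() accepts algorithm names such as "sha256", "sha512" and "md5", so one helper covers all three commands. The helper name below is hypothetical.

```python
import hashlib


def digest_of_file(path, algorithm="sha256", chunk_size=65536):
    """Hex digest of one file using any hashlib algorithm name."""
    digest = hashlib.new(algorithm)  # e.g. "sha256", "sha512", "md5"
    with open(path, "rb") as f:
        # Feed the file in chunks to keep memory use constant.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The hex digests are 32, 64 and 128 characters long for MD5, SHA-256 and SHA-512 respectively, matching the output widths of md5sum, sha256sum and sha512sum.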

A complete one-line command to compare two directories with a single checksum

Now, if you have many files and do not want to save the output to a file, you can simply compute the sha256sum of the sorted list itself. To do this, use

sha256sum <(find . -type f -exec sha256sum {} \; | sort)

The pipe to sort is required to ensure the output is sorted before computing the final sha256sum. Without this, if find finds files in a different order, despite the shasums for each file being correct, the overall shasum will depend on the order.
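The same idea, a single checksum computed over a sorted list of per-file checksums, can be sketched in Python. This is a minimal sketch with hypothetical names: it builds lines in sha256sum's text-mode "checksum  ./path" format with paths relative to the starting directory, sorts them in code-point order and hashes the concatenation. Because sort's order depends on the locale and sha256sum escapes unusual filenames, the result is not guaranteed to equal the digest the shell pipeline prints; it is only meant to be compared with itself, run on both systems.

```python
import hashlib
import os


def sha256_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of one file, read in binary chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digest(root="."):
    """One SHA-256 over the sorted per-file checksum lines below root."""
    # Rough analogue of: sha256sum <(find . -type f -exec sha256sum {} \; | sort)
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                # Record paths relative to root, like the "./..." paths find prints.
                rel = "./" + os.path.relpath(path, root).replace(os.sep, "/")
                lines.append(f"{sha256_file(path)}  {rel}\n")
    # Sorting makes the final digest independent of traversal order.
    lines.sort()
    return hashlib.sha256("".join(lines).encode("utf-8", "surrogateescape")).hexdigest()
```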

Problem relating to diff output and paths used

You may have some path which looks like

/A/B/C/*

where * stands for the subdirectories and files you want to checksum. If A/B/C consists of one or more directories that each contain only one subfolder, you might accidentally run your checksum command in the wrong directory, resulting in the following:

sort1.txt:
sha256sum1  ./A/B/C/file1

sort2.txt:
sha256sum2  ./B/C/file1

Even if sha256sum1 = sha256sum2, diff will report the files as different (and the lines really are different, because the paths have different base directories).

Here is a short Python 3 script that compares the checksums line by line, ignoring the paths, which solves this problem.

#!/usr/bin/env python3
# Compare two sorted checksum lists line by line, looking only at the
# checksum column, so differing base directories in the paths are ignored.
file1_name = "sort1.txt"
file2_name = "sort2.txt"

# "with" closes both files once they have been read.
with open(file1_name, 'r') as file1, open(file2_name, 'r') as file2:
    file1_lines = file1.readlines()
    file2_lines = file2.readlines()

if len(file1_lines) == len(file2_lines):
    print("line numbers ok")
    for line1, line2 in zip(file1_lines, file2_lines):
        # The checksum is everything before the first space.
        shasum1 = line1.split(' ')[0]
        shasum2 = line2.split(' ')[0]
        if shasum1 != shasum2:
            print("shasum error: ", line1)
else:
    print("Error: file ", file1_name, " number of lines != ", file2_name, " number of lines")
print("done")

I initially wanted to write a shell script to do this, but I got bored trying to figure out how, so I went back to Python.

This makes me think that writing the entire thing in Python would actually have been easier, apart from the find command.
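Doing the whole job in Python can be sketched as follows (hypothetical function names). The sketch walks both trees, keys each checksum by the file's path relative to the root you pass in, and reports files missing on either side and files whose checksums differ. Because the paths are relative to each root, the /A/B/C versus /B/C problem does not arise. It assumes both trees are reachable from one machine; for two hosts you would run checksums_by_relpath on each side and transfer the result.

```python
import hashlib
import os


def sha256_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of one file, read in binary chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_by_relpath(root):
    """Map each regular file's path relative to root to its SHA-256 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                result[os.path.relpath(path, root)] = sha256_file(path)
    return result


def compare_trees(root1, root2):
    """Return (only_in_1, only_in_2, mismatched) as sorted lists of relative paths."""
    sums1 = checksums_by_relpath(root1)
    sums2 = checksums_by_relpath(root2)
    only1 = sorted(sums1.keys() - sums2.keys())
    only2 = sorted(sums2.keys() - sums1.keys())
    # Files present in both trees but with different contents.
    mismatched = sorted(p for p in sums1.keys() & sums2.keys() if sums1[p] != sums2[p])
    return only1, only2, mismatched
```

For example, compare_trees("/path/to/original", "/path/to/copy") returns three empty lists when the two trees match.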

user3728501

Late answer, but for the sake of documentation...

The other answers suggest calling sha256sum via find and its -exec option. This starts a separate sha256sum process for every file, which adds significant overhead.

A more efficient solution is to turn the find results into command-line arguments by piping them through xargs, which then calls sha256sum once, or in large batches if there are too many files for a single command line.

find /path/to/your/dir -type f | xargs sha256sum -b

If your filenames contain whitespace, use the -print0 flag in find and the -0 flag in xargs so that each name is terminated by a null character (\0) instead:

find /path/to/your/dir -type f -print0 | xargs -0 sha256sum -b
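What xargs does here can be sketched in Python with subprocess: collect the file names first, then start sha256sum once per batch instead of once per file. This sketch assumes GNU sha256sum is on the PATH, and its batch size of 1000 is an arbitrary illustrative value (xargs instead limits each batch by total command-line length). Passing the names as a list, without a shell, avoids the whitespace problem that -print0 and -0 solve in the shell pipeline. The function name is hypothetical.

```python
import subprocess


def sha256sum_batched(paths, batch_size=1000):
    """Run sha256sum -b over many files using one process per batch.

    batch_size is an arbitrary illustrative value, not what xargs uses.
    """
    lines = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        # Arguments go straight to the program (no shell), so spaces in names are safe;
        # "--" stops names that begin with "-" from being read as options.
        result = subprocess.run(["sha256sum", "-b", "--", *batch],
                                capture_output=True, text=True, check=True)
        lines.extend(result.stdout.splitlines())
    return lines
```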
find . -type f -print0 | sort -z | xargs -0 sha256sum  > /tmp/checksums.txt

This generates a list of checksums that does not depend on the order in which find lists the files, and is easy to reuse on Windows. The filenames are null-separated, sorted with null separation (sort -z), and then passed to xargs with null separation (-0) to compute the hashes. According to the man page, the sha256sum -b and -t options are not needed on Linux.

vkersten

To include all files in subdirectories, use a double asterisk:

sha256sum /path/to/your/dir/**

This requires bash's globstar option. If it is not enabled, enable it with shopt -s globstar. See this question for more details.
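Note that ** matches directories as well as files, so sha256sum will still print "Is a directory" for each directory it is given while hashing every file, and names starting with a dot are skipped unless bash's dotglob option is set. Python's glob module supports the same recursive ** pattern; this sketch (hypothetical function name) keeps only regular files, so no directory reaches the hashing step.

```python
import glob
import os


def files_via_globstar(root):
    """Regular files under root found with a recursive ** glob."""
    # glob.escape protects root in case it contains *, ? or [ characters.
    pattern = os.path.join(glob.escape(root), "**")
    # "**" with recursive=True also yields directories, so filter them out.
    # As with bash's default, names starting with "." are not matched.
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))
```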


Short answer: sha256deep


It feels very wrong that this FAQ is now one of the most relevant results. sha256deep, md5deep and the other *deep tools have existed for years (for example, sha256deep -r /path/to/dir hashes a directory tree recursively). Some years ago they moved to the hashdeep package, and they are still maintained because sha256sum's functionality is very limited in scope.


On another note:

I used CFV for such tasks in the past, but it was removed from Ubuntu and was one of the last projects to find new maintainers willing to port it to Python 3. After finding this question and its many answers, and realizing that pipx exists, I went straight back to CFV.

Install pipx:

python3 -m pip install --user pipx

Install CFV:

pipx install cfv

Hash the current directory recursively and create a file containing the hashes, named after the directory:

cfv -Crrt sha256

Is it possible to filter the types of files I want to omit in the verification, without having to add all the files I want to admit? What command should I execute?

That is where find comes in handy: use it to create the list of files you want to hash. Experiment with find's tests (for example ! -name '*.log' to leave out a pattern; find has no --exclude option) until the output matches what you need, then redirect find's output to a file and run cfv -Crrt sha256 -f file_list
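The same exclude-by-pattern idea can be sketched in Python with fnmatch. This is an illustrative sketch with a hypothetical function name: it walks the tree and skips any file whose name matches one of the shell-style patterns, roughly what find . -type f ! -name '*.log' ! -name '*.tmp' does. The resulting paths could then be hashed or written to a list file.

```python
import fnmatch
import os


def files_excluding(root, exclude_patterns):
    """Yield regular files under root whose names match none of the patterns."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # fnmatchcase is case-sensitive, like find's -name test.
            if any(fnmatch.fnmatchcase(name, pat) for pat in exclude_patterns):
                continue
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path
```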

LiveWireBT

If you want to sort the files in the folder by size (smallest first) and then hash them, you can run

ls -Sr . | while IFS= read -r i; do [ -f "$i" ] && sha256sum "$i"; done
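A Python sketch of the same idea, with hypothetical function names: list the regular files directly inside one folder (like ls, it does not descend into subdirectories; unlike ls without -a, it includes hidden files), order them smallest first with the file name as a tie-breaker, and hash each one.

```python
import hashlib
import os


def sha256_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of one file, read in binary chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_by_size(folder="."):
    """Return (size, digest, path) for each regular file in folder, smallest first."""
    entries = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path):  # skip subdirectories: sha256sum cannot hash them
            entries.append((os.path.getsize(path), name, path))
    entries.sort()  # by size, then by name
    return [(size, sha256_file(path), path) for size, _name, path in entries]
```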
Scholtz