Can two different binary files have the same md5 sum? One binary firmware file has a different version number and is marked as revised, with a small bug fixed. But both files have the same md5 sum. I would assume that the revised file can't have the same md5 sum - can this be a mistake?
6 Answers
What the existing answers fail to point out is why a collision is deemed to be vanishingly unlikely in this case.
MD5, like any hashing algorithm of its kind, was deliberately designed so that a collision won't realistically happen if you just change a handful of characters by accident. Any accidental change, small or large, leaves roughly a 1 in 2^128 chance of landing back on the same hash. That's because the whole point of a hash here is to detect single-bit (or few-bit) errors; in this problem domain, you want the smaller changes to definitely trigger a hash change. Flawed though we now know MD5 to be against deliberately crafted collisions, that property still holds for accidental changes to this day.
So, unless you just witnessed a phenomenon far rarer than one in a million, the odds are overwhelming that you simply received the old version again. Congratulations, because this is the hash-check process working precisely as intended. :)
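If you want to see the "small change, different hash" behaviour for yourself, here is a minimal sketch using Python's standard hashlib module. The data is a made-up stand-in for a firmware image, and the helper names (md5_hex, flip_bit, differing_bits) are just illustrative, not anything from a real tool.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Stand-in for a firmware image (assumption: any byte string will do).
original = b"firmware v1.0 " * 1000
revised = flip_bit(original, 12345)  # simulate a one-bit "bug fix"

print(md5_hex(original))
print(md5_hex(revised))
print(differing_bits(md5_hex(original), md5_hex(revised)), "of 128 digest bits differ")
```

The two inputs have the same length and differ in a single bit, yet the printed digests do not match, which is exactly what you would expect to see for a genuinely revised firmware file.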
Generally, two files can have the same md5 hash only if their contents are exactly the same. Even a single bit of variation will generate a completely different hash value.
There is one caveat, though: An md5 sum is 128 bits (16 bytes). Since the number of different possible file contents is infinite, and the number of different possible md5 sums is finite, there is a possibility (though small probability in most cases) of collision of hashes. In other words, two different files can produce the same sum when hashed with md5.
Because of this, it's better in some cases to use a higher bit hash (more possible different outputs), to reduce the (already low) probability of an accidental hash collision, and increase the difficulty of creating a deliberate hash collision through brute force.
Examples of higher bit hashes include the SHA-2 family of hashes, especially sha256, sha384, or sha512 (which has the longest output of the three). The number after sha indicates the number of bits the corresponding hash algorithm generates.
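As a quick check of those output sizes, this small sketch asks Python's hashlib for the digest length of each algorithm. The digest_bits helper is my own name for illustration.

```python
import hashlib


def digest_bits(name: str) -> int:
    """Return the output size, in bits, of the named hash algorithm."""
    return hashlib.new(name).digest_size * 8


for name in ("md5", "sha256", "sha384", "sha512"):
    print(name, digest_bits(name))
# md5 128
# sha256 256
# sha384 384
# sha512 512
```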
As others have said, an MD5 collision is hypothetically possible but extremely implausible (1 in 2^128 is only a 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456 chance), and you most likely have a file-copying error.
I'd recommend doing a byte-by-byte comparison of the two files, using one of the many methods described here: https://superuser.com/questions/125376/how-do-i-compare-binary-files-in-linux.
Or just run diff file1 file2. If it prints nothing, the files are the same; if you get the message "Binary files file1 and file2 differ", they are not.
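If you would rather script the byte-by-byte comparison, a minimal sketch with Python's standard filecmp module could look like this. The file names are hypothetical, and the demo writes its own tiny temporary files so that it runs anywhere.

```python
import filecmp
import os
import tempfile


def files_identical(path_a: str, path_b: str) -> bool:
    """Compare two files by content.

    shallow=False makes filecmp read and compare the bytes instead of
    trusting matching size and modification time.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demo with throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "firmware_old.bin")
    new = os.path.join(tmp, "firmware_new.bin")
    copy = os.path.join(tmp, "firmware_copy.bin")
    contents = (
        (old, b"\x00\x01\x02\x03"),
        (new, b"\x00\x01\x02\x04"),  # last byte differs
        (copy, b"\x00\x01\x02\x03"),
    )
    for path, data in contents:
        with open(path, "wb") as f:
            f.write(data)

    print(files_identical(old, copy))  # True
    print(files_identical(old, new))   # False
```

If this reports True for your two firmware files, you have the same file twice, whatever the labels say.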
All of the above answers ignore the most important detail:
An MD5 checksum is defined to have 128 bits. That means, there are only 2^128 different MD5 values. How many different firmware images are possible? Well, that depends on how big they are, and it depends on what percentage of random byte sequences could be considered valid firmware. Chances are though, there are more than 2^128 possible firmware images.
A lot more, which means there must be duplicates.
But, the chance of any given firmware image matching a given MD5 checksum is only 1 in 2^128 which is a very small number.
VERY small.
Like, the chance of any two developers accidentally creating different images that have the same MD5 checksum at any time during the existence of human civilization is too small for you to worry about.
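To put a rough number on that, here is a back-of-the-envelope sketch. It assumes MD5 behaves like an ideal random 128-bit function for accidental inputs, and the figure of a trillion firmware images is an arbitrary, deliberately generous guess of mine. The formula is the usual birthday upper bound: with n files there are n(n-1)/2 pairs, each colliding with probability 1/2^128.

```python
from fractions import Fraction


def collision_upper_bound(n_files: int, digest_bits: int = 128) -> Fraction:
    """Upper bound on the chance that any two of n_files share a digest.

    Uses the union bound over all pairs: n(n-1)/2 pairs, each with
    probability 1 / 2**digest_bits (assumes an ideal hash).
    """
    return Fraction(n_files * (n_files - 1), 2 ** (digest_bits + 1))


# Two files: exactly the 1 in 2^128 figure.
print(collision_upper_bound(2) == Fraction(1, 2 ** 128))  # True

# A trillion distinct firmware images (hypothetical, generous figure).
print(float(collision_upper_bound(10 ** 12)))
```

Even with a trillion images, the bound comes out far below one in a trillion.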
That's accidentally. Deliberately is a different question. If you're working for the NSA, then 128 bits is not going to be enough bits of security to satisfy your bosses, and MD5 has known vulnerabilities that make it weaker than 128 bits.
But if you were working for the NSA, then you probably already knew that.
Very unlikely but possible. Check the filesize and dates for further information. If the files are different, it would be even more unlikely they would have the same size and hash.
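For the size and date check, a minimal sketch using os.stat is shown here. The helper name size_and_mtime and the file names are hypothetical, and the demo writes its own temporary files.

```python
import os
import tempfile
from datetime import datetime


def size_and_mtime(path: str):
    """Return (size in bytes, last-modified time) for a file."""
    st = os.stat(path)
    return st.st_size, datetime.fromtimestamp(st.st_mtime)


# Demo with throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    for name, data in (("firmware_old.bin", b"12345"), ("firmware_new.bin", b"123456")):
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(data)
        size, modified = size_and_mtime(path)
        print(name, size, "bytes, modified", modified)
```

Matching sizes and dates on top of a matching hash point strongly towards the same file having been delivered twice.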