I am trying to salvage an old scratched DVD by ripping it to ISO. I have two readers and created an ISO from each. Each reader fails to read a different set of bytes on the DVD and replaces them with 0s. When I compare the files using cmp -l file1.iso file2.iso, I see that some bytes in the first file are 0 where the corresponding bytes in the second file are non-zero, and vice versa. I want to create a third file, say file3.iso, that merges the non-zero differing bytes from the two files. As an example, assume for simplicity that each file has 6 bytes, as follows:

file1.iso   file2.iso
---------   ---------
0           0
1           1
2           0
3           0
0           4
0           5

file3.iso should be as follows:

0
1
2
3
4
5
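
To pin down the intended rule, here is a minimal Python sketch applied to the 6-byte example above. The `merge` helper is hypothetical and only illustrates the expected behaviour. It assumes that two non-zero bytes that disagree should be treated as an error, which the example does not cover.

```python
def merge(a, b):
    '''Hypothetical helper: merge two equal-length byte strings,
    taking the non-zero byte wherever exactly one side is zero.'''
    merged = bytearray()
    for x, y in zip(a, b):  # iterating over bytes yields ints in Python 3
        if x == y or y == 0:
            merged.append(x)
        elif x == 0:
            merged.append(y)
        else:
            # Assumption: conflicting non-zero bytes cannot be resolved.
            raise ValueError('conflicting bytes: {}, {}'.format(x, y))
    return bytes(merged)

file1 = bytes([0, 1, 2, 3, 0, 0])
file2 = bytes([0, 1, 0, 0, 4, 5])
print(list(merge(file1, file2)))  # [0, 1, 2, 3, 4, 5]
```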

The files are quite large (around 8GB). Each file has the same number of bytes. I am using Ubuntu 16.04.

Can anyone suggest the easiest way to do what I want? I can use the output of cmp -l as intermediate data, but I want to avoid writing code.

Jus12

1 Answer

I wrote a Python script for you.

#!/usr/bin/env python3
'''
Given two input files and one output file, merge the input files on
matching bytes or bytes that are null in one file but not the other.
Non-matching non-null bytes will raise a ValueError.
'''

import sys

def get_bytes(file):
    '''Return an iterator that yields each byte in the given file.'''
    def get_byte():
        return file.read(1)
    # iter(callable, sentinel) calls get_byte until it returns b'' (EOF).
    return iter(get_byte, b'')

path1, path2, path_out = sys.argv[1:4]

# The with statement closes all three files (and flushes the output),
# even if a ValueError is raised partway through.
with open(path1, 'rb') as file1, open(path2, 'rb') as file2, \
        open(path_out, 'wb') as file_out:
    # zip stops at the end of the shorter file.
    for i, (byte1, byte2) in enumerate(zip(get_bytes(file1), get_bytes(file2))):
        if byte1 == byte2:
            byte_out = byte1
        elif ord(byte1) == 0:
            byte_out = byte2
        elif ord(byte2) == 0:
            byte_out = byte1
        else:
            msg = 'Bytes at {:#x} are both non-zero and do not match: {}, {}'
            raise ValueError(msg.format(i, byte1, byte2))
        file_out.write(byte_out)

Make it executable then call it like so:

$ ./test.py file1.iso file2.iso file3.iso

Or for short:

$ ./test.py file{1,2,3}.iso
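
The script above reads one byte at a time, which may be slow on images of around 8GB. A possible variant, sketched below, reads both files in larger chunks and only falls back to a byte-by-byte comparison when two chunks differ. The names `merge_chunk` and `merge_files` and the 1 MiB default chunk size are assumptions for illustration, not part of the original script. Like the original, it stops at the end of the shorter file and raises a ValueError on conflicting non-zero bytes.

```python
def merge_chunk(a, b, offset=0):
    '''Merge two chunks; offset is only used in the error message.'''
    if a == b:
        return a  # fast path: identical chunks need no per-byte work
    out = bytearray(a)
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y or y == 0:
            continue  # keep the byte from the first chunk
        if x == 0:
            out[i] = y  # first chunk has a hole, take the second
        else:
            msg = 'Bytes at {:#x} are both non-zero and do not match: {:#x}, {:#x}'
            raise ValueError(msg.format(offset + i, x, y))
    return bytes(out)

def merge_files(path1, path2, path_out, chunk_size=1 << 20):
    '''Merge path1 and path2 into path_out, chunk_size bytes at a time.'''
    with open(path1, 'rb') as f1, open(path2, 'rb') as f2, \
            open(path_out, 'wb') as out:
        offset = 0
        while True:
            a = f1.read(chunk_size)
            b = f2.read(chunk_size)
            if not a or not b:
                break  # stop at the end of the shorter file
            out.write(merge_chunk(a, b, offset))
            offset += len(a)
```

It could then be called from Python as merge_files('file1.iso', 'file2.iso', 'file3.iso').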

P.S. I've recently been studying different ways of reading files, so this is a nice bit of serendipity.

wjandrea