I have a dataset of geological images. They often have unnecessary padding on the left, right, and bottom. I also have a folder containing cropped versions of these images with the padding removed. My hope is to train this CNN model so that it can crop the unwanted parts of a newly given geological image or dataset.

I am using image segmentation and creating binary masks for each cropped image, where the area to keep is 1 and the area to crop out is 0. I train the model to predict this mask instead of coordinates of what was taken out because oftentimes the dimensions don't directly correspond to crop coordinates in the original image. The images are of varying dimensions, generally around 25000x1565, and of course, the cropped versions are smaller.

I am not resizing the images to a fixed size; instead, I take the maximum dimensions across the dataset and pad every image to that size in the preprocessing function. Running this code gives me a shape mismatch error: ValueError: Arguments target and output must have the same shape. Received: target.shape=(1, 1566, 34813, 1), output.shape=(1, 1568, 34816, 1). I have print statements for sizes/shapes in the code, and nothing was out of place there.

Where am I going wrong? I am a beginner in this field, but I feel like training a model to crop images shouldn't be too hard. Could this be simpler, or what am I missing to make it work?

import tensorflow as tf
from tensorflow.keras import layers, models, Input
import numpy as np
from PIL import Image
import os
import matplotlib.pyplot as plt

def create_model(input_shape):
    inputs = Input(shape=input_shape)

    # Downsampling
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)

    # Upsampling
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)

    # Ensure output has same dimensions as input
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    outputs = layers.Conv2D(1, (1, 1), activation='sigmoid', padding='same')(x)

    model = models.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

def train_model(model, X_train, y_train, epochs=100, batch_size=1):
    history = model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size)
    return history

def load_and_preprocess_data(original_dir, cropped_dir):
    originals = []
    masks = []

    for img_name in os.listdir(original_dir):
        # Process original image
        orig_img = Image.open(os.path.join(original_dir, img_name))
        print(f"Original image size: {orig_img.size}")
        orig_array = np.array(orig_img)
        print(f"Original array shape: {orig_array.shape}")

        # Process cropped image
        cropped_img = Image.open(os.path.join(cropped_dir, img_name))
        print(f"Cropped image size: {cropped_img.size}")
        cropped_array = np.array(cropped_img)
        print(f"Cropped array shape: {cropped_array.shape}")

        # Create mask based on original image size
        mask = np.zeros((*orig_array.shape[:-1], 1), dtype=np.float32)

        # Resize cropped array to match original array size
        resized_cropped = np.array(Image.fromarray(cropped_array).resize(orig_img.size))

        # Create mask
        mask[np.any(resized_cropped > 0, axis=2)] = 1
        print(f"Mask shape: {mask.shape}")

        originals.append(orig_array)
        masks.append(mask)

    # Find the maximum dimensions
    max_height = max(img.shape[0] for img in originals)
    max_width = max(img.shape[1] for img in originals)

    # Pad images to the maximum size
    padded_originals = []
    padded_masks = []
    for orig, mask in zip(originals, masks):
        pad_height = max_height - orig.shape[0]
        pad_width = max_width - orig.shape[1]
        padded_orig = np.pad(orig, ((0, pad_height), (0, pad_width), (0, 0)), mode='constant')
        padded_mask = np.pad(mask, ((0, pad_height), (0, pad_width), (0, 0)), mode='constant')
        padded_originals.append(padded_orig)
        padded_masks.append(padded_mask)

    X = np.array(padded_originals)
    y = np.array(padded_masks)

    print(f"Final X shape: {X.shape}")
    print(f"Final y shape: {y.shape}")

    return X, y

# Main execution
original_dir = "uncropped"
cropped_dir = "cropped"

X, y = load_and_preprocess_data(original_dir, cropped_dir)

# Convert to tensors
X = tf.convert_to_tensor(X)
y = tf.convert_to_tensor(y)

print(f"X tensor shape: {X.shape}")
print(f"y tensor shape: {y.shape}")

# Create and train the model
input_shape = X.shape[1:]  # (height, width, channels)
model = create_model(input_shape)
history = train_model(model, X, y, epochs=100, batch_size=1)


1 Answer


The error you're encountering is due to a mismatch between the dimensions of your target (y) and the model's output during training.

You are facing this issue for the following reasons:

1. Padding Issues:

The mismatch in dimensions (target: (1, 1566, 34813, 1) vs. output: (1, 1568, 34816, 1)) suggests either that the images are not consistently padded to the same size, or that the pooling and upsampling operations introduce small changes in dimensions due to rounding.

2. Max Pooling and UpSampling Size Mismatch:

Max pooling and upsampling operations can sometimes result in an off-by-one error due to rounding issues, especially when input dimensions are odd or not divisible by the pooling size. For example, if an image dimension is not perfectly divisible by 2, max pooling followed by upsampling might lead to a small dimensional mismatch.
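
You can check this arithmetic against the shapes in your error. Here is a minimal sketch (the round_trip helper is mine, purely for illustration; it assumes the two MaxPooling2D/UpSampling2D pairs in your model):

import math

def round_trip(dim, levels=2):
    # Each MaxPooling2D((2, 2), padding='same') halves a dimension with
    # ceiling division; each UpSampling2D((2, 2)) then doubles it back.
    for _ in range(levels):
        dim = math.ceil(dim / 2)
    return dim * (2 ** levels)

print(round_trip(1566))   # 1568  -> the height in output.shape
print(round_trip(34813))  # 34816 -> the width in output.shape

Because 1566 and 34813 are not divisible by 4, the output comes back 2 rows and 3 columns larger than the input, which is exactly the mismatch reported in the error.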

To fix this, take the following steps:

1. Ensure Consistent Image Shapes: Double-check the preprocessing to ensure that all input images and masks are padded to the same dimensions before being fed into the model, both after padding and after resizing. Print out the exact shapes of each input and mask after padding to verify they match; a minimal check is sketched below.
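
For instance, one small check you could drop in after preprocessing (X and y here are the arrays returned by your load_and_preprocess_data):

# Every padded image and its mask must agree on batch, height, and width,
# or the loss cannot compare the prediction to the target.
assert X.shape[:3] == y.shape[:3], (
    f"Shape mismatch after padding: images {X.shape} vs masks {y.shape}"
)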

2. Use Cropping Layers to Maintain Dimensional Consistency:

Add a Cropping2D layer if your model's output is slightly larger than expected. This can be helpful after upsampling operations, for example:

# After upsampling, if there's a mismatch in dimensions
x = layers.Cropping2D(cropping=((1, 0), (1, 0)))(x)  # Adjust this cropping as needed

3. Ensure Correct Padding in Pooling/Upsampling Layers:

If you use pooling layers, ensure that the padding is correctly specified to maintain dimensional consistency. MaxPooling2D and UpSampling2D with padding='same' should generally prevent mismatches, but rounding issues can still happen.

You may also consider calculating the output size more rigorously if you're dealing with odd or large dimensions.
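
One way to avoid manual cropping entirely (a suggestion of mine, not something your current code does): round max_height and max_width up to the next multiple of 2 ** num_pooling_levels during preprocessing, so the pool/upsample round trip reproduces the input size exactly. A minimal sketch, assuming the two pooling levels in your model:

def pad_to_multiple(height, width, factor=4):
    # factor = 2 ** number_of_pooling_levels; padding up to a multiple of it
    # guarantees the model's output size matches its input size exactly.
    return height + (-height) % factor, width + (-width) % factor

print(pad_to_multiple(1566, 34813))  # (1568, 34816)

If you pad to these dimensions in load_and_preprocess_data, the Cropping2D layer is no longer needed.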

After making all these changes, your create_model function should look something like this:

def create_model(input_shape):
    inputs = Input(shape=input_shape)

    # Downsampling
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)

    # Upsampling
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)

    # Cropping to avoid mismatch in dimensions
    x = layers.Cropping2D(cropping=((1, 0), (1, 0)))(x)  # Adjust as needed

    # Ensure output has the same dimensions as input
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    outputs = layers.Conv2D(1, (1, 1), activation='sigmoid', padding='same')(x)

    model = models.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
