How to deal with images of different sizes, which need to be passed to a model of fixed input size, without losing details and spatial information?

Question

I have the following problem while using convolutional neural networks to detect forgeries:

Resizing the image to fit the required input size may not be a good approach, because forgery detection relies largely on fine image details such as noise, and the resizing process can alter or destroy those details.

Existing methods mainly use image patches of equal size, obtained by cropping. Cropping, however, discards the spatial information about where each patch sits in the full image.

I'm looking for some suggestions on how to deal with this problem (input size inconsistency) without leaving out the spatial information.

score 0 · Answer 1 · answered Jul 02 '21 at 22:06

I don't think the input size inconsistency by itself will leave out spatial information in a convolutional neural network. Resizing the image, however, would lose the characteristics of the object in the image.

It looks like you don't want to crop your input image, which appears to have been fabricated. I'd suggest these preprocessing steps before the convolutional neural network:

  (1) Find an original image or a picture of the real object
  (2) Perform image registration between the suspicious image and the original image (the registration needs to be accurate)
  (3) Calculate the color difference at each pixel position
  (4) Generate a new image from these differences
  (5) Feed it to your convolutional neural network for anomaly detection
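
Steps (2) to (4) can be sketched roughly as follows. This is only a minimal illustration, not a production registration method: it assumes both images have the same size and differ by a pure integer translation, and it uses phase correlation with wrap-around shifting. The function names `estimate_shift` and `difference_image` are made up for this example; real images usually need an affine or homography-based registration from an image-processing library.

```python
import numpy as np

def estimate_shift(reference, moving):
    """Estimate the integer (dy, dx) roll that aligns `moving` with `reference`.

    Uses phase correlation on 2-D grayscale arrays of identical shape.
    Assumption: the misalignment is a pure (circular) translation.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    cross = f_ref * np.conj(f_mov)
    cross /= np.abs(cross) + 1e-12          # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = reference.shape
    # Map peaks in the upper half of the range to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

def difference_image(original, suspicious):
    """Register `suspicious` to `original` (both H x W x 3) and return
    the absolute per-pixel, per-channel color difference as float32."""
    orig = original.astype(np.float32)
    susp = suspicious.astype(np.float32)
    # Step (2): register on the channel-averaged (grayscale) images.
    dy, dx = estimate_shift(orig.mean(axis=2), susp.mean(axis=2))
    aligned = np.roll(susp, shift=(dy, dx), axis=(0, 1))
    # Steps (3) and (4): per-pixel color difference as a new image.
    return np.abs(aligned - orig)
```

The returned array has the same height and width as the inputs, so it would still need to be brought to the network's input size before step (5).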

score 0 · Answer 2 · answered Sep 06 '24 at 12:47

When using convolutional neural networks for forgery detection, resizing images can indeed be problematic because crucial details, such as noise patterns, may get altered, which can impact the model’s performance. At the same time, cropping images into patches can result in losing important spatial information.

One solution to this problem is to use letterboxing, a technique commonly used in object detection models like YOLO and SSD. With letterboxing, the image is resized while maintaining its aspect ratio, and the remaining area is filled with padding. This avoids the geometric distortion of a plain resize and keeps the whole image, so spatial information is not lost. Note that any rescaling step still interpolates pixels; the original details, including the noise, are left fully untouched only when the image already fits and padding alone is needed. As far as I know, some classification models, such as the YOLOv8 classification model, also use letterboxing in training and inference.
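
As a rough sketch of the idea (not the implementation used by YOLO or any other library), letterboxing can be written with NumPy alone. The function name `letterbox` and its parameters are made up for this example. It assumes an H x W x C array, never upscales, and falls back to nearest-neighbour sampling when the image is larger than the target; a real pipeline would likely use a proper resampling filter.

```python
import numpy as np

def letterbox(image, target_h, target_w, pad_value=0):
    """Fit `image` (H x W x C) into a target_h x target_w canvas.

    Images that already fit are only padded, so their pixel values are
    untouched. Larger images are downscaled with nearest-neighbour
    sampling, which keeps the aspect ratio but does discard pixels.
    Returns the canvas and the (top, left, height, width) of the image region.
    """
    h, w = image.shape[:2]
    scale = min(target_h / h, target_w / w, 1.0)   # never upscale
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    if scale < 1.0:
        # Nearest-neighbour downscale by picking source rows and columns.
        rows = (np.arange(new_h) * h / new_h).astype(int)
        cols = (np.arange(new_w) * w / new_w).astype(int)
        image = image[rows][:, cols]
    canvas = np.full((target_h, target_w) + image.shape[2:], pad_value,
                     dtype=image.dtype)
    top = (target_h - new_h) // 2
    left = (target_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = image
    return canvas, (top, left, new_h, new_w)
```

The returned box records where the real pixels sit inside the canvas, which is the information needed later to tell image content apart from padding.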

However, you’ll need to ensure that the model doesn't falsely learn from the padded areas, possibly by using techniques like masking to ignore the padded regions during training.
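
One possible way to do such masking, assuming the network produces a per-pixel loss map (for example from a forgery localization head), is to average the loss only over the non-padded positions. The helpers `padding_mask` and `masked_mean` below are hypothetical names for this sketch, and the region coordinates are assumed to be known from the padding step.

```python
import numpy as np

def padding_mask(target_h, target_w, top, left, height, width):
    """Boolean mask that is True on real image pixels and False on padding."""
    mask = np.zeros((target_h, target_w), dtype=bool)
    mask[top:top + height, left:left + width] = True
    return mask

def masked_mean(loss_map, valid_mask):
    """Average a per-pixel loss map over valid (non-padded) positions only,
    so the padded border contributes nothing to the training signal."""
    valid = valid_mask.astype(np.float64)
    total = (loss_map * valid).sum()
    count = max(valid.sum(), 1.0)   # guard against an all-padding mask
    return total / count
```

In a deep learning framework the same idea applies to tensors: multiply the unreduced loss by the mask and divide by the number of valid positions.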

This approach should help maintain both the detailed features and spatial consistency of the images while feeding them into the neural network at a fixed input size.
