
I am trying to classify whether or not a specific object is in panoramic photos. The issue is that a panoramic photo can be any width, so the input to my neural network can't be fixed in that dimension.

I've been using RNNs (QRNNs to be specific, as I am trying to reduce the number of parameters as much as possible), but they always learn where the object usually is in the image, and then have a really hard time classifying an image with the object in a different place.

I'm looking for something similar to CNNs, where there is no spatial dependence (or in this case, temporal dependence?), but without a fixed input width.

Any ideas?

desertnaut

1 Answer


Listen, this is not an answer to your question, but it seems that you are missing the whole point of convolution.

Simplified explanation: convolution is just a weighted sum of a pixel's neighbors.

You see how this is not dependent on the size of the image?

Take a 3x3 filter and an NxM image, apply the convolution (with no padding), and you get an (N-2)x(M-2) image as output.

Now take a TxS image and apply the same filter over it: you get (T-2)x(S-2).

Do you see now that you can apply a convolutional layer to an image of any arbitrary size?
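To see the shape arithmetic in action (a quick sketch in TensorFlow/Keras, since that is what the code below uses; the input sizes are arbitrary):

```python
import numpy as np
import tensorflow as tf

# A single 3x3 convolution with no padding ("valid")
conv = tf.keras.layers.Conv2D(1, kernel_size=3, padding="valid")

# 10x10 input -> 8x8 output
print(conv(np.random.rand(1, 10, 10, 1).astype("float32")).shape)  # (1, 8, 8, 1)

# 7x5 input -> 5x3 output, same layer, same weights
print(conv(np.random.rand(1, 7, 5, 1).astype("float32")).shape)  # (1, 5, 3, 1)
```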

You still don't believe me? Take this code, and you will see that you can feed two images of different sizes to this neural network and it won't complain:

import tensorflow as tf

NUM_CLASSES = 2  # e.g. object present / object absent

network = tf.keras.Sequential([
    # "same" padding preserves the spatial dimensions, whatever they are
    tf.keras.layers.Conv2D(32, kernel_size=3, activation=tf.nn.leaky_relu, padding="same"),
    tf.keras.layers.Conv2D(32, kernel_size=3, activation=tf.nn.leaky_relu, padding="same"),
    tf.keras.layers.Conv2D(32, kernel_size=3, activation=tf.nn.leaky_relu, padding="same"),
    tf.keras.layers.Conv2D(32, kernel_size=3, activation=tf.nn.leaky_relu, padding="same"),
    # global pooling collapses any HxW feature map to a fixed-size vector
    tf.keras.layers.GlobalMaxPooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
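For instance (a quick sanity check with made-up input sizes, using a slimmed-down version of the same architecture), two "panoramas" of different widths go through the same network and come out the same fixed size:

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 2  # e.g. object present / object absent

network = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, activation=tf.nn.leaky_relu, padding="same"),
    tf.keras.layers.GlobalMaxPooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Two "panoramas" of different widths: batch x height x width x channels
narrow = np.random.rand(1, 64, 200, 3).astype("float32")
wide = np.random.rand(1, 64, 500, 3).astype("float32")

# Both pass through the same network and yield the same output shape
print(network(narrow).shape)  # (1, 2)
print(network(wide).shape)    # (1, 2)
```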

you're welcome :-)

Alberto