
When a single image is assigned for training, an autoencoder should be able to gradient-descend to a full set of weights that satisfactorily reconstruct this image.

[image: the first training image]

Suppose a second image is now given to the autoencoder:

[image: the second image]

Can the previously trained autoencoder weights (for image 1) now be reused immediately (without further training) to reconstruct the second image as well?

James
  • If you do not impose constraints on the dimensionality of the encoded representation, you can cheat by setting $d = |x|$ and learn an identity mapping of the input quite easily. Of course, this is a perfectly useless network, since your encoded representation has the same dimensionality as the input and replicates the input exactly :) – Ciodar Mar 22 '24 at 08:56
  • @Ciodar thank you, yes I believe the identity mapping is what I was really after... Suppose this autoencoder is implemented as a fully-convolutional U-Net-type network (all weights are kernel values or biases), with skip connections and a ReLU on each stage's output. Is it then certain that a specific set of weights performing the identity mapping (mapping every image onto itself pixel-for-pixel) can still be found, after all those convolution steps and the information truncation of ReLU? – James Mar 22 '24 at 09:18
  • Look at a single residual block: if you feed it $x$, all you need to do is make the network zero out everything (i.e. set all weights to zero) to obtain $x$ as output; see the sketch after these comments. Maybe this question answers exactly that. – Ciodar Mar 22 '24 at 17:36
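
A minimal sketch of the residual-block argument from the comments, assuming PyTorch (the thread names no framework, and the `ResidualBlock` below is a hypothetical stand-in, not the asker's network): with every weight and bias in the residual branch zeroed, the branch outputs zero and the skip connection passes the input through unchanged, so the block computes the identity exactly, ReLU notwithstanding.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes x + conv2(relu(conv1(x))); the ReLU sits inside the branch."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv2(torch.relu(self.conv1(x)))

block = ResidualBlock(channels=3)
for p in block.parameters():
    nn.init.zeros_(p)            # zero out every weight and bias

x = torch.rand(1, 3, 8, 8)       # a dummy 3-channel "image"
with torch.no_grad():
    y = block(x)
print(torch.allclose(x, y))      # True: the zeroed block is the identity map
```

Note that this relies on the ReLU sitting inside the residual branch; if a ReLU is applied after the addition instead, the block is only the identity for non-negative inputs (which pixel intensities in $[0, 1]$ happen to satisfy).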

1 Answer


With only one input image, the latent-space representation learned by the VAE may not be meaningful or informative. Reconstruction of the input image may not be meaningful either, since the VAE can simply memorize that single image rather than learn latent features that are useful for reconstruction in general.

Even assuming it can be trained successfully on only one image such as yours, it is quite impossible to achieve your goal, since these two images obviously have very different appearances and features. For example, your training image has a big hat covering a large portion of the frame, which your target output image lacks, and there are many other dissimilarities. By randomly sampling a latent space learned from only one image, you can at best expect generated images that resemble that one image.
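
To make the memorization point concrete, here is a rough sketch, again assuming PyTorch, with random tensors standing in for the two photos (the $32 \times 32$ size and the latent width of 32 are illustrative assumptions): an undercomplete autoencoder fitted to a single image drives its reconstruction error down on that image while the error on an unseen image stays large.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
img1 = torch.rand(1, 3 * 32 * 32)   # stand-in for the training image
img2 = torch.rand(1, 3 * 32 * 32)   # stand-in for the second image

model = nn.Sequential(              # undercomplete: 3072 -> 32 -> 3072
    nn.Linear(3 * 32 * 32, 32), nn.ReLU(),
    nn.Linear(32, 3 * 32 * 32), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for _ in range(1000):               # fit image 1 only
    opt.zero_grad()
    loss_fn(model(img1), img1).backward()
    opt.step()

with torch.no_grad():
    err1 = loss_fn(model(img1), img1).item()   # small: memorized
    err2 = loss_fn(model(img2), img2).item()   # much larger: no transfer
print(f"train image MSE: {err1:.5f}, unseen image MSE: {err2:.5f}")
```

On real photographs the gap is the same in kind: the trained weights encode the particulars of image 1, and nothing forces them to reconstruct image 2.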

cinch