
I went through a research paper ("Voxel-Based 3D Object Reconstruction from Single 2D Image Using Variational Autoencoders") and tried to implement the approach following this diagram:

[Image of the reference network: https://ibb.co/4JgbQ9s]

Here is my implementation for the same:

```
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Dense,
                                     Reshape, UpSampling3D, Conv3DTranspose)
from tensorflow.keras.models import Model

image = Input(shape=(None, None, 3))

# Encoder
l1 = Conv2D(64, (3, 3), strides=(2), padding='same', activation='leaky_relu')(image)
l2 = MaxPooling2D(padding='same')(l1)
l3 = Conv2D(32, (5, 5), strides=(2), padding='same', activation='leaky_relu')(l2)
l4 = MaxPooling2D(padding='same')(l3)
l5 = Conv2D(16, (7, 7), strides=(2), padding='same', activation='leaky_relu')(l4)
l6 = MaxPooling2D(padding='same')(l5)
l7 = Conv2D(8, (5, 5), strides=(2), padding='same', activation='leaky_relu')(l6)
l8 = MaxPooling2D(padding='same')(l7)
l9 = Conv2D(4, (3, 3), strides=(2), padding='same', activation='leaky_relu')(l8)
l10 = MaxPooling2D(padding='same')(l9)
l11 = Conv2D(2, (4, 4), strides=(2), padding='same', activation='leaky_relu')(l10)
l12 = MaxPooling2D(padding='same')(l11)
l13 = Conv2D(1, (2, 2), strides=(2), padding='same', activation='leaky_relu')(l12)

# latent variable z
l14 = Reshape((60, 512))(l13)
l15 = Dense((60 * 512), activation='leaky_relu')(l14)
l16 = Dense((128 * 4 * 4 * 4), activation='leaky_relu')(l15)
l17 = Reshape((60, 4, 4, 4, 128))(l16)

# Decoder
l18 = UpSampling3D()(l17)
l19 = Conv3DTranspose(60, (8, 8, 8), strides=(64), padding='same', activation='leaky_relu')(l17)
l20 = UpSampling3D()(l19)
l21 = Conv3DTranspose(60, (16, 16, 16), strides=(32), padding='same', activation='leaky_relu')(l20)
l22 = UpSampling3D()(l21)
l23 = Conv3DTranspose(60, (32, 32, 32), strides=(32), padding='same', activation='leaky_relu')(l22)
l24 = UpSampling3D()(l23)
l25 = Conv3DTranspose(60, (64, 64, 64), strides=(24), padding='same', activation='leaky_relu')(l24)
l26 = UpSampling3D()(l25)
l27 = Conv3DTranspose(60, (64, 64, 64), strides=(1), padding='same', activation='leaky_relu')(l26)

model3D = Model(image, l27)
```

This gives me the following error at l19:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_33/351640059.py in <module>
     24 #Decoder
     25 l18 = UpSampling3D()(l17)
---> 26 l19 = Conv3DTranspose(60, (8, 8, 8), strides = (64), padding='same', activation = 'leaky_relu') (l17)
     27 l20 = UpSampling3D()(l19)
     28 l21 = Conv3DTranspose(60, (16,16,16), strides =(32), padding='same', activation = 'leaky_relu')(l20)

/opt/conda/lib/python3.7/site-packages/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    975     if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
    976       return self._functional_construction_call(inputs, args, kwargs,
--> 977                                                 input_list)
    978
    979     # Maintains info about the Layer.call stack.

/opt/conda/lib/python3.7/site-packages/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
   1113     # Check input assumptions set after layer building, e.g. input shape.
   1114     outputs = self._keras_tensor_symbolic_call(
-> 1115         inputs, input_masks, args, kwargs)
   1116
   1117     if outputs is None:

/opt/conda/lib/python3.7/site-packages/keras/engine/base_layer.py in _keras_tensor_symbolic_call(self, inputs, input_masks, args, kwargs)
    846       return tf.nest.map_structure(keras_tensor.KerasTensor, output_signature)
    847     else:
--> 848       return self._infer_output_signature(inputs, args, kwargs, input_masks)
    849
    850   def _infer_output_signature(self, inputs, args, kwargs, input_masks):

/opt/conda/lib/python3.7/site-packages/keras/engine/base_layer.py in _infer_output_signature(self, inputs, args, kwargs, input_masks)
    884     #   overridden).
    885     # TODO(kaftan): do we maybe_build here, or have we already done it?
--> 886     self._maybe_build(inputs)
    887     inputs = self._maybe_cast_inputs(inputs)
    888     outputs = call_fn(inputs, *args, **kwargs)

/opt/conda/lib/python3.7/site-packages/keras/engine/base_layer.py in _maybe_build(self, inputs)
   2657     # operations.
   2658     with tf_utils.maybe_init_scope(self):
-> 2659       self.build(input_shapes)  # pylint:disable=not-callable
   2660     # We must set also ensure that the layer is marked as built, and the build
   2661     # shape is stored since user defined build functions may not be calling

/opt/conda/lib/python3.7/site-packages/keras/layers/convolutional.py in build(self, input_shape)
   1546     if len(input_shape) != 5:
   1547       raise ValueError('Inputs should have rank 5, received input shape:',
-> 1548                        str(input_shape))
   1549     channel_axis = self._get_channel_axis()
   1550     if input_shape.dims[channel_axis].value is None:

ValueError: ('Inputs should have rank 5, received input shape:', '(None, 60, 4, 4, 4, 128)')
```

Any help and guidance is appreciated.

arizona_3

1 Answer


You are missing a Reshape step between layers 9 and 10. In addition, I suggest adding an activation function to your dense layers.
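
To illustrate the rank requirement (a minimal sketch of my own, not from the paper): Conv3DTranspose expects a rank-5 input of shape (batch, depth, height, width, channels), so any extra leading axis triggers exactly the error above.

```
from tensorflow.keras.layers import Input, Conv3DTranspose

# Rank-5 input (batch, depth, height, width, channels): accepted
ok = Input(shape=(4, 4, 4, 128))                          # (None, 4, 4, 4, 128)
y = Conv3DTranspose(60, (3, 3, 3), padding='same')(ok)    # builds fine

# Rank-6 input, as produced by Reshape((60, 4, 4, 4, 128)) in the question: rejected
bad = Input(shape=(60, 4, 4, 4, 128))                     # (None, 60, 4, 4, 4, 128)
# Conv3DTranspose(60, (3, 3, 3), padding='same')(bad)     # ValueError: Inputs should have rank 5
```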

Edit 1:

Actually, in Keras we don't usually care about the tensor's first dimension, since it is the batch size: docs. I assume you set the batch_size at your input layer, so that the first value of the tensors' shape isn't None?
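
For example (my own minimal sketch, with an arbitrary 128x128 input size): the batch dimension is reported as None unless batch_size is passed to Input.

```
from tensorflow.keras.layers import Input

x = Input(shape=(128, 128, 3))                 # reported shape: (None, 128, 128, 3)
y = Input(shape=(128, 128, 3), batch_size=60)  # reported shape: (60, 128, 128, 3)
print(x.shape, y.shape)
```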

Check the dimensions of your l7; according to the attached image, it should be (60, 512). l8 should be Dense(512, activation = 'leaky_relu')(l7) and l9 would be Dense(128*4*4*4, activation = 'leaky_relu')(l8), which can be reshaped to (60, 4, 4, 4, 128) by calling Reshape((4, 4, 4, 128))(l9).
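
Put together, a minimal sketch of that latent block (assuming, as above, that 60 is treated as the batch size and the flattened encoder output has 512 features; the activation string mirrors the question's usage and the names are illustrative, not from the paper):

```
from tensorflow.keras.layers import Input, Dense, Reshape, Conv3DTranspose

# Hypothetical flattened encoder output: 60 samples x 512 features
encoded = Input(shape=(512,), batch_size=60)              # (60, 512)

h = Dense(512, activation='leaky_relu')(encoded)          # (60, 512)
h = Dense(128 * 4 * 4 * 4, activation='leaky_relu')(h)    # (60, 8192)
h = Reshape((4, 4, 4, 128))(h)                            # (60, 4, 4, 4, 128), rank 5

# A rank-5 tensor is now accepted by the first decoder layer
out = Conv3DTranspose(60, (8, 8, 8), strides=2, padding='same')(h)
```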

Now that I look at your implementation more carefully, I notice several issues:

  • It has six Conv2D layers but the reference architecture has seven
  • The number of filters and the kernel sizes don't match, except for the l1 layer. For example, l2 should have 64 filters and a kernel size of 5. The number of filters should also increase as you go deeper; the last layer has 512 of them, but yours has only one.
  • The reference architecture seems to use pooling layers; the resolution is halved at each step (a generic sketch of this pattern follows this list).
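
As a generic illustration of that pattern (my own sketch with made-up filter counts and input size, not the paper's exact architecture): the filter count grows with depth while each pooling stage halves the spatial resolution.

```
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D

x = Input(shape=(128, 128, 3))
h = x
# Illustrative filter counts only; each Conv2D + MaxPooling2D stage halves the resolution.
for filters in (64, 128, 256, 512):
    h = Conv2D(filters, (3, 3), padding='same', activation='relu')(h)
    h = MaxPooling2D(padding='same')(h)
# h now has shape (None, 8, 8, 512)
```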

I suggest you try some simpler tutorials first, to get familiar with tensor sizes and how they relate to different network layers and their parameters.

NikoNyrh