
I have grayscale images, specifically medical ultrasonography data.

In the medical domain, there are capture techniques called "views". A view is like the point of view of a CCTV camera: cameras placed differently capture the same scene from different angles. There are 5 views, so 5 images are captured at once.

Since the images are grayscale, which means a single channel, can I replace the number of channels with the number of views?

So the input dimensions would look like this:

  • (Height, Width, Channel/View) -> (600, 800, 5)

Instead of this:

  • (Height, Width, Channel/Color) -> (600, 800, 1)

Or should I use Conv3D for this, even though my data doesn't contain a Z-axis?

  • (Depth/View, Height, Width, Channel/Color) -> (5, 600, 800, 1)
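To make the two layouts concrete, here is a minimal NumPy sketch of how the same five images would be arranged in each case. The random arrays are only placeholders for my real images, and the variable names are made up for illustration.

```python
import numpy as np

# Placeholder data: five grayscale views of one exam, each (Height, Width).
rng = np.random.default_rng(0)
views = [rng.random((600, 800), dtype=np.float32) for _ in range(5)]

# Option 1: views take the place of channels (channels-last, for a 2D convolution).
as_channels = np.stack(views, axis=-1)

# Option 2: views become a depth axis, plus one gray channel (for a 3D convolution).
as_volume = np.stack(views, axis=0)[..., np.newaxis]

print(as_channels.shape)  # (600, 800, 5)
print(as_volume.shape)    # (5, 600, 800, 1)
```

Both arrays hold exactly the same pixels; only the axis that the convolution treats as "channels" versus "depth" differs.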

In the end, the output is a vector with 3 elements (three neurons) with softmax activation, which acts as the classifier for the pathology/disease/diagnosis.

1 Answer


You can do whatever you want, but it won't necessarily achieve a good result. CNNs have the inductive bias that features are spatially correlated. If you simply stack the different views and immediately pass them through a convolutional layer, the first layer already mixes all views together. Your model likely won't get a chance to extract information from a particular view (by iteratively building up each feature's receptive field) before it is combined with the other views, and vanilla CNNs are not good at approximating identity functions, so they cannot easily carry a single view through unchanged. Additionally, you should not be training your models from scratch. What would likely be more effective is to run a pre-trained model on each view to extract feature maps corresponding to that view. Then feed those features into a fully connected layer to predict the scores that go into your softmax.
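As a rough, shape-level sketch of that data flow (not a trained model), the NumPy snippet below processes each view separately and only fuses afterwards. Everything named here is an assumption for illustration: `extract_features` is a hypothetical stand-in for a frozen pre-trained backbone (it just average-pools the image), the weights are random, and the 64-dimensional feature size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_VIEWS, NUM_CLASSES = 5, 3

def extract_features(view):
    """Hypothetical stand-in for a frozen pre-trained backbone.

    Average-pools an (H, W) image over an 8x8 grid and flattens the result
    into a 64-dimensional vector. Swap in a real pre-trained CNN in practice.
    """
    h, w = view.shape
    cropped = view[: h - h % 8, : w - w % 8]
    blocks = cropped.reshape(8, h // 8, 8, w // 8)
    return blocks.mean(axis=(1, 3)).reshape(-1)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# One exam: five grayscale views (random placeholders here).
views = rng.random((NUM_VIEWS, 600, 800))

# Each view is processed on its own; the views are combined only afterwards.
fused = np.concatenate([extract_features(v) for v in views])  # shape (320,)

# Fully connected layer on the fused features (untrained random weights).
W = rng.normal(scale=0.01, size=(NUM_CLASSES, fused.size))
b = np.zeros(NUM_CLASSES)
probs = softmax(W @ fused + b)  # three class probabilities

print(fused.shape, probs.shape)
```

In a real setup you would replace the stand-in with an actual pre-trained network, keep it frozen or fine-tune it lightly, and train only the final layer at first.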