-
Hey @axu-git,
As to your questions:
Q1)
i) Yes, the 1 is the number of channels, which must be specified when using convolutions.
ii) The number of channels reflects the encoding of your image. In the MNIST case, the image is grayscale, meaning that a pixel can be represented by a single value in [0, 255] giving the pixel intensity. For coloured images, you may want to use the RGB encoding, meaning that each pixel is represented by 3 values (one for red, one for green and one for blue). Hence, you will need 3 channels. This has nothing to do with the number of vectors/data points you are dealing with.
iii) No, this depends on the type of data you are using and the network you want to use for your encoder/decoder. For images, in that very case, we need them to be shaped as (B, C, H, W) since the networks you use rely on 2-dimensional convolutions. If you only wanted to use an MLP to encode those images, you would have needed to shape them as (B, CxHxW).
Q2) Using batching does not change the previous answers.
I hope this helps,
Best,
Clément
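To make the two layouts in this answer concrete, here is a minimal sketch using NumPy arrays as stand-ins for the real tensors. The batch size, the zero-filled data and the variable names are assumptions chosen for illustration; only the shapes matter.

```python
import numpy as np

# Hypothetical stand-in data: a batch of 8 grayscale 28x28 images.
B, C, H, W = 8, 1, 28, 28
images = np.zeros((B, H, W), dtype=np.float32)

# Layout for 2-dimensional convolutions: add the channel axis -> (B, C, H, W).
conv_input = images.reshape(B, C, H, W)

# Layout for an MLP: flatten every sample -> (B, C*H*W).
mlp_input = conv_input.reshape(B, C * H * W)

# RGB images would carry 3 channels instead of 1.
rgb_input = np.zeros((B, 3, H, W), dtype=np.float32)

# N vectors of length L are already in the flat (N, L) layout.
N, L = 100, 12
vectors = np.zeros((N, L), dtype=np.float32)

print(conv_input.shape)
print(mlp_input.shape)
print(rgb_input.shape)
print(vectors.shape)
```

The channel axis only describes how each pixel is encoded, so the number of samples (B or N) always stays on the first axis.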
-
Thank you!! Following up on Q1 iii): if I do not specify an encoder and decoder (which, from the docs, seems to mean I am using an MLP), is setting input_dim in model_config to (12,) equivalent to setting it to (1, 12)? I ran it with both and both ran with no errors. I'm leaning towards (12,) because in the MNIST colab example input_dim = (1, 28, 28), which does not include the batch dimension, but I wanted to double check.
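One way to see why both settings can run without errors: if the default MLP flattens each sample before its first linear layer (an assumption about the library that this thread does not confirm), only the product of the entries of input_dim matters. The helper `flattened_features` below is hypothetical and exists only for this sketch.

```python
import numpy as np

def flattened_features(input_dim):
    """Hypothetical helper: number of values per sample after flattening."""
    return int(np.prod(input_dim))

# Stand-in batch of 32 vectors of length 12 (illustrative data only).
batch = np.zeros((32, 12), dtype=np.float32)

for input_dim in [(12,), (1, 12)]:
    # Flatten every sample to a row of `flattened_features(input_dim)` values.
    flat = batch.reshape(-1, flattened_features(input_dim))
    print(input_dim, flat.shape)
```

Under that assumption, (12,) and (1, 12) both describe 12 features per sample, and neither includes the batch dimension.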
-
Awesome, thanks!
-
I am using RHVAE and would like to input custom data consisting of N vectors of length L. So:
Q1) In the colab tutorial, the dataset consists of MNIST images, so the input has shape (-1, 1, 28, 28). I'm using vectors instead of images (L instead of 28 x 28), so I would need to change the input dimensions.
i) What is the 1 for (i.e., why aren't the MNIST images in shape (-1, 28, 28) instead)? Is it the number of channels?
ii) If I have N independent vectors, would I want to set the number of channels to 1?
iii) Does this mean I need to change my train_dataset/eval_dataset to have shape (N1, 1, 1, L) (and (N2, 1, 1, L))? Or (N1, 1, L) (and (N2, 1, L))?
Q2) Does the answer change if I use batching?
I ask because, when I ran the code, I saw an error message about 4D input for batched data and 3D input for unbatched data, but I'm not sure whether that is specific to using Encoder_ResNet_VAE_MNIST/Decoder_ResNet_AE_MNIST.
Thank you!