ArrayDataProvider input format issue #236
Why are you converting the Julia array to an …?
@vchuravy Out of desperation. But I can't really see how that would affect anything?
I am a bit worried that `mx.Reshape` gets confused between Julia's ordering of dimensions, (W, H, Channel, Sample), and mxnet's ordering, (C, S, H, W). I would recommend doing the reshaping in Julia. There are two ways to check whether your layout is correct. One is:

```julia
arg_shapes, out_shapes, aux_shapes = mx.infer_shape(net, input=(75, 75, 1, 66), softmax_label=(66,))
println("Arguments:")
for (n, s) in zip(mx.list_arguments(net), arg_shapes)
    println("\t$n => $s")
end
println("Outputs:")
for (n, s) in zip(mx.list_outputs(net), out_shapes)
    println("\t$n => $s")
end
```

and the other is:

```julia
exec = mx.simple_bind(net, mx.cpu(), input=(75, 75, 1, 66))
dbg_str = mx.debug_str(exec)
```
So `net` in my case would be `mxData`, right? If I build the data according to (C, S, H, W):

```julia
data = zeros(Float32, 1, length(filenames), heigth, width) # <-------------------------
label = zeros(Int64, length(filenames))
for i in 1:length(filenames)
    image = load(string("datatide1/", filenames[i], ".jpg"))
    image_resized = imresize(image, heigth, width)
    temp = convert(Array{Float32}, image_resized)
    data[1, i, :, :] = temp # <-------------------------
    label[i] = classDict[labels[i]]
end

mxData = mx.Variable(:data)
mxLabel = mx.Variable(:softmax_label)

batch_size = 2
train_provider = mx.ArrayDataProvider(:data => data,
                                      :softmax_label => label,
                                      batch_size=batch_size,
                                      shuffle=true)
```

I get a shape mismatch (the same if I transpose): `AssertionError: Number of samples in softmax_label is mismatch with data`. Or... I assume you want me to reshape the Julia array into the format MXNet wants? Or should I feed it in as the Julia array, i.e. (W, H, Channel, Sample)?
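If I understand MXNet.jl's convention correctly (my reading, not confirmed in this thread), the last axis of a Julia array is the sample axis, so `data` should be (W, H, C, N) = (75, 75, 1, 66) and the label vector should have length 66 — which would explain the mismatch above, where the last axis is `width`. A minimal NumPy sketch of that consistency check (illustrative shapes only):

```python
import numpy as np

# In MXNet.jl's Julia ordering the LAST axis is the sample axis, so for
# 66 grayscale 75x75 images the array should be (W, H, C, N) = (75, 75, 1, 66).
data = np.zeros((75, 75, 1, 66), dtype=np.float32)
label = np.zeros(66)

# ArrayDataProvider-style consistency check: sample counts must match.
assert data.shape[-1] == label.shape[0]
```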
MXNet.jl will handle the transformation from Julia order to C/C++ order for you.
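One way to see why this conversion can be free of copies: a column-major (Julia-style) array of shape (W, H, C, N) has exactly the same memory layout as a row-major (N, C, H, W) array, the layout mxnet expects. A NumPy sketch of this (illustrative only, not MXNet.jl code):

```python
import numpy as np

# A Julia array of shape (W, H, C, N) is column-major, so its raw memory is
# identical to a row-major (C-order) array of shape (N, C, H, W).
W, H, C, N = 75, 75, 1, 66
julia_like = np.asfortranarray(
    np.arange(W * H * C * N, dtype=np.float32).reshape(W, H, C, N))

# Transposing reverses the axes; the transpose of a Fortran-contiguous array
# is C-contiguous, so this is the (N, C, H, W) view without copying any data.
mxnet_view = julia_like.T
assert mxnet_view.shape == (N, C, H, W)
assert mxnet_view.flags["C_CONTIGUOUS"]
```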
Your label is between 0 and 1? You can use … From a purely network design perspective, I would use … Have you experimented with the size of your …?
Yes, labels are … I have experimented with the architecture (also with more and wider FC layers), but what strikes me as weird is that it gives the exact same probability for both images with `mx.predict`. Isn't that super strange? Even if it hasn't learned anything, it feels as though it is scoring the same image every time.
I agree that it is a bit weird; it looks like you are hitting the same local optimum over and over again. Is your dataset balanced, and are you using a validation dataset? (Also check that your dataset actually contains what you expect by converting …)
Yes, I have verified that the data is what it is supposed to be. I have now implemented dropout, batch norm, and two more conv layers, and expanded the FC layers, with the same result. When I do …
SOLVED! This weird behavior originated from a too-small batch size. When I increased the batch size, everything started working as expected and I reached 99% accuracy on both classes within 100 epochs.

- HOW is it possible that the batch size influenced the result of the model in such a way!?
What was your batch size, if I may ask? Also, which version of MXNet proper are you using, and which optimizer?

-V
I changed it from 1 to 5 and now get good results with all adaptive learning-rate algorithms! This run in particular was with RMSProp, but results were just as good with Adam.
@pluskid Do you think we are hitting a particular corner case here, or is this just a side effect of using SGD-based methods? As far as I understand, SGD estimates the true gradient from a sample of the data, so it should still work on a single sample.
Interesting. Batch size = 1 should work in theory (with properly chosen hyperparameters), despite being inefficient. @vchuravy I'm leaving for a flight in 2 minutes; do you have a chance to run our existing examples (such as MNIST, or even simpler ones that use the array provider directly) with the batch size changed to 1? Just to see if it is some bug.
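As an aside, a toy sketch of why batch size 1 produces much noisier gradient estimates than even a small mini-batch (pure NumPy, all names and data illustrative — this does not reproduce the thread's model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data: y = 3x + noise.
x = rng.normal(size=1000)
y = 3.0 * x + rng.normal(scale=0.1, size=1000)

def grad(w, idx):
    """Mean-squared-error gradient w.r.t. w on the mini-batch `idx`."""
    return np.mean(2 * (w * x[idx] - y[idx]) * x[idx])

def grad_std(batch_size, trials=2000):
    """Spread of the mini-batch gradient estimate at w = 0."""
    g = [grad(0.0, rng.integers(0, 1000, size=batch_size))
         for _ in range(trials)]
    return np.std(g)

# Larger batches average out per-sample noise, so the estimate is tighter;
# with batch size 1 every step follows a very noisy direction.
assert grad_std(5) < grad_std(1)
```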
So I'm trying to do image classification with ArrayDataProvider without success, and I'm fairly confident it is because MXNet doesn't feed in the data correctly.
The raw input is 66 b/w images, so the data starts as a 75x75x66 matrix with two classes (faces and houses).
The data is then reshaped to fit the conv-net input, which needs to be a 4-D array.
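For reference, adding the singleton channel axis to such a (75, 75, 66) stack can be sketched like this (NumPy, illustrative; the thread's actual code is Julia):

```python
import numpy as np

# 66 grayscale 75x75 images stacked along the last axis, as described above.
stack = np.zeros((75, 75, 66), dtype=np.float32)

# Insert a singleton channel axis so the conv net sees 4-D input (W, H, C, N).
data = stack[:, :, np.newaxis, :]
assert data.shape == (75, 75, 1, 66)
```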
I have varied the hyperparameters a lot and always get stuck at the same accuracy of 0.63.
What makes me think there is an error in the data feed is that when I predict on two random samples (one house, one face) not previously seen by the model, it always outputs the same probabilities:
This just doesn't make sense, since one sample is a house and one is a face, and I've changed the hyperparameters drastically (including the optimization algorithm).
Can anyone see if I'm doing something wrong with the data input, or if there is something else fundamentally wrong with my implementation?
@pluskid