Beta-7 different outputs on GPU and CPU regression #124

Open
chooneung opened this issue Oct 14, 2020 · 4 comments
Labels: bug, critical, invalid

Comments

@chooneung
Contributor

OS: Windows
DL4J: deeplearning4j 1.0.0-beta7
CUDA: 10.2
cuDNN: 7.6
Issue: The results of regression (e.g. BostonHousePricePrediction.java) running on GPU are not the same as the results running on CPU. Attached are screenshots of the results; a minimal reproduction sketch is included after them.

CPU:
[Screenshot: CPU backend in use]
[Screenshot: results on CPU backend]

GPU:
[Screenshot: GPU backend in use]
[Screenshot: results on GPU backend]
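
Not the actual example code from the repo, but a minimal, self-contained sketch under assumed hyperparameters and a synthetic target (class name, data and learning rate are assumptions): it trains the same style of network with a fixed seed, so the final score can be compared between the nd4j-native and nd4j-cuda backends.

// Minimal reproduction sketch (not BostonHousePricePrediction.java itself):
// train a small dense regression network with a fixed seed on synthetic data,
// then compare the printed score between CPU and GPU backends.
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class CpuGpuRegressionRepro {

    public static void main(String[] args) {
        long seed = 123;
        Nd4j.getRandom().setSeed(seed); // fix the ND4J RNG in addition to the net seed

        // Synthetic regression data: 13 features (as in the Boston housing data), 1 target
        INDArray features = Nd4j.rand(256, 13);
        INDArray labels = features.sum(1).reshape(256, 1); // deterministic target
        DataSet data = new DataSet(features, labels);

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(seed)
                .updater(new Adam(1e-3))
                .weightInit(WeightInit.XAVIER)
                .l2(0.001)
                .list()
                .layer(new DenseLayer.Builder().nIn(13).nOut(128).activation(Activation.RELU).build())
                .layer(new DenseLayer.Builder().nIn(128).nOut(64).activation(Activation.RELU).build())
                .layer(new OutputLayer.Builder().nIn(64).nOut(1)
                        .activation(Activation.IDENTITY)
                        .lossFunction(LossFunctions.LossFunction.MSE)
                        .build())
                .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        for (int epoch = 0; epoch < 50; epoch++) {
            net.fit(data);
        }

        // Run once with nd4j-native-platform and once with nd4j-cuda-10.2-platform on the
        // classpath; the printed scores should agree closely if the backends are consistent.
        System.out.println("Backend: " + Nd4j.getBackend().getClass().getSimpleName());
        System.out.println("Final MSE score: " + net.score(data));
    }
}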

@chooneung added the bug, invalid, and critical labels on Oct 14, 2020
@jtkhair

jtkhair commented Mar 1, 2021

Any update on this?

I have the same issue. My setup is as below:

OS: Ubuntu 18.04.5
DL4J: deeplearning4j 1.0.0-beta7
CUDA: 10.1
cuDNN: 7.6

@kenghooi-teoh
Contributor

Issue Description
Using the same model and training config, we saw a huge difference in loss scores when running the example on this dataset on CPU vs. GPU, but only in certain examples; in other examples where a CNN is used, we did not come across the same issue. DL4J version: beta7.

Version Information
OS: Windows, CUDA: 10.2, cuDNN: 7.6
OS: Ubuntu 18.04.5, CUDA: 10.1, cuDNN: 7.6
OS: Windows, CUDA: 10.0.130, cuDNN: 7.5

Additional Information
Model config (contains only Dense and Output layers):

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Two ReLU dense layers feeding a single-output regression head (identity activation + MSE loss)
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .updater(new Adam(learningRate))
        .weightInit(WeightInit.XAVIER)
        .l2(0.001)
        .list()
        .layer(new DenseLayer.Builder()
                .nIn(13)
                .nOut(128)
                .activation(Activation.RELU)
                .build())
        .layer(new DenseLayer.Builder()
                .nIn(128)
                .nOut(64)
                .activation(Activation.RELU)
                .build())
        .layer(new OutputLayer.Builder()
                .nIn(64)
                .nOut(1)
                .activation(Activation.IDENTITY)
                .lossFunction(LossFunctions.LossFunction.MSE)
                .build())
        .build();

Screenshots of different loss:
[Screenshot: loss values differ between CPU and GPU runs]
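
One thing that might help narrow this down (an assumption on my part, not something verified against beta7 behaviour): force double precision globally before building the network and re-run on both backends. If the CPU and GPU losses then agree, the gap is likely a float-precision / reduction-order effect rather than a functional bug in a layer. A minimal sketch (the class name is hypothetical):

import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class ForceDoublePrecision {
    public static void main(String[] args) {
        // Use FP64 for both array storage and math ops; call this before the network is built.
        Nd4j.setDefaultDataTypes(DataType.DOUBLE, DataType.DOUBLE);
        // ... build, train and score the network here, then compare CPU vs GPU results ...
    }
}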

@agibsonccc

@chooneung could you give me the full training loop so I can reproduce this out of the box for testing? I need to confirm whether this is still the case on the latest version.
