
re-establishing GPU support for char-rnn.jl #331

Open
mchristianl opened this issue Jan 28, 2021 · 7 comments

@mchristianl

Hi,

I have just stumbled upon the char-rnn.jl example in my first attempt of checking out Julia machine learning with Flux.jl.

The example works, but the generated text is mostly garbage. Since this seems to be on the TODO list anyway (issue 266), I was wondering how difficult it'd be to re-establish the CUDA support that was removed in an earlier commit. That would let me train the model more intensively to see whether the generated text gains some quality.

I guess the error that @AdarshKumar712 got was

ERROR: LoadError:
Need an adjoint for constructor Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}.
Gradient is of type CuArray{Float32,2}

in the loss function

function loss(xs, ys)
  l = sum(logitcrossentropy.(m.(gpu.(xs)), gpu.(ys)))
  return l
end

Any thoughts? Would this be fixable in an easy way?

regards,
Christian
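A possible workaround for the missing adjoint (a sketch only, untested; `ys_dense` is a hypothetical name, and this assumes `m` has already been moved to the GPU with `gpu(m)`): densify the one-hot targets before they enter the differentiated code, so the pullback only sees plain `CuArray{Float32,2}` values and never needs an adjoint for the `OneHotMatrix` constructor.

```julia
# Sketch, not a confirmed fix: requires a CUDA device, with Flux and
# CUDA loaded and `m = gpu(m)` done beforehand.
function loss(xs, ys)
  # Densify each OneHotMatrix target into a Float32 CuArray so Zygote
  # only differentiates through ordinary array operations.
  ys_dense = [gpu(Float32.(collect(y))) for y in ys]  # hypothetical step
  return sum(logitcrossentropy.(m.(gpu.(xs)), ys_dense))
end
```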

@DhairyaLGandhi
Member

Can we re-run with the latest Flux? Let's prioritise this.

@mchristianl
Author

Yes: this particular error seems to be fixed in 0.12.0-dev, but now I get another error

FATAL ERROR: Symbol "__nv_sqrt"not found

which might indicate that my attempt to restore the file to its previous version failed:

--- char-rnn.jl 2021-01-28 12:27:16.238144639 +0100
+++ char-rnn-new.jl 2021-01-28 12:25:24.361953681 +0100
@@ -3,6 +3,7 @@
 using StatsBase: wsample
 using Base.Iterators: partition
 using Parameters: @with_kw
+using CUDA

 # Hyperparameter arguments
 @with_kw mutable struct Args
@@ -51,15 +52,16 @@

     # Constructing Model
     m = build_model(N)
+    m = gpu(m)

     function loss(xs, ys)
-      l = sum(logitcrossentropy.(m.(xs), ys))
+      l = sum(logitcrossentropy.(m.(gpu.(xs)), gpu.(ys)))
       return l
     end

     ## Training
     opt = ADAM(args.lr)
-    tx, ty = (Xs[5], Ys[5])
+    tx, ty = (gpu.(Xs[5]), gpu.(Ys[5]))
     evalcb = () -> @show loss(tx, ty)

     Flux.train!(loss, params(m), zip(Xs, Ys), opt, cb = throttle(evalcb, args.throttle))
@@ -84,5 +86,6 @@
 end

 cd(@__DIR__)
-m, alphabet = train()
+m, alphabet = train() # FATAL ERROR: Symbol "__nv_sqrt"not found
 sample(m, alphabet, 1000) |> println
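A related sketch (hypothetical, assuming the restored diff above otherwise works): the `gpu.(xs)` / `gpu.(ys)` calls inside `loss` copy every batch from host to device on each gradient evaluation. Transferring the data once, before training, keeps the loss itself free of transfers.

```julia
# Sketch: move all sequences to the GPU up front. Each element of Xs/Ys
# is itself a sequence of one-hot batches, hence the broadcasted gpu.
Xs = [gpu.(xs) for xs in Xs]
Ys = [gpu.(ys) for ys in Ys]

loss(xs, ys) = sum(logitcrossentropy.(m.(xs), ys))
tx, ty = (Xs[5], Ys[5])   # evaluation batch is already on the device
```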

@DhairyaLGandhi
Member

Which version of CUDA are you using?

@mchristianl
Author

The CUDA.jl Julia package is at 2.4.1, and CUDA.versioninfo() gives me

CUDA toolkit 11.1.1, artifact installation
CUDA driver 11.1.0
NVIDIA driver 456.71.0

Libraries:
- CUBLAS: 11.3.0
- CURAND: 10.2.2
- CUFFT: 10.3.0
- CUSOLVER: 11.0.1
- CUSPARSE: 11.3.0
- CUPTI: 14.0.0
- NVML: 10.0.0+430.86
- CUDNN: 8.0.4 (for CUDA 11.1.0)
- CUTENSOR: 1.2.1 (for CUDA 11.1.0)

Toolchain:
- Julia: 1.5.2
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device:
  0: Quadro GP100 (sm_60, 14.738 GiB / 16.000 GiB available)
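`__nv_sqrt` lives in CUDA's libdevice, so the failure looks like a libdevice linking problem rather than a Flux bug. One experiment worth trying (hedged: `JULIA_CUDA_VERSION` is the environment variable CUDA.jl 2.x consults when selecting the toolkit artifact, and `10.2` below is only an example version, an assumption): pin an older toolkit in a fresh session, before loading CUDA.

```julia
# Experiment, not a confirmed fix: pin the CUDA toolkit artifact that
# CUDA.jl downloads. Must be set before `using CUDA` in a fresh session.
ENV["JULIA_CUDA_VERSION"] = "10.2"   # example version, an assumption
using CUDA
CUDA.versioninfo()                   # check which toolkit was picked up
```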

@CarloLucibello
Member

Probably not going to fix the problem, but

l = sum(logitcrossentropy.(m.(gpu.(xs)), gpu.(ys)))

should be changed to

using Flux.Losses
l = logitcrossentropy(m.(gpu.(xs)), gpu.(ys), agg=sum)
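For illustration, a small CPU-only example of the `agg` keyword (assumes a Flux version with the `Flux.Losses` module, ≥ 0.11; the arrays here are made up): `agg = sum` adds the per-sample losses instead of averaging them.

```julia
using Flux
using Flux.Losses: logitcrossentropy

# Logits for a batch of 3 samples over 5 classes, plus one-hot targets.
ŷ = randn(Float32, 5, 3)
y = Flux.onehotbatch([1, 3, 5], 1:5)

logitcrossentropy(ŷ, y)             # default aggregation is mean
logitcrossentropy(ŷ, y; agg = sum)  # sum over the batch instead
```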

@DhairyaLGandhi
Member

We've discussed this before, and the first approach is a lot more intuitive and clear about its intention.

@CarloLucibello
Member

We've discussed this before, and the first approach is a lot more intuitive and clear about its intention.

Yes, we have discussed this before, and as a result the whole Losses module is based on the second approach; the first approach is deprecated.

@ToucheSir ToucheSir transferred this issue from FluxML/Flux.jl Jan 16, 2022