
re-establishing GPU support for char-rnn.jl #331

Open
mchristianl opened this issue Jan 28, 2021 · 7 comments

@mchristianl

Hi,

I have just stumbled upon the char-rnn.jl example in my first attempt of checking out Julia machine learning with Flux.jl.

The example works, but the generated text is mostly garbage. Since this seems to be on the TODO list anyway (issue 266), I was wondering how difficult it'd be to re-establish the CUDA support that was removed in an earlier commit. That would let me train the model more intensively to see whether the generated text gains some quality.

I guess the error that @AdarshKumar712 got was

ERROR: LoadError:
Need an adjoint for constructor Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}.
Gradient is of type CuArray{Float32,2}

in the loss function

function loss(xs, ys)
  l = sum(logitcrossentropy.(m.(gpu.(xs)), gpu.(ys)))
  return l
end

Any thoughts? Would this be fixable in an easy way?

regards,
Christian
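A possible workaround for the missing adjoint (a sketch only, untested; `ys_dense` is a hypothetical name, and this assumes `m` has already been moved to the GPU with `gpu(m)`): densify the one-hot targets before they enter the differentiated code, so the pullback only sees plain `CuArray{Float32,2}` values and never needs an adjoint for the `OneHotMatrix` constructor.

```julia
# Sketch, not a confirmed fix: requires a CUDA device, with Flux and
# CUDA loaded and `m = gpu(m)` done beforehand.
function loss(xs, ys)
  # Densify each OneHotMatrix target into a Float32 CuArray so Zygote
  # only differentiates through ordinary array operations.
  ys_dense = [gpu(Float32.(collect(y))) for y in ys]  # hypothetical step
  return sum(logitcrossentropy.(m.(gpu.(xs)), ys_dense))
end
```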

@DhairyaLGandhi
Member

Can we re-run with the latest Flux? Let's prioritise this.

@mchristianl
Author

Yes: this particular error seems to be fixed in 0.12.0-dev, but now I get another error

FATAL ERROR: Symbol "__nv_sqrt"not found

which might indicate that my attempt to restore the file to its previous version failed:

--- char-rnn.jl 2021-01-28 12:27:16.238144639 +0100
+++ char-rnn-new.jl 2021-01-28 12:25:24.361953681 +0100
@@ -3,6 +3,7 @@
 using StatsBase: wsample
 using Base.Iterators: partition
 using Parameters: @with_kw
+using CUDA

 # Hyperparameter arguments
 @with_kw mutable struct Args
@@ -51,15 +52,16 @@

     # Constructing Model
     m = build_model(N)
+    m = gpu(m)

     function loss(xs, ys)
-      l = sum(logitcrossentropy.(m.(xs), ys))
+      l = sum(logitcrossentropy.(m.(gpu.(xs)), gpu.(ys)))
       return l
     end

     ## Training
     opt = ADAM(args.lr)
-    tx, ty = (Xs[5], Ys[5])
+    tx, ty = (gpu.(Xs[5]), gpu.(Ys[5]))
     evalcb = () -> @show loss(tx, ty)

     Flux.train!(loss, params(m), zip(Xs, Ys), opt, cb = throttle(evalcb, args.throttle))
@@ -84,5 +86,6 @@
 end

 cd(@__DIR__)
-m, alphabet = train()
+m, alphabet = train() # FATAL ERROR: Symbol "__nv_sqrt"not found
 sample(m, alphabet, 1000) |> println
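A related sketch (hypothetical, assuming the restored diff above otherwise works): the `gpu.(xs)` / `gpu.(ys)` calls inside `loss` copy every batch from host to device on each gradient evaluation. Transferring the data once, before training, keeps the loss itself free of transfers.

```julia
# Sketch: move all sequences to the GPU up front. Each element of Xs/Ys
# is itself a sequence of one-hot batches, hence the broadcasted gpu.
Xs = [gpu.(xs) for xs in Xs]
Ys = [gpu.(ys) for ys in Ys]

loss(xs, ys) = sum(logitcrossentropy.(m.(xs), ys))
tx, ty = (Xs[5], Ys[5])   # evaluation batch is already on the device
```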

@DhairyaLGandhi
Member

Which version of CUDA are you using?

@mchristianl
Author

The CUDA.jl Julia package is at 2.4.1, and CUDA.versioninfo() gives me

CUDA toolkit 11.1.1, artifact installation
CUDA driver 11.1.0
NVIDIA driver 456.71.0

Libraries:
- CUBLAS: 11.3.0
- CURAND: 10.2.2
- CUFFT: 10.3.0
- CUSOLVER: 11.0.1
- CUSPARSE: 11.3.0
- CUPTI: 14.0.0
- NVML: 10.0.0+430.86
- CUDNN: 8.0.4 (for CUDA 11.1.0)
- CUTENSOR: 1.2.1 (for CUDA 11.1.0)

Toolchain:
- Julia: 1.5.2
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device:
  0: Quadro GP100 (sm_60, 14.738 GiB / 16.000 GiB available)
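`__nv_sqrt` lives in CUDA's libdevice, so the failure looks like a libdevice linking problem rather than a Flux bug. One experiment worth trying (hedged: `JULIA_CUDA_VERSION` is the environment variable CUDA.jl 2.x consults when selecting the toolkit artifact, and `10.2` below is only an example version, an assumption): pin an older toolkit in a fresh session, before loading CUDA.

```julia
# Experiment, not a confirmed fix: pin the CUDA toolkit artifact that
# CUDA.jl downloads. Must be set before `using CUDA` in a fresh session.
ENV["JULIA_CUDA_VERSION"] = "10.2"   # example version, an assumption
using CUDA
CUDA.versioninfo()                   # check which toolkit was picked up
```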

@CarloLucibello
Member

Probably not going to fix the problem, but

l = sum(logitcrossentropy.(m.(gpu.(xs)), gpu.(ys)))

should be changed to

using Flux.Losses
l = logitcrossentropy(m.(gpu.(xs)), gpu.(ys), agg=sum)
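For illustration, a small CPU-only example of the `agg` keyword (assumes a Flux version with the `Flux.Losses` module, ≥ 0.11; the arrays here are made up): `agg = sum` adds the per-sample losses instead of averaging them.

```julia
using Flux
using Flux.Losses: logitcrossentropy

# Logits for a batch of 3 samples over 5 classes, plus one-hot targets.
ŷ = randn(Float32, 5, 3)
y = Flux.onehotbatch([1, 3, 5], 1:5)

logitcrossentropy(ŷ, y)             # default aggregation is mean
logitcrossentropy(ŷ, y; agg = sum)  # sum over the batch instead
```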

@DhairyaLGandhi
Member

We've discussed this before, and the first approach is a lot more intuitive and clear about its intention.

@CarloLucibello
Member

We've discussed this before, and the first approach is a lot more intuitive and clear about its intention.

Yes, we have discussed this before, and as a result the whole Losses module is based on the second approach; the first approach is deprecated.

@ToucheSir ToucheSir transferred this issue from FluxML/Flux.jl Jan 16, 2022