
Unable to produce images > 800²px on cards with more than 4GB RAM #39

Open · neuralisator opened this issue Jun 1, 2016 · 48 comments

@neuralisator commented Jun 1, 2016

A couple of users (here, here and here), including myself, have experienced the problem that they cannot create images with resolutions larger than around 800²px, even though the GPU RAM should allow for more. I am running a script that finds the maximum workable image size by divide and conquer: the image is scaled to a candidate size, the conversion script is run against it, and the size is adjusted up or down depending on whether the run succeeds or fails, until the largest working size is found. Since the size is reduced on every failure, a too-large image can be ruled out as the cause of the error, because a working size is always found eventually. I am running this on a Titan X with 12GB RAM and monitoring it with nvidia-smi. Tested images above 800²px do consume more than 4GB of memory, but even when they don't max out the card, that is, reach the 12GB, they still fail to convert unless the size is back down to what I could formerly also produce on a 4GB GPU. An exception is thrown at some point while the script is updating the layer "deeppy.feedforward.activation_layers.ReLU".
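For illustration, the search amounts to something like the following sketch (run_conversion is a hypothetical stand-in for scaling the image to size x size, running the conversion script and reporting success):

def find_max_size(lo, hi, run_conversion):
    # Binary-search the largest square image size (in px) that converts,
    # assuming success is monotonic in the image size.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if run_conversion(mid):
            lo = mid        # mid works: try larger sizes
        else:
            hi = mid - 1    # mid fails: try smaller sizes
    return lo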

Others have reported being able to create larger images (here and here). @alexjc noted that he had to "manually free some buffers" to create larger images; maybe he can give us some insights?

I am running Linux Mint 17 / NVIDIA driver 352.93 / CUDA 7.5 / cuDNN 5, with cudarray master (with cuDNN 5 support, but the error also occurs with cuDNN disabled) and deeppy master.

@filmo commented Jun 1, 2016

One thing I've noticed when running nvidia-smi -lms 100 (100ms update interval) is that you can see memory usage 'peak' for a moment. So while your image may only need 4GB of RAM, actual usage will briefly peak much higher, and if that exceeds your RAM, the run crashes out.
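For reference, a rough way to capture that transient peak programmatically (a sketch; assumes nvidia-smi is on PATH and reads the first GPU only):

import subprocess
import time

# Poll GPU memory every 100 ms (same idea as `nvidia-smi -lms 100`)
# and keep the highest reading seen.
peak_mib = 0
for _ in range(600):  # sample for roughly a minute
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.used',
         '--format=csv,noheader,nounits'])
    peak_mib = max(peak_mib, int(out.decode().split()[0]))
    time.sleep(0.1)
print('peak GPU memory used: %d MiB' % peak_mib)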

I'll let @andersbll comment more completely, but it is probably related to how the cudarray and deeppy libraries handle memory. Keep in mind that cudarray and deeppy are pretty much solo-developer frameworks, so they are probably not as refined as Torch or Caffe. There is a Torch implementation of style transfer; you may want to try that as well. Torch is a much more mature deep learning framework, so it probably handles memory allocation better.

At the very least, give it a shot and see if it handles 800 x 800 images.

@neuralisator (Author) commented Jun 1, 2016

Thanks for the recommendations, @filmo. I have actually tested a couple of different implementations, including those (neural-style with adam/lbfgs, neural-art, neural-style-tf). That said, neural_artistic_style yields, in my eyes, by far the best results of them all, which is pretty damn impressive given the solo-developer background. All the more a pity that it is the only tested implementation that suffers from this limit (at least in some configurations).

Also, I want to say once again that I do not think this is a memory spike. As I said, I am automatically testing many different image sizes, and a 4GB card can produce exactly the same maximum size as a 12GB one.

@neuralisator (Author)

To clarify, I do not completely rule out a memory spike, but it wouldn't be one on a "regular" scale. Such a massive spike, one that never occurs at sizes that fit "<= 4GB" but appears immediately above "> 4GB" and fills the remaining 8GB of the card, would really just be another description of the problem.

@filmo commented Jun 2, 2016

Sorry I wasn't clear on what you were asking.

I'm using a 980 Ti with 6GB and I'm able to apply style to images that are 1600x1057 and 1750x976 in size. I think 1600x1057 is about as big as I can go with this implementation.

(I'm using the cudnn4 versions of cudarray and deeppy from back in February and have not upgraded to cudnn5 yet. Perhaps that's part of the difference? I also wonder if there's something particular about the Titan?)

I agree, I also prefer the images created by this implementation over neural-style. I'm not sure why there are significant differences, but there definitely are.

@andersbll (Owner)

Hi! Thanks for the nice writeup; I wish I could reproduce the error myself. I think you are right, @neuralisator: this does not look like an out-of-memory problem. Is this the problem we are trying to solve? In that case, could you insert a print(shape) just before line 45 in cudarray/linalg.py? I would like to start by checking the arguments to cudarray.empty().

Regarding the visual style: I perform layer-level normalization of the gradients. In the VGG net, the features in each layer may exist on different scales, and by L1-normalizing the gradient signals I get a more even contribution across the different layers. I suspect this is the secret sauce. :)
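As a rough illustration of that idea (a minimal sketch, not the actual deeppy code; grad stands for one layer's gradient array):

import numpy as np

def l1_normalize_gradient(grad, eps=1e-8):
    # Scale the layer's gradient so its mean absolute value is 1,
    # evening out contributions from layers whose features live on
    # different scales. eps guards against division by zero.
    return grad / (np.mean(np.abs(grad)) + eps)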

@neuralisator (Author) commented Jun 2, 2016

Hello @andersbll, and thanks for replying so quickly. Yes, the error you linked is the one this is about.

Before we get to debugging:

About the quality of the images: the style is just applied so much better than in any other implementation :)
Plus, it's super fast. Comparison is a little tricky here, of course; basically it comes down to: what do you get from the different implementations after the same amount of time has passed?
From what I can see, it beats the competition there as well. I don't understand the technical details at this point; I can just say that this is an exceptionally great piece of work and I would love to see it work with high-resolution images.
Which leads me to a last question, since you said you weren't able to reproduce the problem: can you actually produce images > 800²px (such as those in this test), or are you running a 4GB card where only the "769px" test worked for you?

Now let's debug:

I think you meant print(out_shape)? I modified the code so it looks like this:

def dot(a, b, out=None):
    if a.ndim == b.ndim == 1:
        return inner(a, b)

    if a.dtype != b.dtype:
        raise ValueError('dtype mismatch')

    out_shape = matmul_shape(a.shape, b.shape)
    print ('out:', out, ', a.dtype: ', a.dtype, ', out_shape:', out_shape)
    if out is None:
        out = cudarray.empty(out_shape, dtype=a.dtype)
    else:
        # ... (remainder of the original function unchanged)

I also added a logline to show the currently processed layer (in style_network.py).

I resized the tuebingen / starry_night images to an adequate size for testing. They can be downloaded here and here.
Memory consumption ends up at about 5.2GB with those two images (see below).

This is the output:
output.txt

And these are the memory stats during the process:
memory.txt

@andersbll (Owner)

Regarding speed: maybe the other implementations are using a different optimization method. The original paper uses L-BFGS as far as I remember, which is a bit heavy compared to the first-order method I use (Adam).
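For reference, one generic Adam update step (Kingma & Ba); a sketch of the method, not the deeppy implementation:

import numpy as np

def adam_step(x, grad, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad         # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2    # second-moment estimate
    m_hat = m / (1 - b1 ** t)            # bias correction, t >= 1
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v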

I have a GPU with 12GB RAM and I can produce images larger than 800²px, including the images you attached.

Thanks for the output.txt. I can't see anything wrong there.

Can you provide me with the output of the commands ldd <path to libcudarray.so>, uname -a, and python -V?

@neuralisator (Author) commented Jun 3, 2016

I hope the formatting works better this time:

$ ldd /usr/local/lib/libcudarray.so
    linux-vdso.so.1 =>  (0x00007fff3f75e000)
    libcudart.so.7.5 => /usr/local/cuda/lib64/libcudart.so.7.5 (0x00007f1464a34000)
    libcublas.so.7.5 => /usr/local/cuda/lib64/libcublas.so.7.5 (0x00007f1463154000)
    libcurand.so.7.5 => /usr/local/cuda/lib64/libcurand.so.7.5 (0x00007f145f8ec000)
    libcudnn.so.5 => /usr/local/cuda/lib64/libcudnn.so.5 (0x00007f145bda1000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f145ba73000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f145b76d000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f145b557000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f145b191000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f145af8d000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f145ad6f000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f145ab66000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1465816000)

$ uname -a
Linux base 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

(this is a Mint 17 installation)

$ python -V
Python 2.7.6

@andersbll (Owner)

Thanks, can you try to install Anaconda Python and use that instead?

@neuralisator (Author)

Will do and report back; it can take a while though.

@neuralisator (Author)

Setting everything up with anaconda did the trick. It's beautiful :D Thanks @andersbll - I will give more details tomorrow.

@andersbll (Owner)

Hooray, I'm glad to hear that! What version of the Cython package is your old Python installation using? Could you try updating it to the latest and see if that helps?

@neuralisator (Author)

The version that is used by the default installation is 0.24.

Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import Cython
>>> print(Cython.__version__)
0.24

Now here's the funny part: I have been writing an install script to reproduce the steps I did, for reference, and now it's broken again. I can't really make sense of it. It definitely worked; I created several high-resolution images. And now I'm getting the exact same error with the Anaconda installation. So I'll be busy for a bit, banging my head against the table and trying to figure out what's going on. I'll report back.

@neuralisator (Author)

Sadly, I couldn't get it to work in a reproducible way. As I wasn't (and am not) deeply familiar with Python package management, the different distributions and the library paths, I probably made a big mess at some point in this installation orgy and accidentally hit a working combination a second time. In retrospect, I shouldn't have touched it any more, but I wanted an install script that works reproducibly, installing from that broke it a second time, and now I am basically back to zero and get the error message every single time. By now I have scripted this and have installed with just about every combination of versions/settings I can imagine.

The installation process is as follows:

  • install cuda toolkit (7.5) + cudnn (4 or 5) (/usr/local/cuda)
  • install anaconda (~/anaconda2)
  • in case, create a new conda environment (~/anaconda2/envs/...) and activate it
  • install cudarray (commit before cudnn5 merge if cudnn4, or else master)

Without creating a conda environment, just prepending anaconda2/bin to PATH, this installs cudarray with numpy 1.10.4 and cython 0.23.4. I can upgrade to numpy 1.11 and cython 0.24 using conda install; I tried all combinations. I actually started with the virtual environment and installed the newer versions, which didn't work. Then I used the old versions by setting PYTHONPATH, and somewhere after that it worked. So I figured the older versions worked and the newer ones didn't. Well, there was obviously more to it.
Note that I cleared out the installation directories every time, so there wouldn't be any remainders of previous tries.
With a separate conda environment, I didn't have the old library versions available unless I set PYTHONPATH=~/anaconda2/lib/python2.7/site-packages, so installing the newer packages was mandatory there; scipy also had to be installed later.
With libcudarray.so installed to ~/anaconda2/lib (INSTALL_PREFIX), it wouldn't be found unless I copied it to e.g. /usr/local/lib, so I did that after every compile. I tried both CUDNN_ENABLED=1 and =0.

Finally,

  • install deeppy
  • run test

I have even run this in nvidia-docker with both regular Python and Anaconda exclusively installed, and again, no luck.
What bugs me the most is that I had a working configuration twice and wrecked it both times.
I am probably missing something or doing something fundamentally wrong here. As I am running out of ideas, any thoughts are appreciated.

@neuralisator (Author)

To be clear, the INSTALL_PREFIX for libcudarray.so would be ~/anaconda2/, and the file then resides in ~/anaconda2/lib.

@andersbll (Owner)

Ok, thanks for the thorough description. I might try to use some other Python installations and see if I can reproduce the error.

Just to be sure, you are using a 64-bit version of Python, right? 32-bit might be problematic above 4GB. :)
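A quick way to check from the interpreter (works for both stock and Anaconda Python):

import platform, sys

# A 32-bit interpreter caps the address space at 4GB, which would
# match the symptom exactly.
print(platform.architecture()[0])   # expect '64bit'
print(sys.maxsize > 2**32)          # True on a 64-bit build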

@neuralisator (Author)

Yes, both the stock and the Anaconda python2.7 binaries are reported as 64-bit LSB executables.

@neuralisator (Author) commented Jun 6, 2016

In the meantime I have installed it on a fresh Linux Mint 17 and on a Debian 7.1 installation. The Anaconda variant on Mint failed. On Debian I used pip to install the libraries, which for a change gave me numpy 1.11.1rc1; it also failed. Running the Anaconda version on Debian: failed.

If I didn't have the images I created back then, I'd start to think I hallucinated it ever working. How can it fail on every installation? And how can it actually have worked at some point? It must have picked up some libraries that were already present on the system or something.

This all makes no sense to me.

@neuralisator (Author) commented Jun 6, 2016

I just ran the conversion test under cuda-memcheck, and when it crashes it emits many of the following error messages. Only the thread number differs, so I assume every thread crashes with the same error. Maybe that is of some help?

('layer ', <style_network.Convolution object at 0x7f7ea65dc9d0>)
========= Invalid __global__ read of size 4
=========     at 0x00001130 in void cudarray::kernel_win2img<float>(float const *, int, int, int, int, int, int, int, int, int, int, int, int, cudarray::kernel_win2img<float>*)
=========     by thread (59,0,0) in block (1242,0,0)
=========     Address 0x73fa0f748 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x15865d]
=========     Host Frame:/usr/local/cuda/lib64/libcudart.so.7.5 [0x146ad]
=========     Host Frame:/usr/local/cuda/lib64/libcudart.so.7.5 (cudaLaunch + 0x143) [0x2ece3]
=========     Host Frame:/home/alex/anaconda2/lib/libcudarray.so (_ZN8cudarray7win2imgIfEEvPKT_iiiiiiiiiPS1_ + 0x313) [0xb0d93]
=========     Host Frame:/home/alex/anaconda2/lib/libcudarray.so (_ZN8cudarray27conv_bc01_matmul_bprop_imgsIfEEvPKT_S3_iiiiiiiiiiiPS1_ + 0x1d7) [0x42377]
=========     Host Frame:/home/alex/anaconda2/lib/python2.7/site-packages/cudarray-0.1.dev0-py2.7-linux-x86_64.egg/cudarray/wrap/nnet.so [0xaf8e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x88e5) [0xfce15]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCodeEx + 0x89e) [0xfda2e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8525) [0xfca55]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8665) [0xfcb95]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8665) [0xfcb95]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCodeEx + 0x89e) [0xfda2e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8525) [0xfca55]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCodeEx + 0x89e) [0xfda2e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCode + 0x32) [0xfdb42]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyRun_FileExFlags + 0xb0) [0x11e050]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyRun_SimpleFileExFlags + 0xef) [0x11e22f]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (Py_Main + 0xca4) [0x133b74]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21f45]
=========     Host Frame:python [0x649]
=========
========= Invalid __global__ read of size 4
=========     by thread (58,0,0) in block (1242,0,0)
=========     Address 0x73fa0f744 is out of bounds
=========     (same kernel and host backtrace as above)

...

@neuralisator (Author) commented Jun 6, 2016

I think I just found the solution. I didn't export the CUDNN_ENABLED=1 environment variable when installing cudarray; I only ran CUDNN_ENABLED=1 make. Now I noticed that setup.py checks that variable again. I probably exported it manually at some point in between. It's working again with the Anaconda installation; I haven't checked anything else yet, I just wanted to post this. I will not touch it now and will verify it later in a Docker image or on another system. So basically I failed to follow the installation instructions properly. Sorry about that. Will post more results later.
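For context, the build picks the flag up from the environment in both places, roughly like this (illustrative, not the verbatim cudarray setup.py), which is why prefixing only the make invocation is not enough:

import os

# make and setup.py each consult the environment independently, so
# CUDNN_ENABLED=1 must be exported for both to see it.
cudnn_enabled = os.environ.get('CUDNN_ENABLED', '0') == '1'
print('building with cuDNN' if cudnn_enabled else 'building without cuDNN')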

@neuralisator (Author)

Verified :D this was the problem. It also works with stock Python now. However, the error still occurs if CUDNN_ENABLED=0 is set (or the variable is not set at all) during the cudarray installation, so this might still be worth looking into. @andersbll, let me know if you need any more info. Also, thanks for your support, and keep up the good work :) I'll ping two more people who seemed to have the same problem: @FabienLavocat and @mirzman

@andersbll (Owner)

Ah, great job finding it at last! I admire your persistence. :)

I will try to look into kernel_win2img() at a later point. It seems like that is where the problem is.

@mirzman commented Jun 7, 2016

I exported CUDNN_ENABLED=1, but the problem remains...

@neuralisator (Author)

@mirzman just to make sure: export CUDNN_ENABLED=1 has to be set before compiling/installing libcudarray (it affects both make and setup.py). If you compiled in the same folder before, I suggest deleting it and creating a fresh clone of the repo. I haven't tested whether you also have to reinstall deeppy, but the same might apply there. You also have to make sure the freshly compiled libcudarray.so is actually the one being used, and not some old version lingering elsewhere; the cuda-memcheck backtrace shows which one is loaded (see above).

@mirzman commented Jun 7, 2016

git clone https://github.com/andersbll/cudarray.git
git clone https://github.com/andersbll/deeppy.git
git clone https://github.com/andersbll/neural_artistic_style.git

export CUDNN_ENABLED=1

cd cudarray
make -j8 -B
sudo make install
sudo python setup.py install
cd ..

cd deeppy
sudo python setup.py install
cd ..

cd neural_artistic_style
./neural_artistic_style.py --network ~/imagenet-vgg-verydeep-19.mat --iterations 201 --subject ~/chern_s9.jpg --style images/starry_night.jpg --output o9.png
cd ..

It fails.

sudo mv /usr/local/lib/libcudarray.so /usr/local/lib/libcudarray.so1

cd neural_artistic_style
./neural_artistic_style.py --network ~/imagenet-vgg-verydeep-19.mat --iterations 201 --subject ~/chern_s9.jpg --style images/starry_night.jpg --output o9.png
cd ..

sudo mv /usr/local/lib/libcudarray.so1 /usr/local/lib/libcudarray.so

It outputs "CUDArray: CUDA back-end not available, using NumPy."

@neuralisator (Author)

@mirzman if I export CUDNN_ENABLED=1 here on the regular user account and then sudo, the variable is not set in that context. Please check whether -DCUDNN_ENABLED shows up in the compiler output. If not, that is probably your issue.

@neuralisator (Author) commented Jun 7, 2016

Correction: as you don't run sudo before make, that part should be fine. But you can't run setup.py with sudo like that; it's the exact same situation I had in the end.

@andersbll (Owner)

If you install with sudo, you need to export the environment variables as well. This is done with sudo -E.

@mirzman commented Jun 7, 2016

make outputs:

g++ -DCUDNN_ENABLED -O3 -fPIC -Wall -Wfatal-errors -I./include -I/usr/local/cuda/include -c -o src/nnet/conv_bc01_matmul.o src/nnet/conv_bc01_matmul.cpp
nvcc -gencode arch=compute_20,code=sm_20 -gencode arch=compute_20,code=compute_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_30,code=compute_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -O3 --compiler-options '-DCUDNN_ENABLED -O3 -fPIC -Wall -Wfatal-errors' --ftz=true --prec-div=false -prec-sqrt=false --fmad=true -I./include -I/usr/local/cuda/include -c -o src/nnet/pool_b01.o src/nnet/pool_b01.cu
...

also:

$ echo $CUDNN_ENABLED
1
$ sudo echo $CUDNN_ENABLED
1

@andersbll (Owner)

@mirzman: what is the output of ldd <path to libcudarray.so>? I think you need to update the LD_LIBRARY_PATH environment variable to point to the correct libraries.

@neuralisator (Author)

@mirzman It's not the same:
sudo bash -c "export" | grep -i cudnn_enabled gives me no result, while sudo echo $CUDNN_ENABLED reports 1, presumably because the calling shell expands the variable before sudo runs. Please verify by running it as @andersbll said.

@mirzman commented Jun 7, 2016

$ ldd /usr/local/lib/libcudarray.so 
    linux-vdso.so.1 =>  (0x00007fff5dbc3000)
    libcudart.so.7.5 => /usr/local/cuda-7.5/lib/libcudart.so.7.5 (0x00007f3cc3b53000)
    libcublas.so.7.5 => /usr/local/cuda-7.5/lib/libcublas.so.7.5 (0x00007f3cc2274000)
    libcurand.so.7.5 => /usr/local/cuda-7.5/lib/libcurand.so.7.5 (0x00007f3cbea0c000)
    libcudnn.so.5 => /usr/lib/x86_64-linux-gnu/libcudnn.so.5 (0x00007f3cb9d6f000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f3cb9a6b000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3cb9765000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f3cb954f000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3cb918a000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3cb8f86000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3cb8d68000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f3cb8b60000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f3cc46ba000)

I used sudo -E. Now it outputs:

terminate called after throwing an instance of 'std::runtime_error'
  what():  ./include/cudarray/nnet/cudnn.hpp:109: The cuDNN library was not initialized properly.
./run.sh: line 18: 30576 Aborted                 (core dumped) ./neural_artistic_style.py --network ~/imagenet-vgg-verydeep-19.mat --iterations 201 --subject ~/chern_s9.jpg --style images/starry_night.jpg --output o9.png

@neuralisator (Author)

That means cuDNN is finally being used during installation. So far so good. I'm sorry I can't help with the next error you ran into, as I never encountered it, but the cudnn/non-cudnn mixup is out of the way now.

@neuralisator (Author)

Outdated, but maybe relevant: #16

@mirzman commented Jun 16, 2016

It works! I just uninstalled and reinstalled cudnn several times.
Thank you a lot!

@mxchinegod

@mirzman You have it working for larger images with your graphics card?

@mtancoigne

Hi, I'm landing on this thread.

I didn't read the whole thing, but I noticed this: the size limitation seems to be somewhat tied to the combined size of both the subject and the style picture (i.e., for an 800x800 subject and an 800x800 style, that gives 800x800 + 800x800 = 1,280,000 pixels in total).

I found the limit to lie between 611,496 and 738,048 pixels for a GT635M (see #42, there is a link to my tests).

For now it seems to hold: I tried multiple combinations of pictures totalling around 611,496px and they all work... I will update the list linked in that issue as soon as I have new high/low limits.

I may be totally wrong in my deductions; come over to #42 to discuss :)
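Expressed as a tiny helper (the 611,496px budget is just the empirical GT635M value above; it will differ per card):

def fits_pixel_budget(subject_wh, style_wh, budget_px=611496):
    # Combined pixel count of subject + style against an empirical
    # per-card budget; purely a heuristic based on the tests in #42.
    sw, sh = subject_wh
    tw, th = style_wh
    return sw * sh + tw * th <= budget_px

# e.g. fits_pixel_budget((800, 800), (800, 800)) -> False on a GT635M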

@mirzman commented Jul 6, 2016

@DylanAlloy Yes, now it works. There were two problems:

  1. sudo python setup.py install --> sudo -E python setup.py install
     (so that CUDNN_ENABLED=1 is actually seen by setup.py)
  2. an incorrect installation of cudnn

@mxchinegod

@mirzman First off, thanks for the quick reply and for working together to help resolve this for everyone who stops by later.

What exactly went wrong during the cudnn installation?

@mirzman commented Jul 6, 2016

@DylanAlloy I don't know :) There were no errors. When I realized I was out of ideas, I decided to uninstall and reinstall cudnn, cuda and so on. After reinstalling cudnn the problem was gone :)

@mxchinegod

@mirzman If you don't mind me prying a little more: what is the largest image you've processed with the code since you got the CUDA problems resolved?

@mirzman commented Jul 6, 2016

@DylanAlloy I processed a 2298x1280 image. I can test a larger one.

@mxchinegod

@mirzman that's pretty good. I guess I'll try reinstalling everything. I'm working with a 4GB Amazon NVIDIA instance, so I hope I can figure it out. How much memory are you working with?

@mirzman commented Jul 6, 2016

@DylanAlloy GeForce GTX TITAN 12GB.
A 9192x5120 image fails :)

@mxchinegod

@mirzman Ahhh, I've broken my installation, but if you're getting 2298x1280 and I was getting 512x512 earlier, that sounds about right given the amounts of memory we have.

@mirzman commented Jul 6, 2016

@DylanAlloy Are you sure?
512x512 == 0.25M
2298x1280 = 2.8M
2.8 / 0.25 = 11.2
If the reason is memory it must be 4 :)

@mxchinegod

@mirzman True, but by the same math I'm expecting at best <1 megapixel until we know otherwise, which is a bummer. Not the fault of the code; my options are simply limited.

@mirzman commented Jul 6, 2016

#21 (comment)
