Timeout errors (with GPT2 and simple 'return' model) #80
Comments
Solved for ThreadedStreamer, but I still cannot get Streamer to work, with or without ManagedModel. For ThreadedStreamer, in case it's of use to anyone else, the problem was an input/output data mismatch: the data the worker passes to your function is a list [x], so you must take input[0] to obtain the input string for your model, otherwise your model will error out. Likewise for the output, package the string as a list so the queue can handle it. After doing this, the package works beautifully in this mode. However, I am still having timeout issues with Streamer. Since I cleaned up the data handling within Flask, I'm very confused as to why ThreadedStreamer works but Streamer does not. I have only a single GPU, but it seems it should still work (I want to set this up now, so that if I do slide in another GPU it's already ready). My call for Streamer, where predict_X is a simple function that returns whatever text is passed in:
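A minimal sketch of the batched predict function described above (the echo behaviour matches the report; the exact function and variable names are assumptions):

```python
from service_streamer import ThreadedStreamer

def predict_x(batch):
    # service-streamer hands the worker a *list* of inputs (e.g. ["some text"]),
    # so index into it rather than treating it as a bare string, and return a
    # list of the same length so the queue can route results back to callers.
    return [text for text in batch]  # simple echo "model", as in the report

streamer = ThreadedStreamer(predict_x, batch_size=4, max_latency=0.1)

# streamer.predict also takes and returns a list
print(streamer.predict(["hello world"])[0])
```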
I got the small GPT-2 model working smoothly with 4 workers on Streamer on a single GPU; however, the x-large model with 1 worker gives an OOM error (even though it works well with ThreadedStreamer?). To work on Windows, I followed the same layout as:
This made the 'freeze_support' errors disappear in Win10. My GPU is an NVIDIA GeForce GTX 1080 with 6+ GB of RAM, so technically it should support the x-large GPT-2 model with 1 worker on Streamer. Any ideas what might still be missing from my setup? Thanks in advance for any tips and tricks. :) I am very pleased with the progress so far, and with this excellent code you've provided. To be more specific, it looks like the model is already loaded and I'm trying to reload it, even though it is just 1 worker. Let me try rebooting my system, in case there is some ghost model loaded somehow in the cache. -- Nope, it still doesn't load. It almost looks like it is double-loading the GPU on startup. 2020-08-04 04:40:20.599118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: .
Solved for single worker, single GPU, but now I cannot get Streamer to run under apache/mod_wsgi: the `if __name__ == '__main__'` portion doesn't execute when the app is imported from the wsgi file. For naked Flask: I had added the following lines below `import tensorflow as tf` in order to get multiple workers working for the small model. After commenting them out, the x-large model works with a single worker. So, hopefully, if I add more GPUs to my system it should only require a simple tweak in my code now. I commented these out:
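The exact lines aren't preserved in the thread; a typical TensorFlow 1.x GPU snippet of the kind being described (placed right after `import tensorflow as tf`) might look like the following — an illustration, not the poster's actual code:

```python
import tensorflow as tf

# Hypothetical per-process GPU settings often added for multi-worker setups,
# which can conflict with loading a very large model in a single worker.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True                       # grab GPU memory on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.25    # cap each worker's share
session = tf.Session(config=config)
```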
And now it works again for a single worker. I will leave this here in case it is of use to anyone else using your code for sequential generative models. Thanks for the excellent code. A further tip for others: place `import tensorflow as tf` inside the definition of your NN model, to ensure each worker has a clean, clearly assigned canvas, so to speak. This means a long load time for your first prediction call while TF loads your model, so if you are using the streamer to speed up shot-to-shot predictions, set a long timeout (~100 seconds) to make sure the first prediction call doesn't time out your worker.
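A minimal sketch of that tip, assuming a lazily loaded model inside the worker's predict function (the builder name and generate call are placeholders, not the poster's code):

```python
_model = None

def predict_gpt2(batch):
    # Import TensorFlow and build the model lazily inside the worker, so every
    # spawned worker process gets its own clean TF/CUDA state. The first call
    # is slow (TF + model load), hence the generous timeout mentioned above.
    global _model
    if _model is None:
        import tensorflow as tf  # noqa: F401
        _model = build_gpt2_model()  # placeholder for your own model builder
    return [_model.generate(text) for text in batch]
```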
In case this is of use to anyone else: multiprocessing in Python through apache-mod_wsgi on Windows seems impossible to achieve. I therefore created a dual boot into Debian to explore API capabilities using nginx-uwsgi with service-streamer. As others noted, it doesn't seem to work with 'spawn'; however, changing 'spawn' to 'fork' in the two call instances inside service-streamer allows you to run multiprocessing on a production server on Linux. This could in theory also work in WSL2, if one wishes to stay on Windows, but at present WSL2's GPU CUDA support is highly inefficient, giving a 5-10x slowdown (I personally verified this on my own code). I have not yet tried the Redis streamers, but I'm confident they should work without issue as and when required, and it's nice to know that I could have multi-GPU multiprocessing support if I do go that route. Note: just make sure 'master = false' is set in the [uwsgi] *.ini file, otherwise predictions will hang and your workers will time out. Also, waking after suspend causes predictions to hang, so it appears the GPU needs to stay online, i.e. 'wake-on' calls cannot be configured at the moment.
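A minimal uwsgi configuration of the kind described (module name, socket, and worker counts are placeholders; the relevant line is `master = false`):

```ini
[uwsgi]
module = webapi:app        ; placeholder: your Flask module and app object
socket = 127.0.0.1:5000
processes = 1
threads = 4
; keep the uWSGI master process off, otherwise predictions hang and the
; streamer workers time out, as noted above
master = false
```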
@m4gr4th34 thanks for your interest. The main difference between Streamer and ThreadedStreamer is that Streamer runs the model in separate worker processes, while ThreadedStreamer runs it in a background thread of the same process. Another tip: apache-mod_wsgi is an outdated way to serve Python. If you must use Windows (which is not recommended) to deploy your server, just use
Thanks, that explains why apache-mod_wsgi is so poorly documented. I am working in Linux now; I like the new Debian features. Well, it seems most things in CS are poorly documented, to be honest. A question about multiprocessing: my GPT-2 model saturates my single GPU. If I install a second GPU, is it possible to use the streamer so that worker 1 uses GPU 1 and worker 2 uses GPU 2?
yes, take a look: https://github.com/ShannonAI/service-streamer#distributed-gpu-worker |
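For reference, the distributed GPU worker pattern from that section of the README looks roughly like this (the ManagedGPT2 class and its loader are placeholders standing in for your own model):

```python
from service_streamer import ManagedModel, Streamer

class ManagedGPT2(ManagedModel):
    def init_model(self):
        self.model = load_gpt2_model()  # placeholder for your own loader

    def predict(self, batch):
        return [self.model.generate(text) for text in batch]

# One worker per GPU: worker 0 is pinned to cuda device 0, worker 1 to device 1.
streamer = Streamer(ManagedGPT2, batch_size=8, max_latency=0.1,
                    worker_num=2, cuda_devices=(0, 1))
outputs = streamer.predict(["some prompt"])
```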
Hi, thanks for this very interesting library. I'm trying to use it to handle multiple requests to a GPT-2 chatbot on a server, currently running on a single GPU, in a Flask app, on an Apache server, on Win10, using TensorFlow 1.x for GPU. I am using Python 3.7 in a virtual environment. Using ThreadedStreamer or Streamer I only get a timeout response. I'm debugging with the smallest GPT-2 model, which takes ~5 seconds from launch to response, so I'm very confused about where your code is getting hung up. To debug, I created a short function that simply returns any input text: this works with ThreadedStreamer, but gives a timeout with Streamer. I don't know what else I can try to debug further. (I know I won't get much performance enhancement with GPT-2 using service-streamer on a single GPU right now, but I would like it to handle request queues for now, and perhaps when I use multiple GPUs in the future.)
Thanks in advance for any advice!
sample calls in flask app:
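The sample calls were posted as a code block that isn't preserved here; based on the route, handler, and variable names visible in the error log below (/gentest5, get_gentest5, streamer_52), the shape of the Flask handler was roughly as follows — a reconstruction, not the original code:

```python
from flask import Flask, jsonify, request
from service_streamer import Streamer

app = Flask(__name__)

def predict_x(batch):
    # debug "model": simply echo back whatever text is passed in
    return [text for text in batch]

# swapping this for ThreadedStreamer is what made the request succeed
streamer_52 = Streamer(predict_x, batch_size=1, max_latency=0.1, worker_num=1)

@app.route("/gentest5", methods=["POST"])
def get_gentest5():
    inputs = [request.form["text"]]   # assumed request shape
    outputs = streamer_52.predict(inputs)
    return jsonify({"result": outputs[0]})
```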
Here is the error log:
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] [2020-08-01 17:26:09,025] ERROR in app: Exception on /gentest5 [POST]\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] Traceback (most recent call last):\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\app.py", line 2447, in wsgi_app\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] response = self.full_dispatch_request()\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\app.py", line 1952, in full_dispatch_request\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] rv = self.handle_user_exception(e)\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask_cors\extension.py", line 161, in wrapped_function\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] return cors_after_request(app.make_response(f(*args, **kwargs)))\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\app.py", line 1821, in handle_user_exception\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] reraise(exc_type, exc_value, tb)\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\_compat.py", line 39, in reraise\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] raise value\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\app.py", line 1950, in full_dispatch_request\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] rv = self.dispatch_request()\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\app.py", line 1936, in dispatch_request\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] return self.view_functions[rule.endpoint](**req.view_args)\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask_cors\decorator.py", line 128, in wrapped_function\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] resp = make_response(f(*args, **kwargs))\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:/Users/irfan/Python_coding_folder/ChatBots/GPT2Local/gpt-2\WebAPI.py", line 218, in get_gentest5\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] outputs = streamer_52.predict(inputs)\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\service_streamer\service_streamer.py", line 132, in predict\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] ret = self._output(task_id)\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\service_streamer\service_streamer.py", line 122, in _output\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] batch_result = future.result(WORKER_TIMEOUT)\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\service_streamer\service_streamer.py", line 41, in result\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] raise TimeoutError("Task: %d Timeout" % self._id)\r
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] TimeoutError: Task: 0 Timeout\r