[Bug]: cannot unload pdf #2395

Open
wangshuai-wuhan opened this issue Mar 31, 2024 · 3 comments
Labels
area: backend (Related to backend functionality or under the /backend directory), bug (Something isn't working)

Comments

@wangshuai-wuhan

What happened?

A bug happened!

Relevant log output

2024-03-31 16:17:43 worker        | [INFO] models.files [files.py:71]: Computing documents from file Alshenawy et al_2018_Skin friction behavior of pile fully embedded in limestone.pdf
2024-03-31 16:17:45 backend-core  | INFO:     127.0.0.1:40638 - "GET /healthz HTTP/1.1" 200 OK
2024-03-31 16:17:47 worker        | [2024-03-31 08:17:47,532: INFO/ForkPoolWorker-58] pikepdf C++ to Python logger bridge initialized
2024-03-31 16:17:47 worker        | [2024-03-31 08:17:47,800: WARNING/ForkPoolWorker-58] [nltk_data] Error loading punkt: <urlopen error [Errno 99] Cannot
2024-03-31 16:17:47 worker        | [nltk_data]     assign requested address>
2024-03-31 16:17:47 worker        | [2024-03-31 08:17:47,801: ERROR/ForkPoolWorker-58] 
2024-03-31 16:17:47 worker        | **********************************************************************
2024-03-31 16:17:47 worker        |   Resource punkt not found.
2024-03-31 16:17:47 worker        |   Please use the NLTK Downloader to obtain the resource:
2024-03-31 16:17:47 worker        | 
2024-03-31 16:17:47 worker        |   >>> import nltk
2024-03-31 16:17:47 worker        |   >>> nltk.download('punkt')
2024-03-31 16:17:47 worker        |   
2024-03-31 16:17:47 worker        |   For more information see: https://www.nltk.org/data.html
2024-03-31 16:17:47 worker        | 
2024-03-31 16:17:47 worker        |   Attempted to load tokenizers/punkt/PY3/english.pickle
2024-03-31 16:17:47 worker        | 
2024-03-31 16:17:47 worker        |   Searched in:
2024-03-31 16:17:47 worker        |     - '/root/nltk_data'
2024-03-31 16:17:47 worker        |     - '/usr/local/nltk_data'
2024-03-31 16:17:47 worker        |     - '/usr/local/share/nltk_data'
2024-03-31 16:17:47 worker        |     - '/usr/local/lib/nltk_data'
2024-03-31 16:17:47 worker        |     - '/usr/share/nltk_data'
2024-03-31 16:17:47 worker        |     - '/usr/local/share/nltk_data'
2024-03-31 16:17:47 worker        |     - '/usr/lib/nltk_data'
2024-03-31 16:17:47 worker        |     - '/usr/local/lib/nltk_data'
2024-03-31 16:17:47 worker        |     - ''
2024-03-31 16:17:47 worker        | **********************************************************************
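The log shows two separate facts: the download of punkt failed with a network error (Errno 99), and no local copy existed in any of the searched directories. A minimal sketch for checking the second point by hand inside the worker container follows. It uses only the standard library; the helper name find_punkt is hypothetical, the directory list is copied from the log, and the assumption that the resource sits under tokenizers/punkt (unpacked) or tokenizers/punkt.zip is based on the path NLTK reports trying to load.

```python
import os
from pathlib import Path

# Directories copied from the "Searched in" list in the log (duplicates dropped).
SEARCHED_DIRS = [
    "/root/nltk_data",
    "/usr/local/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/local/lib/nltk_data",
    "/usr/share/nltk_data",
    "/usr/lib/nltk_data",
]


def find_punkt(search_dirs):
    """Return the first directory holding tokenizers/punkt (unpacked or zipped), else None."""
    for base in search_dirs:
        tokenizers = Path(base) / "tokenizers"
        if (tokenizers / "punkt").is_dir() or (tokenizers / "punkt.zip").is_file():
            return str(base)
    return None


if __name__ == "__main__":
    # NLTK also reads the NLTK_DATA environment variable, so include it if set.
    extra = [p for p in os.environ.get("NLTK_DATA", "").split(os.pathsep) if p]
    found = find_punkt(extra + SEARCHED_DIRS)
    print(found or "punkt not found in any searched directory")
```

If the container has no outbound network access, one option is to place a copy of the punkt data in one of these directories (or in a directory named by NLTK_DATA) so that no download is needed at runtime.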

Twitter / LinkedIn details

No response

@wangshuai-wuhan wangshuai-wuhan added the bug Something isn't working label Mar 31, 2024

dosubot bot commented Mar 31, 2024

Thank you for providing the detailed logs. It indeed seems like the NLTK library is unable to download the 'punkt' tokenizer due to some network issue. To better assist you, could you please provide the following additional information:

  1. The operating system and its version you are using.
  2. The version of Python you are using.
  3. The version of NLTK library installed in your environment.
  4. If you are running the application in a virtual environment or Docker, please provide details about it.
  5. Whether you are behind a proxy or firewall, as this might be blocking the download request.
  6. If possible, try running the command nltk.download('punkt') in a Python shell and share the output.

This information will help me understand the context better and provide a more accurate solution.
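Most of the details requested above can be collected in one go. The following is a rough sketch, not part of the project: collect_environment is a hypothetical helper, the /.dockerenv check is only a common heuristic for detecting Docker, and the proxy variable names are the conventional ones rather than anything the project defines.

```python
import os
import platform
import sys
from importlib import metadata


def collect_environment():
    """Gather OS, Python, NLTK, container, and proxy details as a plain dict."""
    try:
        nltk_version = metadata.version("nltk")
    except metadata.PackageNotFoundError:
        nltk_version = None  # NLTK is not installed for this interpreter

    proxy_vars = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")
    return {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        "nltk": nltk_version,
        # Heuristic: the file /.dockerenv usually exists inside Docker containers.
        "in_docker": os.path.exists("/.dockerenv"),
        # True when running inside a venv-style virtual environment.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Proxy settings may appear in upper or lower case.
        "proxy_env": {k: v for k, v in os.environ.items() if k.upper() in proxy_vars},
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that the worker uses, and redact any credentials embedded in proxy URLs before posting the output.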

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dosubot dosubot bot added the area: backend Related to backend functionality or under the /backend directory label Mar 31, 2024
@wangshuai-wuhan
Author

@dosu-bot I am running the application in a virtual environment or Docker, using Python 3.11 on Windows 10. Running the command nltk.download('punkt') in a Python shell gives this output:

Python 3.11.8 (tags/v3.11.8:db85d51, Feb 6 2024, 22:03:32) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> nltk.download('punkt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'nltk' is not defined


dosubot bot commented Mar 31, 2024

The NameError shows only that nltk had not been imported in that Python session; it does not by itself tell us whether NLTK is installed. If running import nltk raises ModuleNotFoundError, NLTK is missing from that environment. In that case, install it in your virtual environment or Docker setup with the following command:

pip install nltk

Once NLTK is installed, you can proceed to download the 'punkt' tokenizer with the following Python commands:

import nltk
nltk.download('punkt')

Make sure these commands are executed within the same virtual environment or Docker where your project is running to avoid any discrepancies.
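The check-then-download sequence described above can be wrapped so that each failure mode is reported separately. This is a minimal sketch under stated assumptions: ensure_punkt and its status strings are hypothetical names, while nltk.data.find and nltk.download are the NLTK calls already referred to in this thread. The import is done inside the function so that a missing NLTK installation is reported rather than raised.

```python
def ensure_punkt(allow_download=True, download_dir=None):
    """Return 'nltk-missing', 'present', 'absent', 'downloaded', or 'download-failed'."""
    try:
        import nltk  # imported here so a missing install becomes a status, not a crash
    except ImportError:
        return "nltk-missing"

    try:
        # Raises LookupError when the resource is not in any searched directory.
        nltk.data.find("tokenizers/punkt")
        return "present"
    except LookupError:
        pass

    if not allow_download:
        return "absent"

    # nltk.download returns a boolean; a blocked network yields False.
    ok = nltk.download("punkt", download_dir=download_dir, quiet=True)
    return "downloaded" if ok else "download-failed"


if __name__ == "__main__":
    print(ensure_punkt())
```

A result of download-failed inside the worker container would match the urlopen error in the original log and points to a network restriction rather than a missing package. If download_dir is set to a custom location, that directory also has to be on NLTK's search path (for example through NLTK_DATA) for the resource to be found afterwards.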

Regarding the quivr codebase and its dependencies, it seems the NLTK library is not listed as a dependency in its requirements or setup files. You can confirm this by reviewing the Pipfile in the repository.

If you encounter any further issues or have additional questions, feel free to reach out.

