Import is very time-consuming: from unstructured.partition.pdf import partition_pdf #2983

Open
peanutpaste opened this issue May 8, 2024 · 2 comments
Labels
investigating Issues that require more information before they are actionable

Comments

@peanutpaste

import time

t1 = time.time()
from unstructured.partition.pdf import partition_pdf  # the import being timed
t2 = time.time()
print(t2 - t1)  # elapsed seconds

When I run this, the import takes nearly 1 minute.

My environment:
CPU: 13th Gen Intel(R) Core(TM) i7-13700K 3.40 GHz
RAM: 32.0 GB
GPU: NVIDIA 4070

Is this long import time normal, or is something wrong?

@peanutpaste
Author

If I disconnect from the network, the import completes in a normal amount of time. What is being loaded from the Internet automatically during the import?

@scanny
Collaborator

scanny commented May 8, 2024

Try setting the environment variable:

$ export SCARF_NO_ANALYTICS=true

and see if that makes a difference. On some network configurations the analytics appear to take longer than desired.

It's mentioned in the README here:
https://github.com/Unstructured-IO/unstructured?tab=readme-ov-file#chart_with_upwards_trend-analytics
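
For anyone who would rather apply this from inside a Python script than from the shell, here is a minimal sketch. It assumes the variable is checked when the package is first imported, so it is set before the import happens. `timed_import` is a hypothetical helper written for this illustration and is not part of unstructured.

```python
import importlib
import os
import time


def timed_import(module_name, disable_analytics=True):
    """Import module_name and return (module, elapsed_seconds).

    Hypothetical helper for illustration only.
    """
    if disable_analytics:
        # Assumption: the package reads this variable at import time,
        # so it has to be set before the first import.
        os.environ["SCARF_NO_ANALYTICS"] = "true"
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start


if __name__ == "__main__":
    try:
        _, seconds = timed_import("unstructured.partition.pdf")
        print(f"import took {seconds:.2f} s")
    except ImportError:
        print("unstructured is not installed in this environment")
```

Note that a module already imported earlier in the same process is returned from the cache, so the timing is only meaningful in a fresh interpreter. Comparing a run with `disable_analytics=False` against one with `disable_analytics=True`, each in its own fresh process, should show whether the analytics call accounts for the delay.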

@scanny scanny self-assigned this May 8, 2024
@scanny scanny added the investigating Issues that require more information before they are actionable label May 8, 2024