Import is very time-consuming: from unstructured.partition.pdf import partition_pdf

Open peanutpaste opened this issue 1 year ago • 3 comments

import time
t1 = time.time()
from unstructured.partition.pdf import partition_pdf
t2 = time.time()
print(t2 - t1)

When I run this, the import takes nearly 1 minute.

My environment:
CPU: 13th Gen Intel(R) Core(TM) i7-13700K 3.40 GHz
RAM: 32.0 GB
GPU: NVIDIA 4070

Is this long import time normal, or is something wrong?

peanutpaste, May 08 '24 07:05

If I disconnect from the network, the import completes in a normal amount of time. What is being loaded from the Internet automatically during the import?

peanutpaste, May 08 '24 08:05

Try setting the environment variable:

$ export SCARF_NO_ANALYTICS=true

and see if that makes a difference. On some network configurations the analytics appear to take longer than desired.

It's mentioned in the README here: https://github.com/Unstructured-IO/unstructured?tab=readme-ov-file#chart_with_upwards_trend-analytics
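If exporting the variable in the shell is inconvenient, the same thing can probably be done from Python. The following is only a minimal sketch: it assumes, as this thread suggests, that the analytics request is made at import time, so the variable has to be set before the import runs. `timed_import` is a hypothetical helper written for this illustration and is not part of unstructured.

```python
import importlib
import os
import time


def timed_import(module_name, disable_analytics=True):
    """Import a module by name and return (module, elapsed_seconds).

    Hypothetical helper for illustration only; not part of unstructured.
    """
    if disable_analytics:
        # Assumption: the variable is read when the package is imported,
        # so it must be set in the environment before the import happens.
        os.environ["SCARF_NO_ANALYTICS"] = "true"
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    return module, elapsed


# Example usage (requires unstructured to be installed):
# pdf_module, seconds = timed_import("unstructured.partition.pdf")
# print(f"import took {seconds:.2f} s")
```

Note that Python caches imported modules, so comparing timings with and without the variable only makes sense in separate, fresh interpreter processes.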

scanny, May 08 '24 19:05

For me it takes more than 4 minutes while it tries to connect to the network. I suggest modifying the logic in the scarf_analytics function and setting the default value of SCARF_NO_ANALYTICS to false.

Lilyzzzzzz, Aug 22 '24 06:08