Import is very time-consuming: from unstructured.partition.pdf import partition_pdf
import time

t1 = time.time()
from unstructured.partition.pdf import partition_pdf
t2 = time.time()
print(t2 - t1)
When I run this, it takes nearly a minute.
My environment: CPU: 13th Gen Intel(R) Core(TM) i7-13700K @ 3.40 GHz; RAM: 32.0 GB; GPU: NVIDIA RTX 4070
Is this import time normal, or is something wrong?
If I disconnect from the network, the import finishes in a normal amount of time. What is being loaded from the Internet automatically during the import?
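One way to narrow down which module spends the time is CPython's built-in `-X importtime` option (Python 3.7+), which prints per-module import times to stderr. The helper below is a minimal sketch; `slowest_imports` is a hypothetical name. It runs the import in a fresh interpreter and returns the entries with the largest cumulative time. A network wait inside module-level code should show up as a large cumulative time on whichever module triggers it.

```python
import subprocess
import sys


def slowest_imports(module: str, top: int = 5) -> list[tuple[int, str]]:
    """Return the `top` (cumulative_microseconds, module_name) pairs
    reported by `python -X importtime -c "import <module>"`."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows: list[tuple[int, str]] = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        # Format: "import time: <self us> | <cumulative us> | <name>"
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        try:
            cumulative = int(fields[1])
        except ValueError:
            # Header row: "self [us] | cumulative | imported package"
            continue
        rows.append((cumulative, fields[2].strip()))
    rows.sort(reverse=True)  # largest cumulative time first
    return rows[:top]
```

For example, `slowest_imports("unstructured.partition.pdf")` run once online and once offline should make the difference visible in the cumulative column.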
Try setting the environment variable:
$ export SCARF_NO_ANALYTICS=true
and see if that makes a difference. On some network configurations, the analytics call appears to take longer than desired.
It's mentioned in the README here: https://github.com/Unstructured-IO/unstructured?tab=readme-ov-file#chart_with_upwards_trend-analytics
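If exporting the variable in the shell is inconvenient (for example, in a notebook), it can also be set from Python before the package is first imported. This is a minimal sketch; `timed_import` is a hypothetical helper, and it assumes the analytics check runs when the package is first imported, so the variable has to be in place before that point.

```python
import importlib
import os
import time


def timed_import(module_name: str, disable_analytics: bool = True) -> float:
    """Import `module_name` and return the wall-clock seconds it took."""
    if disable_analytics:
        # Assumption: the library reads this variable during its first import,
        # so it must be set before that import happens.
        os.environ["SCARF_NO_ANALYTICS"] = "true"
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start
```

In a fresh interpreter, `timed_import("unstructured.partition.pdf")` should show whether the delay goes away. A module that is already imported is served from the `sys.modules` cache, so repeat calls in the same session report close to zero.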
Here, connecting to the network takes more than 4 minutes. I suggest modifying the logic in the scarf_analytics function and setting the default value of SCARF_NO_ANALYTICS to false.
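A minimal sketch of the kind of logic change suggested above, assuming the analytics ping is a plain HTTP request. Every name here (`analytics_enabled`, `scarf_analytics_nonblocking`, the placeholder URL) is hypothetical, and this is not the actual unstructured implementation. It honors the opt-out variable and sends the ping from a daemon thread with a timeout, so a slow network cannot hold up the import.

```python
import os
import threading
import urllib.request

# Placeholder endpoint (reserved .invalid TLD); not the real analytics URL.
ANALYTICS_URL = "https://example.invalid/ping"


def analytics_enabled() -> bool:
    """Analytics run unless SCARF_NO_ANALYTICS is set to "true"."""
    return os.getenv("SCARF_NO_ANALYTICS", "false").lower() != "true"


def _send_ping(url: str, timeout: float) -> None:
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
    except Exception:
        # Analytics must never raise into, or stall, the caller.
        pass


def scarf_analytics_nonblocking(url: str = ANALYTICS_URL, timeout: float = 2.0):
    """Start the ping on a daemon thread and return immediately.

    Returns the started thread, or None when analytics are disabled.
    """
    if not analytics_enabled():
        return None
    thread = threading.Thread(target=_send_ping, args=(url, timeout), daemon=True)
    thread.start()
    return thread
```

The background thread matters more than the timeout alone: `urlopen`'s timeout applies to socket operations and does not bound DNS resolution, which is one way a request can hang for minutes on a misconfigured network.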