[Bug] Data-prep file ingest reports errors when process_table is enabled
Priority
Undecided
OS type
Ubuntu
Hardware type
Xeon-GNR
Installation method
- [x] Pull docker images from hub.docker.com
- [ ] Build docker images from source
- [ ] Other
Deploy method
- [ ] Docker
- [x] Docker Compose
- [ ] Kubernetes Helm Charts
- [ ] Kubernetes GMC
- [ ] Other
Running nodes
Single Node
What's the version?
V1.2
data-prep ingests the file correctly when table processing is not enabled.
When process_table is set, errors are reported.
Description
This request ingests the file successfully:
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
With -F "process_table=true" set, the following request reports errors:
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf" \
-F "chunk_size=1000" -F "chunk_overlap=100" -F "process_table=true" -F "table_strategy=hq"
Reproduce steps
Start ChatQnA with docker compose up -d.
This request ingests the file successfully:
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
With -F "process_table=true" set, the following request reports errors:
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf" \
-F "chunk_size=1000" -F "chunk_overlap=100" -F "process_table=true" -F "table_strategy=hq"
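To compare the two requests side by side, a small helper that prints only the HTTP status code can be handy. This is a minimal sketch, not part of the original report: the function name `ingest_status` and the `CURL_BIN` override (used only so the call can be stubbed) are hypothetical, and `host_ip` is assumed to be exported as in the commands above. The explicit Content-Type header is omitted because curl sets it itself when `-F` is used.

```shell
# Hypothetical helper: POST one file to the dataprep ingest endpoint and
# print only the HTTP status code. Extra arguments are passed through to curl.
# CURL_BIN is an assumption added for stubbing; it defaults to the real curl.
ingest_status() {
  file="$1"
  shift
  "${CURL_BIN:-curl}" -s -o /dev/null -w '%{http_code}\n' \
    -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
    -F "files=@${file}" "$@"
}

# Usage (requires the running service):
#   ingest_status ./nke-10k-2023.pdf
#   ingest_status ./nke-10k-2023.pdf -F "chunk_size=1000" -F "chunk_overlap=100" \
#     -F "process_table=true" -F "table_strategy=hq"
```

In the raw log below, the first kind of request corresponds to the 200 OK entry and the second to the 500 Internal Server Error entry.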
Raw log
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf" \
-F "chunk_size=1000" -F "chunk_overlap=100" -F "process_table=true" -F "table_strategy=hq"
The error logs are:
2025-03-04T09:17:20.184165142Z [2025-03-04 09:17:20,183] [ INFO] - redis_dataprep - [ redis ingest] File nke-10k-2023.pdf does not exist.
2025-03-04T09:17:24.255837822Z /home/user/comps/dataprep/src/integrations/redis.py:194: LangChainDeprecationWarning: The class `HuggingFaceBgeEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-huggingface package and should be used instead. To use it run `pip install -U :class:`~langchain-huggingface` and import as `from :class:`~langchain_huggingface import HuggingFaceEmbeddings``.
2025-03-04T09:17:24.255945662Z embedder = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL)
2025-03-04T09:18:49.242946102Z INFO: 10.112.238.202:39888 - "POST /v1/dataprep/ingest HTTP/1.1" 200 OK
2025-03-04T09:18:57.009354062Z INFO: 10.112.238.202:43050 - "POST /v1/dataprep/get HTTP/1.1" 200 OK
2025-03-04T09:19:11.786637862Z [2025-03-04 09:19:11,785] [ INFO] - redis_dataprep - [ redis delete ] doc id: file:nke-10k-2023.pdf
2025-03-04T09:19:12.486472862Z INFO: 10.112.238.202:49032 - "POST /v1/dataprep/delete HTTP/1.1" 200 OK
2025-03-04T09:19:19.770719062Z INFO: 10.112.238.202:37702 - "POST /v1/dataprep/get HTTP/1.1" 200 OK
2025-03-04T09:19:50.701691142Z [2025-03-04 09:19:50,700] [ INFO] - redis_dataprep - [ redis ingest] File nke-10k-2023.pdf does not exist.
2025-03-04T09:21:33.633908062Z Failed to initialize the model.
2025-03-04T09:21:33.633998942Z Ensure that the model is correct
2025-03-04T09:21:33.932197862Z [2025-03-04 09:21:33,931] [ ERROR] - opea_dataprep_microservice - Error during dataprep ingest invocation: Review the parameters to initialize a UnstructuredTableTransformerModel obj
2025-03-04T09:21:33.940831822Z INFO: 10.112.238.202:52138 - "POST /v1/dataprep/ingest HTTP/1.1" 500 Internal Server Error
2025-03-04T09:21:33.964683022Z ERROR: Exception in ASGI application
2025-03-04T09:21:33.964751182Z Traceback (most recent call last):
2025-03-04T09:21:33.964778982Z File "/home/user/.local/lib/python3.11/site-packages/urllib3/connection.py", line 198, in _new_conn
2025-03-04T09:21:33.964807022Z sock = connection.create_connection(
2025-03-04T09:21:33.964836982Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.964863662Z File "/home/user/.local/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
2025-03-04T09:21:33.964890462Z raise err
2025-03-04T09:21:33.964918262Z File "/home/user/.local/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
2025-03-04T09:21:33.964946702Z sock.connect(sa)
2025-03-04T09:21:33.964975342Z TimeoutError: timed out
2025-03-04T09:21:33.965003622Z
2025-03-04T09:21:33.965033142Z The above exception was the direct cause of the following exception:
2025-03-04T09:21:33.965064582Z
2025-03-04T09:21:33.965091982Z Traceback (most recent call last):
2025-03-04T09:21:33.965120862Z File "/home/user/.local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 773, in urlopen
2025-03-04T09:21:33.965148262Z self._prepare_proxy(conn)
2025-03-04T09:21:33.965174982Z File "/home/user/.local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1042, in _prepare_proxy
2025-03-04T09:21:33.965203702Z conn.connect()
2025-03-04T09:21:33.965234382Z File "/home/user/.local/lib/python3.11/site-packages/urllib3/connection.py", line 704, in connect
2025-03-04T09:21:33.965261102Z self.sock = sock = self._new_conn()
2025-03-04T09:21:33.965315102Z ^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.965341822Z File "/home/user/.local/lib/python3.11/site-packages/urllib3/connection.py", line 207, in _new_conn
2025-03-04T09:21:33.965369262Z raise ConnectTimeoutError(
2025-03-04T09:21:33.965400662Z urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7f09f592dcd0>, 'Connection to proxy.ims.intel.com timed out. (connect timeout=10)')
2025-03-04T09:21:33.965427382Z
2025-03-04T09:21:33.965454102Z The above exception was the direct cause of the following exception:
2025-03-04T09:21:33.965482222Z
2025-03-04T09:21:33.965511502Z urllib3.exceptions.ProxyError: ('Unable to connect to proxy', ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f09f592dcd0>, 'Connection to proxy.ims.intel.com timed out. (connect timeout=10)'))
2025-03-04T09:21:33.965539582Z
2025-03-04T09:21:33.965570262Z The above exception was the direct cause of the following exception:
2025-03-04T09:21:33.965597662Z
2025-03-04T09:21:33.965624342Z Traceback (most recent call last):
2025-03-04T09:21:33.965652022Z File "/home/user/.local/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
2025-03-04T09:21:33.965678742Z resp = conn.urlopen(
2025-03-04T09:21:33.965705462Z ^^^^^^^^^^^^^
2025-03-04T09:21:33.965736502Z File "/home/user/.local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen
2025-03-04T09:21:33.965764542Z retries = retries.increment(
2025-03-04T09:21:33.965791262Z ^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.965818702Z File "/home/user/.local/lib/python3.11/site-packages/urllib3/util/retry.py", line 519, in increment
2025-03-04T09:21:33.965845462Z raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
2025-03-04T09:21:33.965872182Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.965903182Z urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='cas-bridge.xethub.hf.co', port=443): Max retries exceeded with url: /xet-bridge-us/634929bd8146350b3a4cadaf/e78778928a1863786d5bb22a109a7ff1dbac47a29eae6f223a1fc2689172c347?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20250304%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250304T092822Z&X-Amz-Expires=3600&X-Amz-Signature=ec3d88ee3232911a63b42dc9c0e34dce9fadcb63c00c77fd4449848658036c0c&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27model.safetensors%3B+filename%3D%22model.safetensors%22%3B&x-id=GetObject&Expires=1741084102&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0MTA4NDEwMn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82MzQ5MjliZDgxNDYzNTBiM2E0Y2FkYWYvZTc4Nzc4OTI4YTE4NjM3ODZkNWJiMjJhMTA5YTdmZjFkYmFjNDdhMjllYWU2ZjIyM2ExZmMyNjg5MTcyYzM0NyoifV19&Signature=lZTJGPchgCpyfOdZZ5w6KunAbNRkWWuC3dlQZcC75kLWRuy1HFjcLO-f8Dt7jGvlIgXQr3VSQI0QdxEzVIA-IUG9GQ8IbQcZ55f9gEZ1WzOqE8aWQOW0qdiohLAbVawxauHeEszlRJDhR6XBakCR~mpkarJBLB8GZaxYNP7JZMw5K7ZD9CwetE9KU~ABvEHKSSosv2h6AjO2aMzxscgI4fh5SNCiUmsoeUOFMChre-8OynOEE5ZLpjsfGRGGLwaduVGQgZ8T8JivLiOJl8-G0~KxL5lp849UOF9jQjzkp4SdCdohTSq-LtFFjBTmblPRXCSGAYpg~Wu977X6m8ipyA__&Key-Pair-Id=K2L8F4GPSG1IFC (Caused by ProxyError('Unable to connect to proxy', ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f09f592dcd0>, 'Connection to proxy.ims.intel.com timed out. (connect timeout=10)')))
2025-03-04T09:21:33.965958062Z
2025-03-04T09:21:33.965985462Z During handling of the above exception, another exception occurred:
2025-03-04T09:21:33.966012182Z
2025-03-04T09:21:33.966038902Z Traceback (most recent call last):
2025-03-04T09:21:33.966069942Z File "/home/user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 70, in initialize
2025-03-04T09:21:33.966098062Z self.model = TableTransformerForObjectDetection.from_pretrained(model)
2025-03-04T09:21:33.966124782Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966152902Z File "/home/user/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3776, in from_pretrained
2025-03-04T09:21:33.966179622Z resolved_archive_file = cached_file(pretrained_model_name_or_path, filename, **cached_file_kwargs)
2025-03-04T09:21:33.966207582Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966303262Z File "/home/user/.local/lib/python3.11/site-packages/transformers/utils/hub.py", line 403, in cached_file
2025-03-04T09:21:33.966332862Z resolved_file = hf_hub_download(
2025-03-04T09:21:33.966359582Z ^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966387102Z File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
2025-03-04T09:21:33.966414902Z return fn(*args, **kwargs)
2025-03-04T09:21:33.966441622Z ^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966468742Z File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 860, in hf_hub_download
2025-03-04T09:21:33.966495462Z return _hf_hub_download_to_cache_dir(
2025-03-04T09:21:33.966525182Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966553302Z File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1009, in _hf_hub_download_to_cache_dir
2025-03-04T09:21:33.966582502Z _download_to_tmp_and_move(
2025-03-04T09:21:33.966610582Z File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1543, in _download_to_tmp_and_move
2025-03-04T09:21:33.966638142Z http_get(
2025-03-04T09:21:33.966664862Z File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 369, in http_get
2025-03-04T09:21:33.966692142Z r = _request_wrapper(
2025-03-04T09:21:33.966718862Z ^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966745582Z File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 301, in _request_wrapper
2025-03-04T09:21:33.966773742Z response = get_session().request(method=method, url=url, **params)
2025-03-04T09:21:33.966804502Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966831222Z File "/home/user/.local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
2025-03-04T09:21:33.966858502Z resp = self.send(prep, **send_kwargs)
2025-03-04T09:21:33.966885222Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966911942Z File "/home/user/.local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
2025-03-04T09:21:33.966940022Z r = adapter.send(request, **kwargs)
2025-03-04T09:21:33.966970662Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966997582Z File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 93, in send
2025-03-04T09:21:33.967024302Z return super().send(request, *args, **kwargs)
2025-03-04T09:21:33.967052342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.967081742Z File "/home/user/.local/lib/python3.11/site-packages/requests/adapters.py", line 694, in send
2025-03-04T09:21:33.967108462Z raise ProxyError(e, request=request)
2025-03-04T09:21:33.967139342Z requests.exceptions.ProxyError: (MaxRetryError("HTTPSConnectionPool(host='cas-bridge.xethub.hf.co', port=443): Max retries exceeded with url: /xet-bridge-us/634929bd8146350b3a4cadaf/e78778928a1863786d5bb22a109a7ff1dbac47a29eae6f223a1fc2689172c347?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20250304%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250304T092822Z&X-Amz-Expires=3600&X-Amz-Signature=ec3d88ee3232911a63b42dc9c0e34dce9fadcb63c00c77fd4449848658036c0c&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27model.safetensors%3B+filename%3D%22model.safetensors%22%3B&x-id=GetObject&Expires=1741084102&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0MTA4NDEwMn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82MzQ5MjliZDgxNDYzNTBiM2E0Y2FkYWYvZTc4Nzc4OTI4YTE4NjM3ODZkNWJiMjJhMTA5YTdmZjFkYmFjNDdhMjllYWU2ZjIyM2ExZmMyNjg5MTcyYzM0NyoifV19&Signature=lZTJGPchgCpyfOdZZ5w6KunAbNRkWWuC3dlQZcC75kLWRuy1HFjcLO-f8Dt7jGvlIgXQr3VSQI0QdxEzVIA-IUG9GQ8IbQcZ55f9gEZ1WzOqE8aWQOW0qdiohLAbVawxauHeEszlRJDhR6XBakCR~mpkarJBLB8GZaxYNP7JZMw5K7ZD9CwetE9KU~ABvEHKSSosv2h6AjO2aMzxscgI4fh5SNCiUmsoeUOFMChre-8OynOEE5ZLpjsfGRGGLwaduVGQgZ8T8JivLiOJl8-G0~KxL5lp849UOF9jQjzkp4SdCdohTSq-LtFFjBTmblPRXCSGAYpg~Wu977X6m8ipyA__&Key-Pair-Id=K2L8F4GPSG1IFC (Caused by ProxyError('Unable to connect to proxy', ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f09f592dcd0>, 'Connection to proxy.ims.intel.com timed out. (connect timeout=10)')))"), '(Request ID: 33d6998c-af8a-410d-81d9-a9681b9f5d03)')
2025-03-04T09:21:33.967180022Z
2025-03-04T09:21:33.967206742Z During handling of the above exception, another exception occurred:
2025-03-04T09:21:33.967236302Z
2025-03-04T09:21:33.967263022Z Traceback (most recent call last):
2025-03-04T09:21:33.967291782Z File "/home/user/.local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
2025-03-04T09:21:33.967318462Z result = await app( # type: ignore[func-returns-value]
2025-03-04T09:21:33.967347862Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.967374582Z File "/home/user/.local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
2025-03-04T09:21:33.967402142Z return await self.app(scope, receive, send)
2025-03-04T09:21:33.967428862Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.967455582Z File "/home/user/.local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
2025-03-04T09:21:33.967482302Z await super().__call__(scope, receive, send)
2025-03-04T09:21:33.967512582Z File "/home/user/.local/lib/python3.11/site-packages/starlette/applications.py", line 112, in __call__
2025-03-04T09:21:33.967539342Z await self.middleware_stack(scope, receive, send)
2025-03-04T09:21:33.967566022Z File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
2025-03-04T09:21:33.967592742Z raise exc
2025-03-04T09:21:33.967621902Z File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
2025-03-04T09:21:33.967648662Z await self.app(scope, receive, _send)
2025-03-04T09:21:33.967678942Z File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 174, in __call__
2025-03-04T09:21:33.967705662Z raise exc
2025-03-04T09:21:33.967732382Z File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 172, in __call__
2025-03-04T09:21:33.967759062Z await self.app(scope, receive, send_wrapper)
2025-03-04T09:21:33.967785822Z File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
2025-03-04T09:21:33.967812542Z await self.app(scope, receive, send)
2025-03-04T09:21:33.967843102Z File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
2025-03-04T09:21:33.967869822Z await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2025-03-04T09:21:33.967896582Z File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2025-03-04T09:21:33.967923262Z raise exc
2025-03-04T09:21:33.967952582Z File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
2025-03-04T09:21:33.967979302Z await app(scope, receive, sender)
2025-03-04T09:21:33.968007942Z File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
2025-03-04T09:21:33.968034662Z await self.middleware_stack(scope, receive, send)
2025-03-04T09:21:33.968063342Z File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
2025-03-04T09:21:33.968090062Z await route.handle(scope, receive, send)
2025-03-04T09:21:33.968117662Z File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
2025-03-04T09:21:33.968144382Z await self.app(scope, receive, send)
2025-03-04T09:21:33.968171102Z File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
2025-03-04T09:21:33.968197822Z await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2025-03-04T09:21:33.968228022Z File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2025-03-04T09:21:33.968254742Z raise exc
2025-03-04T09:21:33.968281462Z File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
2025-03-04T09:21:33.968308182Z await app(scope, receive, sender)
2025-03-04T09:21:33.968337622Z File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
2025-03-04T09:21:33.968364342Z response = await f(request)
2025-03-04T09:21:33.968394582Z ^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968421262Z File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
2025-03-04T09:21:33.968447982Z raw_response = await run_endpoint_function(
2025-03-04T09:21:33.968474742Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968501462Z File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
2025-03-04T09:21:33.968528142Z return await dependant.call(**values)
2025-03-04T09:21:33.968558702Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968585422Z File "/home/user/comps/dataprep/src/opea_dataprep_microservice.py", line 67, in ingest_files
2025-03-04T09:21:33.968612302Z response = await loader.ingest_files(files, link_list, chunk_size, chunk_overlap, process_table, table_strategy)
2025-03-04T09:21:33.968639022Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968668462Z File "/home/user/comps/dataprep/src/opea_dataprep_loader.py", line 23, in ingest_files
2025-03-04T09:21:33.968695182Z return await self.component.ingest_files(*args, **kwargs)
2025-03-04T09:21:33.968723822Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968750542Z File "/home/user/comps/dataprep/src/integrations/redis.py", line 379, in ingest_files
2025-03-04T09:21:33.968780022Z ingest_data_to_redis(
2025-03-04T09:21:33.968806742Z File "/home/user/comps/dataprep/src/integrations/redis.py", line 269, in ingest_data_to_redis
2025-03-04T09:21:33.968834022Z table_chunks = get_tables_result(path, doc_path.table_strategy)
2025-03-04T09:21:33.968860742Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968887462Z File "/home/user/comps/dataprep/src/utils.py", line 654, in get_tables_result
2025-03-04T09:21:33.968914182Z raw_pdf_elements = partition_pdf(
2025-03-04T09:21:33.968944422Z ^^^^^^^^^^^^^^
2025-03-04T09:21:33.968971142Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/documents/elements.py", line 581, in wrapper
2025-03-04T09:21:33.968997862Z elements = func(*args, **kwargs)
2025-03-04T09:21:33.969024582Z ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969053942Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/file_utils/filetype.py", line 788, in wrapper
2025-03-04T09:21:33.969080662Z elements = func(*args, **kwargs)
2025-03-04T09:21:33.969110982Z ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969137702Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/file_utils/filetype.py", line 746, in wrapper
2025-03-04T09:21:33.969164422Z elements = func(*args, **kwargs)
2025-03-04T09:21:33.969191142Z ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969217862Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/chunking/dispatch.py", line 74, in wrapper
2025-03-04T09:21:33.969244582Z elements = func(*args, **kwargs)
2025-03-04T09:21:33.969275142Z ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969301862Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 209, in partition_pdf
2025-03-04T09:21:33.969328742Z return partition_pdf_or_image(
2025-03-04T09:21:33.969355462Z ^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969384742Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 305, in partition_pdf_or_image
2025-03-04T09:21:33.969411462Z elements = _partition_pdf_or_image_local(
2025-03-04T09:21:33.969440142Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969467342Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/utils.py", line 216, in wrapper
2025-03-04T09:21:33.969496862Z return func(*args, **kwargs)
2025-03-04T09:21:33.969524342Z ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969553102Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 626, in _partition_pdf_or_image_local
2025-03-04T09:21:33.969580902Z final_document_layout = process_file_with_ocr(
2025-03-04T09:21:33.969610062Z ^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969637902Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/utils.py", line 216, in wrapper
2025-03-04T09:21:33.969670542Z return func(*args, **kwargs)
2025-03-04T09:21:33.969697262Z ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969724582Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf_image/ocr.py", line 178, in process_file_with_ocr
2025-03-04T09:21:33.969751302Z raise e
2025-03-04T09:21:33.969778022Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf_image/ocr.py", line 165, in process_file_with_ocr
2025-03-04T09:21:33.969804702Z merged_page_layout = supplement_page_layout_with_ocr(
2025-03-04T09:21:33.969834982Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969861702Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/utils.py", line 216, in wrapper
2025-03-04T09:21:33.969888422Z return func(*args, **kwargs)
2025-03-04T09:21:33.969915142Z ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969944582Z File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf_image/ocr.py", line 237, in supplement_page_layout_with_ocr
2025-03-04T09:21:33.969971302Z tables.load_agent()
2025-03-04T09:21:33.970001222Z File "/home/user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 142, in load_agent
2025-03-04T09:21:33.970027942Z tables_agent.initialize("microsoft/table-transformer-structure-recognition")
2025-03-04T09:21:33.970054662Z File "/home/user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 77, in initialize
2025-03-04T09:21:33.970081382Z raise ImportError(
2025-03-04T09:21:33.970108102Z ImportError: Review the parameters to initialize a UnstructuredTableTransformerModel obj
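The traceback is long because the exceptions are chained, and the final ImportError hides the first failure in the chain, which is a connection timeout to the proxy. A minimal sketch for pulling that detail out of a log stream follows; the function name `timed_out_hosts` is hypothetical, and the container name in the usage line is a placeholder.

```shell
# Hypothetical helper: read a log on stdin and print each distinct
# "Connection to <host> timed out" message once.
timed_out_hosts() {
  grep -o 'Connection to [^ ]* timed out' | sort -u
}

# Usage (replace the placeholder with the dataprep container name):
#   docker logs <dataprep-container> 2>&1 | timed_out_hosts
```

For the log above this reports the proxy host proxy.ims.intel.com, not huggingface.co itself, which suggests the proxy configured inside the container is the part that cannot be reached.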
Attachments
No response
@XinyuYe-Intel, please help to check this issue.
This issue is caused by a network connection problem, which makes the TableTransformer model download from Hugging Face fail.
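One way to check this from inside the container, where the download actually runs, is sketched below. It is only a sketch under stated assumptions: the container name `dataprep-redis-server`, the `DOCKER_BIN` override (for stubbing), and the function names are not from this issue, and `python3` is assumed to be on the container's PATH. The model ID and the presence of `huggingface_hub` in the image are taken from the traceback.

```shell
# Assumption: container name; override DATAPREP_CONTAINER if yours differs.
DATAPREP_CONTAINER="${DATAPREP_CONTAINER:-dataprep-redis-server}"
# Model ID as it appears in the traceback.
TABLE_MODEL="microsoft/table-transformer-structure-recognition"

# Run a command inside the dataprep container.
# DOCKER_BIN is a hypothetical override; it defaults to the real docker.
in_container() {
  "${DOCKER_BIN:-docker}" exec "$DATAPREP_CONTAINER" "$@"
}

# Print the proxy variables the container actually sees.
show_proxy_env() {
  in_container env | grep -i -E '^(http|https|no)_proxy='
}

# Repeat the table model download outside the request path, into the
# container's Hugging Face cache.
prefetch_table_model() {
  in_container python3 -c \
    "from huggingface_hub import snapshot_download; snapshot_download('${TABLE_MODEL}')"
}

# Usage (requires the running container):
#   show_proxy_env
#   prefetch_table_model
```

If `prefetch_table_model` fails with the same ProxyError, the proxy settings passed to the container are the likely cause. If it succeeds, the model should be in the cache for a later ingest request with process_table=true.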
@XinyuYe-Intel
- The host/backend downloads the LLM model from Hugging Face.
- The UI browser connects to the UI server and uploads files to data-prep.
The host and container network connections are OK.
Please help to solve the issue.
Hi @xiguiw @XinyuYe-Intel, just a reminder: please check this if possible.
Hi @XinyuYe-Intel,
[Reminder] There is no solution yet. Please help with this.
This feature is seldom used.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Hi @XinyuYe-Intel, please help handle this issue if time allows. With -F "process_table=true" set, file ingest reports errors.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.