GenAIExamples

[Bug] Data-prep file ingestion reports errors when table processing is enabled

Open xiguiw opened this issue 10 months ago • 6 comments

Priority

Undecided

OS type

Ubuntu

Hardware type

Xeon-GNR

Installation method

  • [x] Pull docker images from hub.docker.com
  • [ ] Build docker images from source
  • [ ] Other

Deploy method

  • [ ] Docker
  • [x] Docker Compose
  • [ ] Kubernetes Helm Charts
  • [ ] Kubernetes GMC
  • [ ] Other

Running nodes

Single Node

What's the version?

V1.2

Data-prep ingests the file correctly when table processing is disabled.

When table processing is enabled (process_table=true), errors are reported.

Description

Without table processing, the file is ingested successfully:

curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
     -H "Content-Type: multipart/form-data" \
     -F "files=@./nke-10k-2023.pdf"

With -F "process_table=true" set, ingesting the same file reports errors:

curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
     -H "Content-Type: multipart/form-data" \
     -F "files=@./nke-10k-2023.pdf"  \
     -F "chunk_size=1000" -F "chunk_overlap=100" -F "process_table=true" -F "table_strategy=hq"
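
The two requests above differ only in the extra form fields. As a rough sketch (not part of the data-prep service), the difference can be made explicit with a small Python helper that builds the curl argument list. The endpoint, port, and field names come from the commands above; the helper names `ingest_form_fields` and `ingest_curl_args` are made up for illustration.

```python
# Sketch only: helper names are hypothetical; the endpoint and the form
# fields mirror the curl commands quoted in this issue.

def ingest_form_fields(process_table: bool = False) -> dict[str, str]:
    """Return the non-file form fields sent alongside the uploaded PDF."""
    if not process_table:
        return {}  # plain ingest: only the file part is sent
    return {
        "chunk_size": "1000",
        "chunk_overlap": "100",
        "process_table": "true",
        "table_strategy": "hq",
    }


def ingest_curl_args(host_ip: str, pdf_path: str, process_table: bool = False) -> list[str]:
    """Build the curl argument list for one ingest request."""
    args = [
        "curl", "-X", "POST", f"http://{host_ip}:6007/v1/dataprep/ingest",
        "-H", "Content-Type: multipart/form-data",
        "-F", f"files=@{pdf_path}",
    ]
    for name, value in ingest_form_fields(process_table).items():
        args += ["-F", f"{name}={value}"]
    return args
```

Passing the returned list to `subprocess.run(..., check=True)` would issue the same request as the corresponding curl command, so the passing and failing cases can be run back to back from one script.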

Reproduce steps

Start ChatQnA with docker compose up -d.

Without table processing, the file is ingested successfully:

curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
     -H "Content-Type: multipart/form-data" \
     -F "files=@./nke-10k-2023.pdf"

With -F "process_table=true" set, ingesting the same file reports errors:

curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
     -H "Content-Type: multipart/form-data" \
     -F "files=@./nke-10k-2023.pdf"  \
     -F "chunk_size=1000" -F "chunk_overlap=100" -F "process_table=true" -F "table_strategy=hq"

Raw log

curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
     -H "Content-Type: multipart/form-data" \
     -F "files=@./nke-10k-2023.pdf"  \
     -F "chunk_size=1000" -F "chunk_overlap=100" -F "process_table=true" -F "table_strategy=hq"


The error logs are:

2025-03-04T09:17:20.184165142Z [2025-03-04 09:17:20,183] [    INFO] - redis_dataprep - [ redis ingest] File nke-10k-2023.pdf does not exist.
2025-03-04T09:17:24.255837822Z /home/user/comps/dataprep/src/integrations/redis.py:194: LangChainDeprecationWarning: The class `HuggingFaceBgeEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-huggingface package and should be used instead. To use it run `pip install -U :class:`~langchain-huggingface` and import as `from :class:`~langchain_huggingface import HuggingFaceEmbeddings``.
2025-03-04T09:17:24.255945662Z   embedder = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL)
2025-03-04T09:18:49.242946102Z INFO:     10.112.238.202:39888 - "POST /v1/dataprep/ingest HTTP/1.1" 200 OK
2025-03-04T09:18:57.009354062Z INFO:     10.112.238.202:43050 - "POST /v1/dataprep/get HTTP/1.1" 200 OK
2025-03-04T09:19:11.786637862Z [2025-03-04 09:19:11,785] [    INFO] - redis_dataprep - [ redis delete ] doc id: file:nke-10k-2023.pdf
2025-03-04T09:19:12.486472862Z INFO:     10.112.238.202:49032 - "POST /v1/dataprep/delete HTTP/1.1" 200 OK
2025-03-04T09:19:19.770719062Z INFO:     10.112.238.202:37702 - "POST /v1/dataprep/get HTTP/1.1" 200 OK
2025-03-04T09:19:50.701691142Z [2025-03-04 09:19:50,700] [    INFO] - redis_dataprep - [ redis ingest] File nke-10k-2023.pdf does not exist.
2025-03-04T09:21:33.633908062Z Failed to initialize the model.
2025-03-04T09:21:33.633998942Z Ensure that the model is correct
2025-03-04T09:21:33.932197862Z [2025-03-04 09:21:33,931] [   ERROR] - opea_dataprep_microservice - Error during dataprep ingest invocation: Review the parameters to initialize a UnstructuredTableTransformerModel obj
2025-03-04T09:21:33.940831822Z INFO:     10.112.238.202:52138 - "POST /v1/dataprep/ingest HTTP/1.1" 500 Internal Server Error
2025-03-04T09:21:33.964683022Z ERROR:    Exception in ASGI application
2025-03-04T09:21:33.964751182Z Traceback (most recent call last):
2025-03-04T09:21:33.964778982Z   File "/home/user/.local/lib/python3.11/site-packages/urllib3/connection.py", line 198, in _new_conn
2025-03-04T09:21:33.964807022Z     sock = connection.create_connection(
2025-03-04T09:21:33.964836982Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.964863662Z   File "/home/user/.local/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
2025-03-04T09:21:33.964890462Z     raise err
2025-03-04T09:21:33.964918262Z   File "/home/user/.local/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
2025-03-04T09:21:33.964946702Z     sock.connect(sa)
2025-03-04T09:21:33.964975342Z TimeoutError: timed out
2025-03-04T09:21:33.965003622Z
2025-03-04T09:21:33.965033142Z The above exception was the direct cause of the following exception:
2025-03-04T09:21:33.965064582Z
2025-03-04T09:21:33.965091982Z Traceback (most recent call last):
2025-03-04T09:21:33.965120862Z   File "/home/user/.local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 773, in urlopen
2025-03-04T09:21:33.965148262Z     self._prepare_proxy(conn)
2025-03-04T09:21:33.965174982Z   File "/home/user/.local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1042, in _prepare_proxy
2025-03-04T09:21:33.965203702Z     conn.connect()
2025-03-04T09:21:33.965234382Z   File "/home/user/.local/lib/python3.11/site-packages/urllib3/connection.py", line 704, in connect
2025-03-04T09:21:33.965261102Z     self.sock = sock = self._new_conn()
2025-03-04T09:21:33.965315102Z                        ^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.965341822Z   File "/home/user/.local/lib/python3.11/site-packages/urllib3/connection.py", line 207, in _new_conn
2025-03-04T09:21:33.965369262Z     raise ConnectTimeoutError(
2025-03-04T09:21:33.965400662Z urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7f09f592dcd0>, 'Connection to proxy.ims.intel.com timed out. (connect timeout=10)')
2025-03-04T09:21:33.965427382Z
2025-03-04T09:21:33.965454102Z The above exception was the direct cause of the following exception:
2025-03-04T09:21:33.965482222Z
2025-03-04T09:21:33.965511502Z urllib3.exceptions.ProxyError: ('Unable to connect to proxy', ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f09f592dcd0>, 'Connection to proxy.ims.intel.com timed out. (connect timeout=10)'))
2025-03-04T09:21:33.965539582Z
2025-03-04T09:21:33.965570262Z The above exception was the direct cause of the following exception:
2025-03-04T09:21:33.965597662Z
2025-03-04T09:21:33.965624342Z Traceback (most recent call last):
2025-03-04T09:21:33.965652022Z   File "/home/user/.local/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
2025-03-04T09:21:33.965678742Z     resp = conn.urlopen(
2025-03-04T09:21:33.965705462Z            ^^^^^^^^^^^^^
2025-03-04T09:21:33.965736502Z   File "/home/user/.local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen
2025-03-04T09:21:33.965764542Z     retries = retries.increment(
2025-03-04T09:21:33.965791262Z               ^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.965818702Z   File "/home/user/.local/lib/python3.11/site-packages/urllib3/util/retry.py", line 519, in increment
2025-03-04T09:21:33.965845462Z     raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
2025-03-04T09:21:33.965872182Z     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.965903182Z urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='cas-bridge.xethub.hf.co', port=443): Max retries exceeded with url: /xet-bridge-us/634929bd8146350b3a4cadaf/e78778928a1863786d5bb22a109a7ff1dbac47a29eae6f223a1fc2689172c347?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20250304%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250304T092822Z&X-Amz-Expires=3600&X-Amz-Signature=ec3d88ee3232911a63b42dc9c0e34dce9fadcb63c00c77fd4449848658036c0c&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27model.safetensors%3B+filename%3D%22model.safetensors%22%3B&x-id=GetObject&Expires=1741084102&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0MTA4NDEwMn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82MzQ5MjliZDgxNDYzNTBiM2E0Y2FkYWYvZTc4Nzc4OTI4YTE4NjM3ODZkNWJiMjJhMTA5YTdmZjFkYmFjNDdhMjllYWU2ZjIyM2ExZmMyNjg5MTcyYzM0NyoifV19&Signature=lZTJGPchgCpyfOdZZ5w6KunAbNRkWWuC3dlQZcC75kLWRuy1HFjcLO-f8Dt7jGvlIgXQr3VSQI0QdxEzVIA-IUG9GQ8IbQcZ55f9gEZ1WzOqE8aWQOW0qdiohLAbVawxauHeEszlRJDhR6XBakCR~mpkarJBLB8GZaxYNP7JZMw5K7ZD9CwetE9KU~ABvEHKSSosv2h6AjO2aMzxscgI4fh5SNCiUmsoeUOFMChre-8OynOEE5ZLpjsfGRGGLwaduVGQgZ8T8JivLiOJl8-G0~KxL5lp849UOF9jQjzkp4SdCdohTSq-LtFFjBTmblPRXCSGAYpg~Wu977X6m8ipyA__&Key-Pair-Id=K2L8F4GPSG1IFC (Caused by ProxyError('Unable to connect to proxy', ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f09f592dcd0>, 'Connection to proxy.ims.intel.com timed out. (connect timeout=10)')))
2025-03-04T09:21:33.965958062Z
2025-03-04T09:21:33.965985462Z During handling of the above exception, another exception occurred:
2025-03-04T09:21:33.966012182Z
2025-03-04T09:21:33.966038902Z Traceback (most recent call last):
2025-03-04T09:21:33.966069942Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 70, in initialize
2025-03-04T09:21:33.966098062Z     self.model = TableTransformerForObjectDetection.from_pretrained(model)
2025-03-04T09:21:33.966124782Z                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966152902Z   File "/home/user/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3776, in from_pretrained
2025-03-04T09:21:33.966179622Z     resolved_archive_file = cached_file(pretrained_model_name_or_path, filename, **cached_file_kwargs)
2025-03-04T09:21:33.966207582Z                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966303262Z   File "/home/user/.local/lib/python3.11/site-packages/transformers/utils/hub.py", line 403, in cached_file
2025-03-04T09:21:33.966332862Z     resolved_file = hf_hub_download(
2025-03-04T09:21:33.966359582Z                     ^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966387102Z   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
2025-03-04T09:21:33.966414902Z     return fn(*args, **kwargs)
2025-03-04T09:21:33.966441622Z            ^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966468742Z   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 860, in hf_hub_download
2025-03-04T09:21:33.966495462Z     return _hf_hub_download_to_cache_dir(
2025-03-04T09:21:33.966525182Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966553302Z   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1009, in _hf_hub_download_to_cache_dir
2025-03-04T09:21:33.966582502Z     _download_to_tmp_and_move(
2025-03-04T09:21:33.966610582Z   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1543, in _download_to_tmp_and_move
2025-03-04T09:21:33.966638142Z     http_get(
2025-03-04T09:21:33.966664862Z   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 369, in http_get
2025-03-04T09:21:33.966692142Z     r = _request_wrapper(
2025-03-04T09:21:33.966718862Z         ^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966745582Z   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 301, in _request_wrapper
2025-03-04T09:21:33.966773742Z     response = get_session().request(method=method, url=url, **params)
2025-03-04T09:21:33.966804502Z                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966831222Z   File "/home/user/.local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
2025-03-04T09:21:33.966858502Z     resp = self.send(prep, **send_kwargs)
2025-03-04T09:21:33.966885222Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966911942Z   File "/home/user/.local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
2025-03-04T09:21:33.966940022Z     r = adapter.send(request, **kwargs)
2025-03-04T09:21:33.966970662Z         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.966997582Z   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 93, in send
2025-03-04T09:21:33.967024302Z     return super().send(request, *args, **kwargs)
2025-03-04T09:21:33.967052342Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.967081742Z   File "/home/user/.local/lib/python3.11/site-packages/requests/adapters.py", line 694, in send
2025-03-04T09:21:33.967108462Z     raise ProxyError(e, request=request)
2025-03-04T09:21:33.967139342Z requests.exceptions.ProxyError: (MaxRetryError("HTTPSConnectionPool(host='cas-bridge.xethub.hf.co', port=443): Max retries exceeded with url: /xet-bridge-us/634929bd8146350b3a4cadaf/e78778928a1863786d5bb22a109a7ff1dbac47a29eae6f223a1fc2689172c347?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20250304%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250304T092822Z&X-Amz-Expires=3600&X-Amz-Signature=ec3d88ee3232911a63b42dc9c0e34dce9fadcb63c00c77fd4449848658036c0c&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27model.safetensors%3B+filename%3D%22model.safetensors%22%3B&x-id=GetObject&Expires=1741084102&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0MTA4NDEwMn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82MzQ5MjliZDgxNDYzNTBiM2E0Y2FkYWYvZTc4Nzc4OTI4YTE4NjM3ODZkNWJiMjJhMTA5YTdmZjFkYmFjNDdhMjllYWU2ZjIyM2ExZmMyNjg5MTcyYzM0NyoifV19&Signature=lZTJGPchgCpyfOdZZ5w6KunAbNRkWWuC3dlQZcC75kLWRuy1HFjcLO-f8Dt7jGvlIgXQr3VSQI0QdxEzVIA-IUG9GQ8IbQcZ55f9gEZ1WzOqE8aWQOW0qdiohLAbVawxauHeEszlRJDhR6XBakCR~mpkarJBLB8GZaxYNP7JZMw5K7ZD9CwetE9KU~ABvEHKSSosv2h6AjO2aMzxscgI4fh5SNCiUmsoeUOFMChre-8OynOEE5ZLpjsfGRGGLwaduVGQgZ8T8JivLiOJl8-G0~KxL5lp849UOF9jQjzkp4SdCdohTSq-LtFFjBTmblPRXCSGAYpg~Wu977X6m8ipyA__&Key-Pair-Id=K2L8F4GPSG1IFC (Caused by ProxyError('Unable to connect to proxy', ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f09f592dcd0>, 'Connection to proxy.ims.intel.com timed out. (connect timeout=10)')))"), '(Request ID: 33d6998c-af8a-410d-81d9-a9681b9f5d03)')
2025-03-04T09:21:33.967180022Z
2025-03-04T09:21:33.967206742Z During handling of the above exception, another exception occurred:
2025-03-04T09:21:33.967236302Z
2025-03-04T09:21:33.967263022Z Traceback (most recent call last):
2025-03-04T09:21:33.967291782Z   File "/home/user/.local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
2025-03-04T09:21:33.967318462Z     result = await app(  # type: ignore[func-returns-value]
2025-03-04T09:21:33.967347862Z              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.967374582Z   File "/home/user/.local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
2025-03-04T09:21:33.967402142Z     return await self.app(scope, receive, send)
2025-03-04T09:21:33.967428862Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.967455582Z   File "/home/user/.local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
2025-03-04T09:21:33.967482302Z     await super().__call__(scope, receive, send)
2025-03-04T09:21:33.967512582Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/applications.py", line 112, in __call__
2025-03-04T09:21:33.967539342Z     await self.middleware_stack(scope, receive, send)
2025-03-04T09:21:33.967566022Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
2025-03-04T09:21:33.967592742Z     raise exc
2025-03-04T09:21:33.967621902Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
2025-03-04T09:21:33.967648662Z     await self.app(scope, receive, _send)
2025-03-04T09:21:33.967678942Z   File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 174, in __call__
2025-03-04T09:21:33.967705662Z     raise exc
2025-03-04T09:21:33.967732382Z   File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 172, in __call__
2025-03-04T09:21:33.967759062Z     await self.app(scope, receive, send_wrapper)
2025-03-04T09:21:33.967785822Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
2025-03-04T09:21:33.967812542Z     await self.app(scope, receive, send)
2025-03-04T09:21:33.967843102Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
2025-03-04T09:21:33.967869822Z     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2025-03-04T09:21:33.967896582Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2025-03-04T09:21:33.967923262Z     raise exc
2025-03-04T09:21:33.967952582Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
2025-03-04T09:21:33.967979302Z     await app(scope, receive, sender)
2025-03-04T09:21:33.968007942Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
2025-03-04T09:21:33.968034662Z     await self.middleware_stack(scope, receive, send)
2025-03-04T09:21:33.968063342Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
2025-03-04T09:21:33.968090062Z     await route.handle(scope, receive, send)
2025-03-04T09:21:33.968117662Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
2025-03-04T09:21:33.968144382Z     await self.app(scope, receive, send)
2025-03-04T09:21:33.968171102Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
2025-03-04T09:21:33.968197822Z     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2025-03-04T09:21:33.968228022Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2025-03-04T09:21:33.968254742Z     raise exc
2025-03-04T09:21:33.968281462Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
2025-03-04T09:21:33.968308182Z     await app(scope, receive, sender)
2025-03-04T09:21:33.968337622Z   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
2025-03-04T09:21:33.968364342Z     response = await f(request)
2025-03-04T09:21:33.968394582Z                ^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968421262Z   File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
2025-03-04T09:21:33.968447982Z     raw_response = await run_endpoint_function(
2025-03-04T09:21:33.968474742Z                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968501462Z   File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
2025-03-04T09:21:33.968528142Z     return await dependant.call(**values)
2025-03-04T09:21:33.968558702Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968585422Z   File "/home/user/comps/dataprep/src/opea_dataprep_microservice.py", line 67, in ingest_files
2025-03-04T09:21:33.968612302Z     response = await loader.ingest_files(files, link_list, chunk_size, chunk_overlap, process_table, table_strategy)
2025-03-04T09:21:33.968639022Z                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968668462Z   File "/home/user/comps/dataprep/src/opea_dataprep_loader.py", line 23, in ingest_files
2025-03-04T09:21:33.968695182Z     return await self.component.ingest_files(*args, **kwargs)
2025-03-04T09:21:33.968723822Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968750542Z   File "/home/user/comps/dataprep/src/integrations/redis.py", line 379, in ingest_files
2025-03-04T09:21:33.968780022Z     ingest_data_to_redis(
2025-03-04T09:21:33.968806742Z   File "/home/user/comps/dataprep/src/integrations/redis.py", line 269, in ingest_data_to_redis
2025-03-04T09:21:33.968834022Z     table_chunks = get_tables_result(path, doc_path.table_strategy)
2025-03-04T09:21:33.968860742Z                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.968887462Z   File "/home/user/comps/dataprep/src/utils.py", line 654, in get_tables_result
2025-03-04T09:21:33.968914182Z     raw_pdf_elements = partition_pdf(
2025-03-04T09:21:33.968944422Z                        ^^^^^^^^^^^^^^
2025-03-04T09:21:33.968971142Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/documents/elements.py", line 581, in wrapper
2025-03-04T09:21:33.968997862Z     elements = func(*args, **kwargs)
2025-03-04T09:21:33.969024582Z                ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969053942Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/file_utils/filetype.py", line 788, in wrapper
2025-03-04T09:21:33.969080662Z     elements = func(*args, **kwargs)
2025-03-04T09:21:33.969110982Z                ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969137702Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/file_utils/filetype.py", line 746, in wrapper
2025-03-04T09:21:33.969164422Z     elements = func(*args, **kwargs)
2025-03-04T09:21:33.969191142Z                ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969217862Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/chunking/dispatch.py", line 74, in wrapper
2025-03-04T09:21:33.969244582Z     elements = func(*args, **kwargs)
2025-03-04T09:21:33.969275142Z                ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969301862Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 209, in partition_pdf
2025-03-04T09:21:33.969328742Z     return partition_pdf_or_image(
2025-03-04T09:21:33.969355462Z            ^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969384742Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 305, in partition_pdf_or_image
2025-03-04T09:21:33.969411462Z     elements = _partition_pdf_or_image_local(
2025-03-04T09:21:33.969440142Z                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969467342Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/utils.py", line 216, in wrapper
2025-03-04T09:21:33.969496862Z     return func(*args, **kwargs)
2025-03-04T09:21:33.969524342Z            ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969553102Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 626, in _partition_pdf_or_image_local
2025-03-04T09:21:33.969580902Z     final_document_layout = process_file_with_ocr(
2025-03-04T09:21:33.969610062Z                             ^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969637902Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/utils.py", line 216, in wrapper
2025-03-04T09:21:33.969670542Z     return func(*args, **kwargs)
2025-03-04T09:21:33.969697262Z            ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969724582Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf_image/ocr.py", line 178, in process_file_with_ocr
2025-03-04T09:21:33.969751302Z     raise e
2025-03-04T09:21:33.969778022Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf_image/ocr.py", line 165, in process_file_with_ocr
2025-03-04T09:21:33.969804702Z     merged_page_layout = supplement_page_layout_with_ocr(
2025-03-04T09:21:33.969834982Z                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969861702Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/utils.py", line 216, in wrapper
2025-03-04T09:21:33.969888422Z     return func(*args, **kwargs)
2025-03-04T09:21:33.969915142Z            ^^^^^^^^^^^^^^^^^^^^^
2025-03-04T09:21:33.969944582Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured/partition/pdf_image/ocr.py", line 237, in supplement_page_layout_with_ocr
2025-03-04T09:21:33.969971302Z     tables.load_agent()
2025-03-04T09:21:33.970001222Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 142, in load_agent
2025-03-04T09:21:33.970027942Z     tables_agent.initialize("microsoft/table-transformer-structure-recognition")
2025-03-04T09:21:33.970054662Z   File "/home/user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 77, in initialize
2025-03-04T09:21:33.970081382Z     raise ImportError(
2025-03-04T09:21:33.970108102Z ImportError: Review the parameters to initialize a UnstructuredTableTransformerModel obj
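
The traceback above is a chain: the final ImportError was raised while handling a proxy error, which in turn was caused by a connection timeout. A minimal Python sketch of how such a chain can be walked to find the innermost exception is shown below. It uses stand-in built-in exceptions rather than the real urllib3 and requests classes, and the function names are made up for illustration.

```python
# Sketch only: walks Python's standard exception chaining attributes.
# "direct cause" in a traceback corresponds to __cause__ (raise ... from ...),
# "During handling of the above exception" corresponds to __context__.

def exception_chain(exc):
    """Return exc followed by its causes, outermost first."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        chain.append(exc)
        seen.add(id(exc))
        # Prefer the explicit cause, fall back to the implicit context.
        exc = exc.__cause__ if exc.__cause__ is not None else exc.__context__
    return chain


def simulate_logged_failure():
    """Re-create the shape of the logged chain with stand-in exceptions."""
    try:
        try:
            try:
                raise TimeoutError("timed out")
            except TimeoutError as err:
                # Stand-in for the urllib3/requests ProxyError in the log.
                raise ConnectionError("Unable to connect to proxy") from err
        except ConnectionError:
            # Stand-in for the ImportError raised by unstructured_inference.
            raise ImportError("Review the parameters to initialize a model obj")
    except ImportError as final:
        return final


chain = exception_chain(simulate_logged_failure())
for exc in chain:
    print(type(exc).__name__, "-", exc)
```

Read this way, the last entry of the chain (the timeout while connecting to the proxy) is the place to start debugging, not the ImportError that the service finally reports.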

Attachments

No response

xiguiw avatar Mar 04 '25 09:03 xiguiw

@XinyuYe-Intel, please help to check this issue.

lvliang-intel avatar Mar 05 '25 01:03 lvliang-intel

This issue is caused by a network connection problem, which makes the TableTransformer model download from Hugging Face fail.
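
One rough way to check this diagnosis from inside the data-prep container is to test whether the configured HTTPS proxy accepts TCP connections at all. The sketch below uses only the Python standard library. The function names are hypothetical, and the 10-second default simply mirrors the `connect timeout=10` seen in the log.

```python
# Sketch only: a plain TCP reachability check for the proxy that the
# current process would use. It does not test the proxy's HTTP behaviour.
import socket
from urllib.parse import urlsplit
from urllib.request import getproxies


def configured_https_proxy():
    """Return the HTTPS proxy URL seen by this process, or None."""
    return getproxies().get("https")


def proxy_endpoint(proxy_url):
    """Split a proxy URL such as http://host:port into (host, port)."""
    if "://" not in proxy_url:
        proxy_url = f"http://{proxy_url}"  # assume http when no scheme is given
    parts = urlsplit(proxy_url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname or "", parts.port or default_port


def can_connect(host, port, timeout=10.0):
    """Return True if a TCP connection opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, DNS failures
        return False
```

For example, calling `can_connect(*proxy_endpoint(configured_https_proxy()))` in a Python shell inside the container returns False if the proxy is unreachable from there, even when the host itself can reach Hugging Face.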

XinyuYe-Intel avatar Mar 05 '25 01:03 XinyuYe-Intel

@XinyuYe-Intel

  1. The host/backend can download the LLM model from Hugging Face.
  2. The UI browser can connect to the UI server and upload files to data-prep.

So the host and container network connections are OK.

Please help resolve the issue.

xiguiw avatar Mar 10 '25 07:03 xiguiw

Hi @xiguiw @XinyuYe-Intel, just a reminder: please check this if possible.

yinghu5 avatar Mar 26 '25 07:03 yinghu5

Hi @XinyuYe-Intel,

[Reminder] There is no solution yet. Please help with this.

This feature is seldom used.

xiguiw avatar Apr 16 '25 09:04 xiguiw

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

CICD-at-OPEA avatar May 16 '25 22:05 CICD-at-OPEA

Hi @XinyuYe-Intel, please help handle this issue if time allows: with -F "process_table=true" set, file ingestion reports errors.

yinghu5 avatar May 27 '25 06:05 yinghu5

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

CICD-at-OPEA avatar Jun 26 '25 22:06 CICD-at-OPEA

This issue was closed because it has been stalled for 7 days with no activity.

CICD-at-OPEA avatar Jul 04 '25 22:07 CICD-at-OPEA