ChatQnA data prep failed to upload .doc file
I brought up the ChatQnA pipeline with data prep on Xeon and then tried to upload a plain .doc file without images. The upload failed and the service responded with the following error:
files:UploadFile(filename='Understanding Cloud Computing.doc', size=39424, headers=Headers({'content-disposition': 'form-data; name="files"; filename="Understanding Cloud Computing.doc"', 'content-type': 'application/msword'}))
link_list:None
Parsing document ./uploaded_files/Understanding Cloud Computing.doc.
Converting doc file to docx file...
Converted doc file to docx file.
INFO: 127.0.0.1:52028 - "POST /v1/dataprep HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 174, in __call__
    raise exc
  File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 172, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 93, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 148, in simple_response
    await self.app(scope, receive, send)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/langsmith/run_helpers.py", line 461, in async_wrapper
    raise e
  File "/home/user/.local/lib/python3.11/site-packages/langsmith/run_helpers.py", line 450, in async_wrapper
    function_result = await asyncio.create_task(  # type: ignore[call-arg]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 141, in ingest_documents
    ingest_data_to_redis(
  File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 53, in ingest_data_to_redis
    content = document_loader(path)
              ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/comps/dataprep/utils.py", line 290, in document_loader
    return load_doc(doc_path)
           ^^^^^^^^^^^^^^^^^^
  File "/home/user/comps/dataprep/utils.py", line 142, in load_doc
    text = load_docx(docx_path)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/user/comps/dataprep/utils.py", line 149, in load_docx
    doc = docx.Document(docx_path)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/docx/api.py", line 27, in Document
    document_part = cast("DocumentPart", Package.open(docx).main_document_part)
                                         ^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/docx/opc/package.py", line 127, in open
    pkg_reader = PackageReader.from_file(pkg_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/docx/opc/pkgreader.py", line 22, in from_file
    phys_reader = PhysPkgReader(pkg_file)
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/docx/opc/phys_pkg.py", line 21, in __new__
    raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
docx.opc.exceptions.PackageNotFoundError: Package not found at './uploaded_files/Understanding Cloud Computing.docx'
This is different from issue opea-project/GenAIComps#407, but it could be related. @Ruoyu-y, do you mind uploading your test docx file here?
Sure. Please find the document here. Understanding Cloud Computing.docx
Hi @Ruoyu-y, the Understanding Cloud Computing.docx you shared can't be opened by Word, and I also got the error message below, which makes sense to me.
I created a new docx file (test_hy.docx in the picture below) by removing all unrecognized characters from Understanding Cloud Computing.docx, and it uploads successfully.
Please try with the latest code and a normal .docx file that Word can open, and let us know if you still face issues.
Hi, sorry for the confusion. I was originally uploading a .doc file. Because of GitHub's attachment restrictions, I manually renamed the .doc to .docx. I can still reproduce the same problem with the new code. A new .doc file is attached as a zip; please check. Understanding Cloud Computing.zip
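A side note on the renamed attachment: a legacy .doc file is an OLE2 compound file, while a .docx file is a zip package, so changing the extension does not change the format. The sketch below tells the two apart by their leading bytes instead of the file name. The function name `sniff_word_format` is made up for illustration and is not part of GenAIComps or python-docx.

```python
# Well-known file signatures for the two container formats.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls/.ppt
ZIP_MAGIC = b"PK\x03\x04"  # .docx/.xlsx/.pptx (zip-based package)


def sniff_word_format(path):
    """Return 'doc', 'docx', or 'unknown' based on content, not extension.

    Hypothetical helper: it only inspects the first 8 bytes, so any zip
    archive is reported as 'docx'. Treat it as a first filter only.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return "doc"
    if head.startswith(ZIP_MAGIC):
        return "docx"
    return "unknown"
```

Running this on a .doc that was renamed to .docx should still report 'doc', which would explain why Word and python-docx both refuse to open such a file.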
Hi @Ruoyu-y, the issue is caused by LibreOffice missing from the container, which is needed when uploading .doc files. Please try https://github.com/opea-project/GenAIComps/pull/542
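For readers who hit the same symptom: the log above prints "Converted doc file to docx file." even though no .docx exists afterwards, so the failure only surfaces later inside python-docx. A minimal sketch of a more defensive conversion step follows. It assumes the conversion shells out to LibreOffice's `soffice` (or `libreoffice`) binary in headless mode; the helper names `find_office_binary` and `convert_doc_to_docx` are hypothetical, and the actual change in the linked pull request may look different.

```python
import shutil
import subprocess
from pathlib import Path


def find_office_binary():
    """Return the path of a LibreOffice executable on PATH, or None."""
    for name in ("soffice", "libreoffice"):
        found = shutil.which(name)
        if found:
            return found
    return None


def convert_doc_to_docx(doc_path, binary=None):
    """Convert a .doc file to .docx next to it and return the new path.

    Hypothetical helper: raises RuntimeError if LibreOffice is missing
    or if the expected output file was not produced.
    """
    doc_path = Path(doc_path)
    binary = binary or find_office_binary()
    if binary is None:
        raise RuntimeError("LibreOffice not found on PATH; cannot convert .doc files")
    # Headless conversion; the output lands in the same directory as the input.
    subprocess.run(
        [binary, "--headless", "--convert-to", "docx",
         "--outdir", str(doc_path.parent), str(doc_path)],
        check=True,
    )
    docx_path = doc_path.with_suffix(".docx")
    if not docx_path.is_file():
        raise RuntimeError(f"Conversion produced no output at {docx_path}")
    return docx_path
```

With a check like this, a container without LibreOffice would fail with a message that names the missing dependency instead of a PackageNotFoundError from a later step.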
Thanks