When ETL_TYPE is set to Unstructured, a local Docker deployment reports an error when creating a knowledge base.
Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
dify-api:0.6.15
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Run Dify locally with Docker and set the following environment variables in .env:
ETL_TYPE=Unstructured
UNSTRUCTURED_API_URL=https://api.unstructuredapp.io/general/v0/general
UNSTRUCTURED_API_KEY=*****
The above error occurs when creating the knowledge base and starting the setup.
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
Hello @biofer! 👋 I'm here to help you with any bugs, questions, or contributions you may have regarding the repository. Let me know how I can assist you!
To resolve the error indicating that the NLTK 'punkt' resource is not found when ETL_TYPE is set to Unstructured in your local Docker instance, you need to download the 'punkt' resource. You can do this by running the following Python code inside your Docker container:
import nltk
nltk.download('punkt')
If you build the image from a Dockerfile, you can add the following line so that the 'punkt' resource is downloaded during the build process:
RUN python -m nltk.downloader punkt
This will ensure that the 'punkt' tokenizer is available when you run your application in the Docker container [1].
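As a hedged variant of the snippet above, the download can be made conditional, so that it only runs when the resource is actually missing. This is a minimal sketch, not Dify's own code: the helper name `ensure_nltk_resource` is hypothetical, and it assumes the container has network access and that `tokenizers/punkt` is the resource path named in the error message.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download an NLTK package only if its resource cannot be found.

    Hypothetical helper for illustration. `find` and `download` default to
    nltk.data.find and nltk.download; they can be overridden for testing.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is not installed
        find(resource_path)
        return False
    except LookupError:
        download(package)
        return True
```

Calling `ensure_nltk_resource("tokenizers/punkt", "punkt")` once at container startup would then be enough. If the error message names a different resource, pass that name instead.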
To continue talking to Dosu, mention @dosu.
I followed the official tutorial to start Docker. If there is something wrong with the Docker image, why don't you provide the latest Docker image?
This is a tough nut to crack, could really use your help on this, @takatost.
Are there any plans or details you can share, @JohnJyong? After entering the container of the local image, running the following fails, so I hope to get an official image with this already integrated.
import nltk
nltk.download('punkt')
I would think that since the official docs give instructions, shouldn't they be followed and shouldn't it just work silky smooth? Or is the experience of the domestic user unimportant and untested?
He is currently on vacation. Here is a similar issue that was raised a long time ago: https://github.com/langgenius/dify/issues/4659
So hopefully I can get a Docker image that works without problems.