When setting ETL_TYPE to Unstructured, the local docker runs and reports an error when creating the knowledge base.

Open biofer opened this issue 1 year ago • 7 comments

Self Checks

  • [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

dify-api:0.6.15

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

[Image: screenshot of the error]

Running Docker locally, with the following environment variables set in .env:

ETL_TYPE=Unstructured
UNSTRUCTURED_API_URL=https://api.unstructuredapp.io/general/v0/general
UNSTRUCTURED_API_KEY=*****

The above error occurs when creating the knowledge base and starting the setup.

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

biofer avatar Aug 01 '24 05:08 biofer

Hello @biofer! 👋 I'm here to help you with any bugs, questions, or contributions you may have regarding the repository. Let me know how I can assist you!

To resolve the error indicating that the NLTK 'punkt' resource is not found when setting ETL_TYPE to Unstructured in your local Docker instance, you need to download the 'punkt' resource. You can do this by adding the following code to your Dockerfile or by running it in your Docker container:

import nltk
nltk.download('punkt')

If you are using a Dockerfile, you can add the following lines to ensure the 'punkt' resource is downloaded during the build process:

RUN python -m nltk.downloader punkt

This will ensure that the 'punkt' tokenizer is available when you run your application in the Docker container [1].
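As a minimal sketch of the same idea, the download can be made conditional so that it only runs when the resource is actually missing. The helper name `ensure_nltk_resource` and its `find`/`download` parameters are hypothetical and not part of Dify or NLTK; the only NLTK calls assumed are `nltk.data.find` and `nltk.download`. The exact resource name can differ between NLTK versions, so the name given in the error message takes precedence over the `punkt` default used here.

```python
from typing import Callable, Optional


def ensure_nltk_resource(
    resource_path: str = "tokenizers/punkt",
    package: str = "punkt",
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], object]] = None,
) -> bool:
    """Return True if the resource was already present, False if a download was attempted.

    `find` and `download` are hypothetical injection points (for testing);
    by default they fall back to nltk.data.find and nltk.download.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be exercised without NLTK

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is not installed
        find(resource_path)
        return True
    except LookupError:
        # Needs network access from inside the container
        download(package)
        return False
```

Calling `ensure_nltk_resource()` once at container startup would avoid a repeated download on every run, though it still requires that the container can reach the NLTK download server.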

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Aug 01 '24 05:08 dosubot[bot]

I followed the official tutorial to start docker, if there is something wrong with the docker image, why don't you provide the latest docker image?

biofer avatar Aug 01 '24 05:08 biofer

This is a tough nut to crack, could really use your help on this, @takatost.

dosubot[bot] avatar Aug 01 '24 05:08 dosubot[bot]

Are there any plans or details you can share, @JohnJyong? After entering the locally running image, running the following fails, so I hope an official image with this already integrated can be provided.

import nltk
nltk.download('punkt')

biofer avatar Aug 02 '24 00:08 biofer

I would think that since the official docs give instructions, shouldn't they be followed and shouldn't it just work silky smooth? Or is the experience of the domestic user unimportant and untested?

biofer avatar Aug 02 '24 00:08 biofer

He is currently on vacation. Here is a similar issue raised a long time ago: https://github.com/langgenius/dify/issues/4659

crazywoola avatar Aug 02 '24 07:08 crazywoola

Are there any plans or details you can share, @JohnJyong? After entering the locally running image, running the following fails, so I hope an official image with this already integrated can be provided.

import nltk
nltk.download('punkt')

So I hope to get a Docker image that works without problems.

biofer avatar Aug 02 '24 07:08 biofer