
knowledge API

Open nicho2 opened this issue 1 year ago • 4 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Provide a description of requested docs changes

Hello,

There is an error in the knowledge API documentation.

For "Get Chunks from a Document", the query parameter is documented as "keyword", but it is actually "keywords", with an "s".

Example: http://10.2.2.77/v1/datasets/adb7f789-d1b3-4a66-b890-8e416388f066/documents/3a9b9e6c-b25c-49f4-aedf-3c059ac52d64/segments?keywords=05c9ce31-99a3-43f7-b7bb-7813b3ff17a5

Image

Regards

nicho2 avatar Apr 03 '25 08:04 nicho2

The correct query parameter for the "Get Chunks from a Document" endpoint in the knowledge API is "keyword" without an "s" [1].


dosubot[bot] avatar Apr 03 '25 08:04 dosubot[bot]

https://github.com/langgenius/dify/blob/360986f38d5b168723beaf952aef118bd5954ae6/api/controllers/service_api/dataset/segment.py#L127

From the implementation, it is keyword.

crazywoola avatar Apr 03 '25 09:04 crazywoola

Hi @nicho2 @crazywoola, this seems to be invalid.

In the backend, the API is defined in api/controllers/console/datasets/datasets_segments.py, around line 647:

api.add_resource(DatasetDocumentSegmentListApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/segments")

Around line 75, it expects keyword, not keywords, in args:

keyword = args["keyword"]

I also tested it: the filter doesn't work if we send keywords, but it works when we send keyword in the query params.

Here is a sample curl request to try:

curl --location 'http://localhost:5001/console/api/datasets/486dc944-a76d-4a6b-bfed-2109e8c2d6f3/documents/69c83f96-b120-4df3-adc5-7eb3764409f9/segments?page=1&limit=10&keyword=jub&enabled=all' \
  --header 'Accept: */*' \
  --header 'Accept-Language: en-GB,en;q=0.9' \
  --header 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyX2lkIjoiZjUxOTlkYjMtNmJhMC00MjdmLWEwMGQtZWJjYmJhMjIwYzgzIiwiZXhwIjoxNzQ0MTgwNTM1LCJpc3MiOiJTRUxGX0hPU1RFRCIsInN1YiI6IkNvbnNvbGUgQVBJIFBhc3Nwb3J0In0.HeC8ETqtyalfcjMYnf7VD9uRrmsObs0oyYDHD4DC61U' \
  --header 'Cache-Control: no-cache' \
  --header 'Connection: keep-alive' \
  --header 'Origin: http://localhost:3000' \
  --header 'Pragma: no-cache' \
  --header 'Referer: http://localhost:3000/' \
  --header 'Sec-Fetch-Dest: empty' \
  --header 'Sec-Fetch-Mode: cors' \
  --header 'Sec-Fetch-Site: same-site' \
  --header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36' \
  --header 'content-type: application/json' \
  --header 'sec-ch-ua: "Island";v="135", "Not-A.Brand";v="8", "Chromium";v="135"' \
  --header 'sec-ch-ua-mobile: ?0' \
  --header 'sec-ch-ua-platform: "macOS"' \
  --header 'Cookie: locale=en-US'
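
To compare the two spellings side by side, the request URL can also be built programmatically. This is only a minimal sketch: build_segments_url is a hypothetical helper (not part of Dify), and the base URL and IDs are simply the ones from the curl request above.

```python
from urllib.parse import urlencode


def build_segments_url(base_url, dataset_id, document_id, **params):
    """Hypothetical helper: build the segments listing URL with query params."""
    path = f"/datasets/{dataset_id}/documents/{document_id}/segments"
    query = urlencode(params)  # keeps the keyword-argument order
    return f"{base_url}{path}?{query}" if query else f"{base_url}{path}"


# Values copied from the curl request above.
BASE_URL = "http://localhost:5001/console/api"
DATASET_ID = "486dc944-a76d-4a6b-bfed-2109e8c2d6f3"
DOCUMENT_ID = "69c83f96-b120-4df3-adc5-7eb3764409f9"

# Same request, differing only in the spelling of the filter parameter.
url_keyword = build_segments_url(
    BASE_URL, DATASET_ID, DOCUMENT_ID, page=1, limit=10, keyword="jub", enabled="all"
)
url_keywords = build_segments_url(
    BASE_URL, DATASET_ID, DOCUMENT_ID, page=1, limit=10, keywords="jub", enabled="all"
)

print(url_keyword)
print(url_keywords)
```

Sending both URLs with the same Authorization header and comparing the number of returned segments should show which spelling the endpoint actually reads.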

jubinsoni avatar Apr 09 '25 06:04 jubinsoni

Hello @jubinsoni, @crazywoola

When I send "keywords", I get the chunk:

Image

When I send "keyword", I get no chunk:

Image

nicho2 avatar Apr 11 '25 12:04 nicho2

Hi @nicho2

In your screenshot, instead of sending a UUID (05c9ce-..-7a5) in the keyword argument, can you send Hobbies, try again, and share screenshots?

Also, if you don't send any arguments, the output will be the same as when you send keywords as an argument, i.e. all the segments are returned.

explanation

If we go to api/controllers/console/datasets/datasets_segments.py, around line 88, the code searches the content field only when keyword is sent in the API params:

if keyword:
    query = query.where(DocumentSegment.content.ilike(f"%{keyword}%"))

When you send a UUID (05c9ce-..-7a5) in keyword, none of the segments' content matches it, so you get an empty result, which is correct.
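
The behaviour described above can be reproduced without a database. The following is a rough sketch, not the actual Dify code: filter_segments is a hypothetical stand-in for the query logic, the sample segments are made up, and a plain case-insensitive substring check approximates ilike with surrounding % wildcards.

```python
def filter_segments(segments, args):
    """Hypothetical stand-in for the handler: only `keyword` is ever read."""
    keyword = args.get("keyword")  # an unknown `keywords` arg is never looked up
    if keyword:
        needle = keyword.lower()
        # Approximates content.ilike(f"%{keyword}%"): case-insensitive substring
        return [s for s in segments if needle in s.lower()]
    # No keyword given: no filter is applied, so every segment is returned
    return list(segments)


# Made-up sample content, for illustration only.
segments = ["Hobbies: chess and hiking", "Work history", "Education"]
chunk_id = "05c9ce31-99a3-43f7-b7bb-7813b3ff17a5"

with_keywords = filter_segments(segments, {"keywords": chunk_id})  # ignored arg
with_keyword_id = filter_segments(segments, {"keyword": chunk_id})  # no content match
with_keyword_text = filter_segments(segments, {"keyword": "hobbies"})

print(len(with_keywords), len(with_keyword_id), with_keyword_text)
```

Under these assumptions, sending keywords looks like it "works" only because no filter is applied at all, while sending a UUID as keyword filters everything out because the UUID never appears in any segment's content.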

jubinsoni avatar Apr 13 '25 16:04 jubinsoni