knowledge API
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
Provide a description of requested docs changes
Hello,
There is an error in the knowledge API documentation.
For "Get Chunks from a Document", the query parameter is documented as "keyword", but it appears to be "keywords", with an "s".
Example: http://10.2.2.77/v1/datasets/adb7f789-d1b3-4a66-b890-8e416388f066/documents/3a9b9e6c-b25c-49f4-aedf-3c059ac52d64/segments?keywords=05c9ce31-99a3-43f7-b7bb-7813b3ff17a5
Regards
The correct query parameter for the "Get Chunks from a Document" endpoint in the knowledge API is "keyword" without an "s" [1].
https://github.com/langgenius/dify/blob/360986f38d5b168723beaf952aef118bd5954ae6/api/controllers/service_api/dataset/segment.py#L127
From the implementation, it is keyword.
Hi @nicho2 @crazywoola, this issue seems to be invalid.
In the backend, the API is defined in api/controllers/console/datasets/datasets_segments.py, around line 647:
api.add_resource(DatasetDocumentSegmentListApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/segments")
Around line 75, it expects keyword, not keywords, in args:
keyword = args["keyword"]
I also tested it: the filter does not work if we send keywords, but it works when we send keyword in the query params.
Here is a sample curl request to try:
curl --location 'http://localhost:5001/console/api/datasets/486dc944-a76d-4a6b-bfed-2109e8c2d6f3/documents/69c83f96-b120-4df3-adc5-7eb3764409f9/segments?page=1&limit=10&keyword=jub&enabled=all' \
  --header 'Accept: */*' \
  --header 'Accept-Language: en-GB,en;q=0.9' \
  --header 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyX2lkIjoiZjUxOTlkYjMtNmJhMC00MjdmLWEwMGQtZWJjYmJhMjIwYzgzIiwiZXhwIjoxNzQ0MTgwNTM1LCJpc3MiOiJTRUxGX0hPU1RFRCIsInN1YiI6IkNvbnNvbGUgQVBJIFBhc3Nwb3J0In0.HeC8ETqtyalfcjMYnf7VD9uRrmsObs0oyYDHD4DC61U' \
  --header 'Cache-Control: no-cache' \
  --header 'Connection: keep-alive' \
  --header 'Origin: http://localhost:3000' \
  --header 'Pragma: no-cache' \
  --header 'Referer: http://localhost:3000/' \
  --header 'Sec-Fetch-Dest: empty' \
  --header 'Sec-Fetch-Mode: cors' \
  --header 'Sec-Fetch-Site: same-site' \
  --header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36' \
  --header 'content-type: application/json' \
  --header 'sec-ch-ua: "Island";v="135", "Not-A.Brand";v="8", "Chromium";v="135"' \
  --header 'sec-ch-ua-mobile: ?0' \
  --header 'sec-ch-ua-platform: "macOS"' \
  --header 'Cookie: locale=en-US'
Hello @jubinsoni, @crazywoola,
When I send "keywords", I get the chunk.
When I send "keyword", I get no chunk.
Hi @nicho2,
In your screenshot, instead of sending a UUID (05c9ce-..-7a5) in the keyword argument, can you send Hobbies, try again, and share screenshots?
Also, if you don't send any arguments, the output will be the same as when you send keywords as an argument, i.e. all the segments are returned.
Explanation:
In api/controllers/console/datasets/datasets_segments.py, around line 88, the code searches the content field only when keyword is sent in the API params:
if keyword:
    query = query.where(DocumentSegment.content.ilike(f"%{keyword}%"))
When you send a UUID (05c9ce-..-7a5) in keyword, none of the segments' content matches it, so you get an empty result, which is correct.
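The behavior described above can be reproduced with a minimal Python sketch. This is not the Dify code: `list_segments`, the plain-dict query params, and the sample segment texts are hypothetical stand-ins, and the in-memory substring check only approximates the `ilike` filter. It shows why `keywords` appears to "work" (the parameter is ignored, so everything is returned) while `keyword` with a UUID returns nothing.

```python
# Hypothetical stand-in for the segment list handler; not the actual Dify code.
def list_segments(segments, query_params):
    # Only the "keyword" argument is read. Any other parameter name,
    # such as "keywords", is never looked at and has no effect.
    keyword = query_params.get("keyword")
    results = segments
    if keyword:
        # Case-insensitive substring match, approximating ilike(f"%{keyword}%").
        results = [s for s in results if keyword.lower() in s.lower()]
    return results


# Made-up segment contents, for illustration only.
segments = ["Name: Alice", "Hobbies: chess and hiking", "City: Lyon"]
uuid_value = "05c9ce31-99a3-43f7-b7bb-7813b3ff17a5"

# "keywords" is ignored, so no filter is applied and all segments come back.
all_when_misspelled = list_segments(segments, {"keywords": uuid_value})

# "keyword" is applied, but no segment content contains the UUID.
none_for_uuid = list_segments(segments, {"keyword": uuid_value})

# "keyword" with a word that occurs in the content returns the matching segment.
match_for_word = list_segments(segments, {"keyword": "hobbies"})

print(len(all_when_misspelled), len(none_for_uuid), match_for_word)
```

So getting every chunk back with `keywords` matches the output of a request with no filter at all, while the empty result with `keyword` means the filter ran and found no match.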