Notion syncing only some pages
Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.9.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Followed the typical steps to connect Dify to Notion using an internal integration.
✔️ Expected Behavior
We have 1228 pages in our Notion database. All of them should be indexed into the Knowledge Base.
❌ Actual Behavior
The selection screen shows 125 pages selected and upon initial sync completion, only 107 documents are indexed.
Hey @BURG3R5! 👋 I'm here to help you with any bugs, questions, or contributions you have for Dify. I'm a bot designed to assist you while waiting for a human maintainer. Let's tackle this issue together!
I found a similar issue that was previously reported and closed: [Cloud] Missing added pages when trying to add a new page to Knowledge Base with Notion integration. This issue was resolved by a pull request that fixed the problem with adding Notion pages to the knowledge base [1].
To troubleshoot and resolve your issue with incomplete syncing of Notion pages in Dify, you can follow these steps:
- Authorization Verification: Ensure that the authorization verification is completed successfully. This involves binding the Notion content to Dify either during dataset creation or through the settings under Data Sources.
- Check Integration Configuration: Verify that the integration with Notion is correctly configured. Ensure that the environment variables in the `.env` file are set correctly for either internal or public integration.
- Segmentation and Cleaning: Confirm that the segmentation settings and indexing method are correctly configured. This affects how data is processed and imported.
- Synchronization Process: If the Notion content is modified, ensure that the synchronization is triggered by clicking Sync in the Dify dataset Document List Page.
- Data Type Limitations: Be aware that images and files are not supported for import, and tabular data will be converted to text display, which might affect the completeness of the sync.
- Automatic Sync Setup: Ensure that the automatic sync is set up correctly so that any updates in Notion are automatically reflected in Dify.
If these steps do not resolve the issue, further investigation into the specific configuration and logs may be necessary. Additionally, ensure that you are following the specific steps for internal or public integration as outlined in the Dify documentation [2][3][4].
To continue talking to Dosu, mention @dosu.
@dosu The mentioned issue is not really similar to mine. I need help from a human maintainer.
This is a tough nut to crack, could really use your help on this, @takatost.
Additionally, I have already attempted the fixes the bot suggested, and none of them helped.
Are there any updates on this issue? I ran into the same problem of missing pages during sync, on both cloud and self-hosted.
@JirehC I've moved on to downloading Notion pages as Markdown and uploading them to an empty knowledge base using the API. Can't trust Dify to respond in time or even give proper error messages, so I'm implementing the upload and sync processes myself. At least this way I have more control over the processes and can actually see what's going wrong.
Not gonna close the issue because this is still an unsolved problem in the product.
@BURG3R5 I've found a potential cause for this, and it's not something we can fix ourselves; it might even be a Notion problem rather than a Dify problem. The syncing may be hitting a time-out. For example, exporting a Notion workspace to Anytype through its integration has the same issue as Dify, with pages going missing. Anytype explained that the files could be too large to transfer before the time-out.
I've also moved to downloading Notion pages as Markdown, but got fed up with uploading them one by one because I couldn't find anything about uploading via the API. Could you send me the link to the documentation on how to do it?
Edit: Is this the documentation you follow? https://docs.dify.ai/guides/knowledge-base/maintain-dataset-via-api
@JirehC
- Regarding the API: Yes, I am using that alongside the notion-to-md library.
- Regarding the issue: Extending the timeout on the server side and using exponential back-off to avoid rate limits should be simple enough for Dify maintainers. Notion's "retrieve block children" API is paginated, so even if there are many blocks inside the document, setting the `page_size` parameter to be small enough should resolve the timeout issues.
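To make the suggestion above concrete, here is a minimal sketch of small-page pagination combined with exponential back-off. It is not Dify's code, which I haven't read. `fetch_page` and `RateLimited` are hypothetical names: `fetch_page` stands in for whatever performs the "retrieve block children" request, and it is assumed to return the parsed response with the `results`, `has_more` and `next_cursor` fields and to raise `RateLimited` on an HTTP 429.

```python
import random
import time
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical signature: fetch_page(block_id, page_size, start_cursor) -> parsed JSON
FetchPage = Callable[[str, int, Optional[str]], Dict[str, Any]]


class RateLimited(Exception):
    """Hypothetical error raised by fetch_page when the server answers HTTP 429."""


def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on RateLimited with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise
            # Delay doubles on every retry; a little jitter avoids synchronized retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))


def iter_block_children(
    fetch_page: FetchPage,
    block_id: str,
    page_size: int = 50,
    sleep=time.sleep,
) -> Iterator[Any]:
    """Yield every child block, requesting page_size blocks per call."""
    cursor: Optional[str] = None
    while True:
        page = with_backoff(
            lambda: fetch_page(block_id, page_size, cursor), sleep=sleep
        )
        yield from page["results"]
        if not page.get("has_more"):
            return
        cursor = page["next_cursor"]
```

Keeping each request small bounds how long any single call can take, and the retry wrapper means a rate-limit response delays the sync instead of silently dropping the page.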
I haven't explored the Dify repository myself to figure out exactly where the issue is, since this is a massive mono-repo containing everything from frontend to client SDKs to backend and so on. I wish a maintainer would provide insight into which code is failing.
Thanks @BURG3R5. I hadn't come across that library, this is great. Thanks for sharing!
@crazywoola @JohnJyong
Hi, @BURG3R5. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- You reported a bug in Dify v0.9.1 where only 107 out of 1228 Notion pages are synced.
- Troubleshooting steps I took did not resolve the issue.
- Another user, JirehC, suggested a potential "time-out" issue with Notion's API.
- You have moved to manually handling Notion pages and suggested improvements for Dify.
Next Steps:
- Please confirm if this issue is still relevant to the latest version of Dify. If so, you can keep the discussion open by commenting here.
- If there is no further activity, this issue will be automatically closed in 15 days.
Thank you for your understanding and contribution!
This issue is not solved.
I have this problem too. When syncing a page, I got the following error: { "code": "indexing_estimate_error", "message": "notion page type not supported", "status": 500 }
Hi, @BURG3R5. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- You reported a bug in Dify version 0.9.1 where only a small fraction of Notion pages sync to the Knowledge Base.
- Attempts to resolve the issue using suggested troubleshooting steps were unsuccessful.
- You have been manually handling Notion pages as a workaround.
- Another user, JirehC, suggested the issue might be related to a "time-out" problem with Notion's API.
- The issue remains unresolved, and you have requested further insights from a maintainer.
Next Steps:
- Please let us know if this issue is still relevant to the latest version of the Dify repository by commenting here.
- If there is no further activity, this issue will be automatically closed in 15 days.
Thank you for your understanding and contribution!
This issue is not resolved.
Are you still running v0.9.1?
Hi, @BURG3R5. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.
Issue Summary
- Reported bug in Dify version 0.9.1: only 107 out of 1228 Notion pages sync to the Knowledge Base.
- Troubleshooting steps I suggested did not resolve the issue.
- User JirehC suggested a potential "time-out" issue with Notion's API.
- Both users are manually handling Notion pages as a workaround.
- You requested further insights from a maintainer, but the issue remains unresolved.
Next Steps
- Please confirm if this issue is still relevant to the latest version of the Dify repository by commenting here.
- If there is no further activity, this issue will be automatically closed in 15 days.
Thank you for your understanding and contribution!