
Notion syncing only some pages

Open BURG3R5 opened this issue 1 year ago • 15 comments

Self Checks

  • [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.9.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Followed the typical steps to connect Dify to Notion using an internal integration.

✔️ Expected Behavior

We have 1228 pages in our Notion database. All of them should be indexed into the Knowledge Base.

❌ Actual Behavior

The selection screen shows only 125 pages selected, and after the initial sync completes, only 107 documents are indexed.

[Screenshots: 2024-10-04, 3:03:11 PM and 3:01:15 PM]
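One hypothesis worth ruling out before blaming the sync is that the integration simply cannot see every row, since Notion only exposes pages that have been shared with an integration. Below is a minimal Python sketch, assuming you have the internal-integration token Dify uses and the database ID, that counts the rows returned by Notion's database query endpoint. The `Notion-Version` value and the helper names `make_database_query` and `count_database_pages` are assumptions for illustration, not part of Dify.

```python
import json
import urllib.request
from typing import Callable, Optional

# Assumption: a Notion-Version that still exposes /v1/databases/{id}/query.
NOTION_VERSION = "2022-06-28"

QueryPage = Callable[[Optional[str]], dict]


def make_database_query(token: str, database_id: str, page_size: int = 100) -> QueryPage:
    """Return a function that fetches one page of database rows from Notion."""
    url = f"https://api.notion.com/v1/databases/{database_id}/query"

    def query_page(cursor: Optional[str]) -> dict:
        body = {"page_size": page_size}  # Notion caps page_size at 100
        if cursor:
            body["start_cursor"] = cursor
        req = urllib.request.Request(
            url,
            data=json.dumps(body).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {token}",
                "Notion-Version": NOTION_VERSION,
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)

    return query_page


def count_database_pages(query_page: QueryPage) -> int:
    """Follow next_cursor until has_more is false, counting every row returned."""
    total, cursor = 0, None
    while True:
        page = query_page(cursor)
        total += len(page["results"])
        if not page.get("has_more"):
            return total
        cursor = page["next_cursor"]
```

Calling `count_database_pages(make_database_query(token, database_id))` gives the number of rows visible to the integration: a result near 1228 would point at Dify's sync, while a result near 125 would suggest the missing pages were never shared with the integration. Sub-pages nested inside rows are not counted by this query.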

BURG3R5 • Oct 04 '24 09:10

Hey @BURG3R5! 👋 I'm here to help you with any bugs, questions, or contributions you have for Dify. I'm a bot designed to assist you while waiting for a human maintainer. Let's tackle this issue together!

I found a similar issue that was previously reported and closed: [Cloud] Missing added pages when trying to add a new page to Knowledge Base with Notion integration. This issue was resolved by a pull request that fixed the problem with adding Notion pages to the knowledge base [1].

To troubleshoot and resolve your issue with incomplete syncing of Notion pages in Dify, you can follow these steps:

  1. Authorization Verification: Ensure that the authorization verification is completed successfully. This involves binding the Notion content to Dify either during dataset creation or through the settings under Data Sources.

  2. Check Integration Configuration: Verify that the integration with Notion is correctly configured. Ensure that the environment variables in the .env file are set correctly for either internal or public integration.

  3. Segmentation and Cleaning: Confirm that the segmentation settings and indexing method are correctly configured. This affects how data is processed and imported.

  4. Synchronization Process: If the Notion content is modified, ensure that the synchronization is triggered by clicking Sync in the Dify dataset Document List Page.

  5. Data Type Limitations: Be aware that images and files are not supported for import, and tabular data will be converted to text display, which might affect the completeness of the sync.

  6. Automatic Sync Setup: Ensure that the automatic sync is set up correctly so that any updates in Notion are automatically reflected in Dify.

If these steps do not resolve the issue, further investigation into the specific configuration and logs may be necessary. Additionally, ensure that you are following the specific steps for internal or public integration as outlined in the Dify documentation [2][3][4].

To continue talking to Dosu, mention @dosu.

dosubot[bot] • Oct 04 '24 09:10

@dosu The mentioned issue is not really similar to mine. I need help from a human maintainer.

BURG3R5 • Oct 04 '24 09:10

This is a tough nut to crack; could really use your help on this, @takatost.


dosubot[bot] • Oct 04 '24 09:10

Additionally, I have already attempted the fixes the bot suggested, and none of them helped.

BURG3R5 • Oct 04 '24 09:10

Are there any updates on this issue? I faced the same problem of missing pages during sync, on both Cloud and self-hosted.

JirehC • Oct 11 '24 05:10

@JirehC I've moved on to downloading Notion pages as Markdown and uploading them to an empty knowledge base using the API. Can't trust Dify to respond in time or even give proper error messages, so I'm implementing the upload and sync processes myself. At least this way I have more control over the processes and can actually see what's going wrong.

Not gonna close the issue because this is still an unsolved problem in the product.

BURG3R5 • Oct 11 '24 06:10

@BURG3R5 I've found a potential cause for this, and it's not something we can fix; it might even be a Notion problem rather than a Dify problem. The likely culprit is a time-out during syncing. For example, exporting a Notion workspace to Anytype through its integration has the same issue as Dify: pages go missing. Anytype explained that the file size could be too large for the export to finish before the time-out.

I've also moved to downloading Notion pages as Markdown, but got fed up uploading them one by one because I couldn't find anything about uploading them via the API. Could you send me the link to the documentation on how to do it?

Edit: Is this the documentation you follow? https://docs.dify.ai/guides/knowledge-base/maintain-dataset-via-api
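The Markdown-upload workaround can be sketched in Python against Dify's Knowledge (dataset) API, assuming the API is enabled and a dataset API key has been created. The create-by-text path and the request fields below are assumed from the linked documentation and may differ between Dify versions (some docs spell the path `create_by_text`), so check them against your version; `DifyConfig`, `build_create_request`, `upload_markdown`, and `upload_directory` are hypothetical helper names.

```python
import json
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict


@dataclass
class DifyConfig:
    base_url: str    # e.g. "http://localhost/v1" for a self-hosted install (assumption)
    api_key: str     # dataset (Knowledge) API key, not an app key
    dataset_id: str
    # Assumed path; some Dify versions document it as .../document/create_by_text
    create_path: str = "/datasets/{dataset_id}/document/create-by-text"


def build_create_request(cfg: DifyConfig, name: str, markdown: str) -> urllib.request.Request:
    """Build a POST that creates one document from raw text."""
    url = cfg.base_url.rstrip("/") + cfg.create_path.format(dataset_id=cfg.dataset_id)
    body = {
        "name": name,
        "text": markdown,
        "indexing_technique": "high_quality",   # assumed field values
        "process_rule": {"mode": "automatic"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {cfg.api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def upload_markdown(cfg: DifyConfig, name: str, markdown: str, timeout: float = 120) -> dict:
    """Send the request and return Dify's JSON response."""
    with urllib.request.urlopen(build_create_request(cfg, name, markdown), timeout=timeout) as resp:
        return json.load(resp)


Uploader = Callable[[DifyConfig, str, str], dict]


def upload_directory(cfg: DifyConfig, folder: str, uploader: Uploader = upload_markdown) -> Dict[str, str]:
    """Upload every *.md file in folder; return {file name: error} for the ones that failed."""
    failures: Dict[str, str] = {}
    for path in sorted(Path(folder).glob("*.md")):
        try:
            uploader(cfg, path.stem, path.read_text(encoding="utf-8"))
        except Exception as exc:  # keep going, but record why this page failed
            failures[path.name] = repr(exc)
    return failures
```

Collecting failures per file instead of stopping at the first error is the main benefit of doing the upload by hand: every page that did not make it is reported along with its exception.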

[Screenshot: 2024-10-11, 17:07:30]

JirehC • Oct 11 '24 08:10

@JirehC

  1. Regarding the API: Yes, I am using that alongside the notion-to-md library.
  2. Regarding the issue: Extending the timeout on the server side and using exponential back-off to avoid rate limits should be simple enough for Dify maintainers. Notion's "retrieve block children" API is paginated, so even if a document contains many blocks, setting the page_size parameter small enough should resolve the timeout issues.

I haven't explored the Dify repository myself to figure out exactly where the issue is, since it is a massive monorepo containing everything from the frontend to client SDKs to the backend. I wish a maintainer would provide some insight into which code is failing.
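The pagination and back-off suggestion can be sketched in Python: page through Notion's "retrieve block children" endpoint with a small `page_size`, follow `next_cursor` while `has_more` is true, and retry HTTP 429 (rate-limited) responses with exponential back-off plus jitter. This is a minimal sketch, not Dify's actual Notion extractor; `RateLimited`, `with_backoff`, `make_notion_fetcher`, and `fetch_all_children` are hypothetical names, and the `Notion-Version` value is an assumption. The fetcher is injected so the paging and retry logic can be exercised without network access.

```python
import json
import random
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, List, Optional

NOTION_VERSION = "2022-06-28"  # assumption: any supported Notion-Version should work here

FetchPage = Callable[[str, Optional[str], int], dict]


class RateLimited(Exception):
    """Raised when Notion answers HTTP 429 (rate_limited)."""


def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry call() on RateLimited, waiting base_delay * 2**attempt plus random jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


def make_notion_fetcher(token: str, timeout: float = 60) -> FetchPage:
    """Return a fetcher for GET /v1/blocks/{id}/children (one page per call)."""
    def fetch_page(block_id: str, cursor: Optional[str], page_size: int) -> dict:
        params = {"page_size": page_size}
        if cursor:
            params["start_cursor"] = cursor
        url = f"https://api.notion.com/v1/blocks/{block_id}/children?{urllib.parse.urlencode(params)}"
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {token}", "Notion-Version": NOTION_VERSION}
        )
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code == 429:
                raise RateLimited() from err
            raise
    return fetch_page


def fetch_all_children(block_id: str, fetch_page: FetchPage, page_size: int = 25, **backoff) -> List[dict]:
    """Collect all direct children of a block, one small page at a time."""
    blocks: List[dict] = []
    cursor: Optional[str] = None
    while True:
        page = with_backoff(lambda: fetch_page(block_id, cursor, page_size), **backoff)
        blocks.extend(page["results"])
        if not page.get("has_more"):
            return blocks
        cursor = page["next_cursor"]
```

Only the direct children of one block are fetched; nested content (toggles, child pages) would need a recursive call on each block whose `has_children` is true. Notion also sends a Retry-After header on 429 responses, which a production version should prefer over the computed delay.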

BURG3R5 • Oct 11 '24 09:10

Thanks @BURG3R5. I hadn't come across that library; this is great. Thanks for sharing!

JirehC • Oct 11 '24 09:10

@crazywoola @JohnJyong

Yawen-1010 • Oct 22 '24 09:10

Hi, @BURG3R5. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.

Issue Summary:

  • You reported a bug in Dify v0.9.1 where only 107 out of 1228 Notion pages are synced.
  • The troubleshooting steps I suggested did not resolve the issue.
  • Another user, JirehC, suggested a potential "time-out" issue with Notion's API.
  • You have moved to manually handling Notion pages and suggested improvements for Dify.

Next Steps:

  • Please confirm if this issue is still relevant to the latest version of Dify. If so, you can keep the discussion open by commenting here.
  • If there is no further activity, this issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

dosubot[bot] • Nov 22 '24 16:11


This issue is not solved.

JirehC • Nov 22 '24 16:11

I ran into this problem too. When syncing a page, I got:

{ "code": "indexing_estimate_error", "message": "notion page type not supported", "status": 500 }

xy3xy3 • Dec 08 '24 05:12

Hi, @BURG3R5. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.

Issue Summary:

  • You reported a bug in Dify version 0.9.1 where only a small fraction of Notion pages sync to the Knowledge Base.
  • Attempts to resolve the issue using suggested troubleshooting steps were unsuccessful.
  • You have been manually handling Notion pages as a workaround.
  • Another user, JirehC, suggested the issue might be related to a "time-out" problem with Notion's API.
  • The issue remains unresolved, and you have requested further insights from a maintainer.

Next Steps:

  • Please let us know if this issue is still relevant to the latest version of the Dify repository by commenting here.
  • If there is no further activity, this issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

dosubot[bot] • Jan 08 '25 16:01


This issue is not resolved.

BURG3R5 • Jan 08 '25 17:01


Are you still running v0.9.1?

crazywoola • Jan 09 '25 04:01

Hi, @BURG3R5. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.

Issue Summary

  • Reported bug in Dify version 0.9.1: only 107 out of 1228 Notion pages sync to the Knowledge Base.
  • Troubleshooting steps I suggested did not resolve the issue.
  • User JirehC suggested a potential "time-out" issue with Notion's API.
  • Both users are manually handling Notion pages as a workaround.
  • You requested further insights from a maintainer, but the issue remains unresolved.

Next Steps

  • Please confirm if this issue is still relevant to the latest version of the Dify repository by commenting here.
  • If there is no further activity, this issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

dosubot[bot] • Feb 09 '25 16:02