dify icon indicating copy to clipboard operation
dify copied to clipboard

perf: optimizing db WorkflowAppLog index

Open horochx opened this issue 11 months ago • 6 comments

Summary

Currently, the primary slow queries in our database are originating from the get_paginate_workflow_app_logs method of the WorkflowAppService. I've noticed that there might be room for optimization in the index of the WorkflowAppLog table.

Consequently, I have added the most frequently used fields, created_at and workflow_run_id, to the index.

Close #14752

Screenshots

Before After
... ...

Checklist

[!IMPORTANT]
Please review the checklist below before submitting your pull request.

  • [ ] This change requires a documentation update, included: Dify Document
  • [x] I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • [x] I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • [x] I've updated the documentation accordingly.
  • [x] I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

horochx avatar Mar 03 '25 02:03 horochx

Hello, please open an issue and link it in the description.

crazywoola avatar Mar 03 '25 06:03 crazywoola

Hello, please open an issue and link it in the description.

I've created issue #14752 and attached the relevant SQL statements.

I was wondering if adding a partial index on WorkflowRun.status=failed might further optimize query performance. However, I wasn't certain whether this use case was generally applicable or just specific to my situation, so I didn't pursue it further.

horochx avatar Mar 03 '25 07:03 horochx

Reasonable to add field to the existed index workflow_app_log_app_idx and introduce new index with workflow_run_id for joining.

And please make sure all the DDL operations are applied by running flask db migrate to generation all the necessary changes in db migrations.

bowenliang123 avatar Mar 03 '25 10:03 bowenliang123

Reasonable to add field to the existed index workflow_app_log_app_idx and introduce new index with workflow_run_id for joining.

And please make sure all the DDL operations are applied by running flask db migrate to generation all the necessary changes in db migrations.

Thank you for your suggestion. The reason I haven't run flask db migrate yet is because when I tested it, I noticed there are still some pending model migrations in the main branch. I'm unsure if this is the intended behavior within Dify's development workflow, so I've left it as is for now 😂.

horochx avatar Mar 04 '25 04:03 horochx

Can you provide a performance testing process for large-scale datasets? I would like to understand how significant the impact of this improvement can be. Thank you.

laipz8200 avatar Mar 05 '25 07:03 laipz8200

Can you provide a performance testing process for large-scale datasets? I would like to understand how significant the impact of this improvement can be. Thank you.

I'm afraid I cannot provide performance testing data from a production environment at this time. The issue is that in our test environment, logs are centralized on specific apps. This setup results in a less noticeable filtering effect after indexing (the number of matched logs only reduced from 2,512,370 to 2,447,919). So, I can't give you reliable real-world performance results. I can only say this change should be theoretically effective 😂.

horochx avatar Mar 10 '25 03:03 horochx