perf: optimize the WorkflowAppLog db index
Summary
Currently, the primary slow queries in our database originate from the `get_paginate_workflow_app_logs` method of `WorkflowAppService`, and the index on the `WorkflowAppLog` table leaves room for optimization.
Consequently, I have added the two most frequently used fields, `created_at` and `workflow_run_id`, to the index.
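For illustration, here is a minimal sketch of what the corresponding Alembic migration might look like. It assumes the existing `workflow_app_log_app_idx` index covers `(tenant_id, app_id)` on the `workflow_app_logs` table; the revision identifiers are placeholders, not the ones `flask db migrate` would generate.

```python
"""Sketch only: rebuild the WorkflowAppLog index with the extra columns."""
from alembic import op

# Placeholder revision identifiers; the real ones come from `flask db migrate`.
revision = "xxxxxxxxxxxx"
down_revision = "yyyyyyyyyyyy"


def upgrade():
    # Rebuild the index so it also covers the two most frequently
    # queried fields, created_at and workflow_run_id.
    op.drop_index("workflow_app_log_app_idx", table_name="workflow_app_logs")
    op.create_index(
        "workflow_app_log_app_idx",
        "workflow_app_logs",
        ["tenant_id", "app_id", "created_at", "workflow_run_id"],
    )


def downgrade():
    # Restore the (assumed) original two-column index.
    op.drop_index("workflow_app_log_app_idx", table_name="workflow_app_logs")
    op.create_index(
        "workflow_app_log_app_idx",
        "workflow_app_logs",
        ["tenant_id", "app_id"],
    )
```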
Closes #14752
Screenshots
| Before | After |
|---|---|
| ... | ... |
Checklist
> [!IMPORTANT]
> Please review the checklist below before submitting your pull request.
- [ ] This change requires a documentation update, included: Dify Document
- [x] I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
- [x] I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
- [x] I've updated the documentation accordingly.
- [x] I ran `dev/reformat` (backend) and `cd web && npx lint-staged` (frontend) to appease the lint gods
Hello, please open an issue and link it in the description.
I've created issue #14752 and attached the relevant SQL statements.
I was wondering whether adding a partial index on `WorkflowRun.status = 'failed'` might further improve query performance. However, I wasn't sure whether this use case is generally applicable or just specific to my situation, so I didn't pursue it further.
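For reference, the partial index I had in mind could be declared in a migration roughly as follows. This is only a sketch of the idea that was not adopted; the index name is hypothetical, and `status` / `created_at` on `workflow_runs` are assumed column names.

```python
import sqlalchemy as sa
from alembic import op


def upgrade():
    # PostgreSQL partial index: only rows with status = 'failed' are
    # indexed, keeping the index small while speeding up failed-run lookups.
    op.create_index(
        "workflow_run_failed_created_at_idx",  # hypothetical name
        "workflow_runs",
        ["created_at"],
        postgresql_where=sa.text("status = 'failed'"),
    )


def downgrade():
    op.drop_index("workflow_run_failed_created_at_idx", table_name="workflow_runs")
```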
It would be reasonable to add the field to the existing index `workflow_app_log_app_idx` and introduce a new index on `workflow_run_id` for joining.
Please also make sure all the DDL operations are applied by running `flask db migrate` to generate the necessary db migrations.
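If I read the suggestion right, the upgrade step would look roughly like this (again only a sketch; the name of the new join index is hypothetical):

```python
from alembic import op


def upgrade():
    # Append created_at to the existing composite index so the paginated,
    # time-ordered listing can be served from the index...
    op.drop_index("workflow_app_log_app_idx", table_name="workflow_app_logs")
    op.create_index(
        "workflow_app_log_app_idx",
        "workflow_app_logs",
        ["tenant_id", "app_id", "created_at"],
    )
    # ...and add a separate index on workflow_run_id for the join.
    op.create_index(
        "workflow_app_log_workflow_run_idx",  # hypothetical name
        "workflow_app_logs",
        ["workflow_run_id"],
    )
```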
Thank you for your suggestion. The reason I haven't run `flask db migrate` yet is that, when I tested it, I noticed there are still some pending model migrations on the main branch. I'm unsure whether that is intended behavior in Dify's development workflow, so I've left it as is for now 😂.
Can you provide a performance testing process for large-scale datasets? I would like to understand how significant the impact of this improvement can be. Thank you.
I'm afraid I can't provide performance testing data from a production environment at this time. In our test environment, logs are concentrated in a few specific apps, so the filtering effect of the new index is less noticeable: the number of matched logs only dropped from 2,512,370 to 2,447,919. I therefore can't give you reliable real-world performance results; I can only say this change should be effective in theory 😂.
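For anyone who wants to run a comparison themselves, one way to measure the impact is to capture the query plan before and after the index change. A rough sketch follows; the connection string is a placeholder, and the query is a simplified stand-in, not the exact statement issued by `get_paginate_workflow_app_logs`.

```python
from sqlalchemy import create_engine, text

# Placeholder DSN; point this at a database with a realistic data volume.
engine = create_engine("postgresql://user:password@localhost:5432/dify")

# EXPLAIN (ANALYZE, BUFFERS) executes the query and reports actual runtime
# and whether the planner picks up the new index.
PLAN_SQL = text("""
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM workflow_app_logs
    WHERE tenant_id = :tenant_id AND app_id = :app_id
    ORDER BY created_at DESC
    LIMIT 20
""")

with engine.connect() as conn:
    rows = conn.execute(PLAN_SQL, {"tenant_id": "TENANT_ID", "app_id": "APP_ID"})
    for row in rows:
        print(row[0])
```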