perf(export): export generates unnecessary files content

Open Always-prog opened this issue 2 years ago • 5 comments

SUMMARY

Currently, model export is slower than it needs to be.

Model export is slow because each model also exports its related models, and a related model cannot check whether it has already been exported to the final archive, so it is exported again anyway. This generates unnecessary export files. For example, exporting a dashboard with 2 charts produces 5 files (a dashboard, two charts, a dataset, and a database), but "behind the scenes" 7 files are actually generated, because each chart re-exports the dataset and database. At the end, the highest-level export model (the dashboard export) removes these duplicates.

To solve this problem, this PR proposes generating export file content lazily: not immediately, but only when it is written to a file. This is done by converting file_content into a function that generates the content when it is called.
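The idea can be illustrated with a minimal, self-contained sketch. This is not the code from this PR or from Superset. Apart from `file_content`, every name here (`make_content`, `export_chart`, `export_dashboard`, `write_archive`, and the file paths) is hypothetical and chosen only for illustration. Each exporter yields a file name together with a zero-argument function, and the content is built only for names that have not been written yet.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical log of every content generation, so the saving is observable.
generated: List[str] = []

Entry = Tuple[str, Callable[[], str]]


def make_content(file_name: str) -> Callable[[], str]:
    """Return a function that builds the file content only when called."""
    def build() -> str:
        generated.append(file_name)  # stands in for expensive serialization
        return f"content of {file_name}\n"
    return build


def export_chart(chart: str) -> Iterator[Entry]:
    """Yield the chart plus its related dataset and database (assumed shared)."""
    yield f"charts/{chart}.yaml", make_content(f"charts/{chart}.yaml")
    yield "datasets/ds.yaml", make_content("datasets/ds.yaml")
    yield "databases/db.yaml", make_content("databases/db.yaml")


def export_dashboard(charts: List[str]) -> Iterator[Entry]:
    """Yield the dashboard and everything its charts export."""
    yield "dashboards/dash.yaml", make_content("dashboards/dash.yaml")
    for chart in charts:
        yield from export_chart(chart)


def write_archive(entries: Iterator[Entry]) -> Dict[str, str]:
    """Write each file name once; duplicates are skipped before generation."""
    archive: Dict[str, str] = {}
    for file_name, file_content in entries:
        if file_name in archive:
            continue  # duplicate: file_content() is never called
        archive[file_name] = file_content()
    return archive


archive = write_archive(export_dashboard(["chart_1", "chart_2"]))
```

In this sketch the dashboard export still yields 7 entries, but content is generated only for the 5 distinct files that end up in `archive`, because the two duplicate entries are dropped before their functions are called.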

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Note the speedup when generating a dashboard export archive!

Before

https://github.com/apache/superset/assets/66589759/76551088-5d87-4c4d-b89d-fcc4c66f1b27

After

https://github.com/apache/superset/assets/66589759/4489b671-6c17-4f9a-a955-e5b285f5e5c4

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • [ ] Has associated issue:
  • [ ] Required feature flags:
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [ ] Introduces new feature or API
  • [ ] Removes existing feature or API

Always-prog, Jan 23 '24 20:01

Hello @Always-prog thanks for the PR! Would you mind taking a look at the failing CI steps?

geido, Jan 25 '24 18:01

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (4796484) 67.18% compared to head (9d2e16e) 69.51%.

Files                                   Patch %   Lines
superset/commands/dashboard/export.py   59.25%    11 Missing
superset/commands/export/models.py      80.00%    2 Missing
superset/commands/export/assets.py      96.15%    1 Missing
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #26765      +/-   ##
==========================================
+ Coverage   67.18%   69.51%   +2.33%     
==========================================
  Files        1900     1900              
  Lines       74443    74491      +48     
  Branches     8293     8293              
==========================================
+ Hits        50012    51785    +1773     
+ Misses      22376    20651    -1725     
  Partials     2055     2055              
Flag       Coverage Δ
hive       53.82% <46.08%> (?)
mysql      78.04% <78.26%> (+0.01%)
postgres   78.14% <78.26%> (+0.01%)
presto     53.77% <46.08%> (?)
python     83.09% <87.82%> (+4.83%)
sqlite     77.66% <78.26%> (+0.01%)
unit       56.49% <60.00%> (?)

Flags with carried forward coverage won't be shown.

codecov[bot], Jan 26 '24 09:01

@betodealmeida Hi! Thanks for the review! I have committed fixes addressing your comments.

The only remaining problem is that pre-commit fails because of the file .github/workflows/update-monorepo-lockfiles.yml, which I have not fixed.

Always-prog, Jan 31 '24 09:01

Hi @Always-prog it seems you have a conflicting file. Can you fix it, please? I am adding myself as a reviewer and checking back on this asap. Thank you!

geido, Feb 07 '24 17:02

👀

geido, Feb 12 '24 17:02

@geido Hi! Rebased!

Always-prog, Feb 13 '24 13:02

/testenv up

geido, Feb 19 '24 14:02

@geido Ephemeral environment spinning up at http://54.191.84.78:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

github-actions[bot], Feb 19 '24 14:02

@michael-s-molina Thank you for testing my PR. Could it be merged?

Always-prog, Feb 20 '24 20:02

Ephemeral environment shutdown and build artifacts deleted.

github-actions[bot], Feb 21 '24 23:02