Optimize implementation of SchemaView `get_classes_by_slot()` method
The `get_classes_by_slot()` method in SchemaView takes an extremely long time to run on the MIxS schema when generating the Applicable Classes table on slot documentation pages, so we are exploring ways to reduce its runtime.
Codecov Report
Attention: 9 lines in your changes are missing coverage. Please review.
Comparison is base (b86c1fa) 62.11% compared to head (954c207) 62.10%. Report is 6 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #281 +/- ##
==========================================
- Coverage 62.11% 62.10% -0.01%
==========================================
Files 63 63
Lines 8459 8463 +4
Branches 2169 2170 +1
==========================================
+ Hits 5254 5256 +2
Misses 2599 2599
- Partials 606 608 +2
| Files | Coverage Δ | |
|---|---|---|
| linkml_runtime/utils/schema_as_dict.py | 91.30% <100.00%> (ø) | |
| linkml_runtime/utils/schemaview.py | 87.81% <100.00%> (+0.02%) | :arrow_up: |
| linkml_runtime/utils/namespaces.py | 72.51% <43.75%> (-0.86%) | :arrow_down: |
We did some profiling (coincidentally :D) of docgen and found that the deepcopy calls for large schemas in schemaview were the most time-consuming bit.
https://github.com/linkml/linkml-runtime/blob/687fc53338cc791d0f932c3b9a22c0e8313fd99b/linkml_runtime/utils/schemaview.py#L1201
https://github.com/linkml/linkml-runtime/blob/687fc53338cc791d0f932c3b9a22c0e8313fd99b/linkml_runtime/utils/schemaview.py#L1209
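For illustration only, here is a minimal sketch of one way the deep copy could be avoided: take a shallow copy and override just the top-level fields that change. `SlotDef` and `induced_copy` are hypothetical stand-ins invented for this example, not the linkml_runtime classes or the code at the lines linked above, and whether this is safe in schemaview depends on nested values never being mutated in place.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlotDef:
    # Hypothetical stand-in for a slot definition; NOT the linkml_runtime class.
    name: str
    range: Optional[str] = None
    required: bool = False
    annotations: dict = field(default_factory=dict)


def induced_copy(slot: SlotDef, **overrides) -> SlotDef:
    """Return a shallow copy of `slot` with selected top-level fields replaced."""
    induced = copy.copy(slot)  # copies the object itself, not its nested values
    for key, value in overrides.items():
        setattr(induced, key, value)  # rebinding a field leaves `slot` untouched
    return induced


base = SlotDef(name="id", range="string", annotations={"source": "example"})
induced = induced_copy(base, required=True)
```

The trade-off is that nested containers such as `annotations` are shared between the original and the copy, so they have to be replaced wholesale rather than edited in place.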
Oh, that's very good to know, thank you for digging into this @sierra-moxon 😁
good sleuthing! we should definitely avoid use of deepcopy when calculating induced slots
but note that induced slots may not be necessary for docgen purposes.
- In some cases, we only want to show asserted usages. Otherwise we may end up flooding the user (e.g. in biolink, every named thing has `id` as an induced slot)
- However, there is also an argument that it's useful to see the slot propagated down to all subclasses. But even here only some simple logic is required; there is no need to do the full inference
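To make the "simple logic" idea concrete, here is a rough sketch under stated assumptions: a one-pass inverted index from slot name to the classes that assert it, with optional propagation down the subclass tree. The toy `SCHEMA` dict and the helper names `build_slot_index` and `classes_by_slot` are hypothetical and do not reflect how SchemaView stores or queries a schema.

```python
from collections import defaultdict

# Hypothetical toy schema: class name -> (parent class or None, asserted slots).
SCHEMA = {
    "NamedThing": (None, ["id", "name"]),
    "Gene": ("NamedThing", ["symbol"]),
    "Disease": ("NamedThing", []),
    "ProteinCodingGene": ("Gene", []),
}


def build_slot_index(schema):
    """Build slot -> asserting classes and parent -> children maps in one pass."""
    asserted = defaultdict(set)
    children = defaultdict(list)
    for cls, (parent, slots) in schema.items():
        for slot in slots:
            asserted[slot].add(cls)
        if parent is not None:
            children[parent].append(cls)
    return asserted, children


def classes_by_slot(slot, asserted, children, propagate=False):
    """Classes that assert `slot`; with propagate=True, also their descendants."""
    result = set(asserted.get(slot, ()))
    if propagate:
        stack = list(result)
        while stack:  # walk down the hierarchy without copying any definitions
            cls = stack.pop()
            for child in children.get(cls, ()):
                if child not in result:
                    result.add(child)
                    stack.append(child)
    return sorted(result)


asserted, children = build_slot_index(SCHEMA)
asserted_only = classes_by_slot("id", asserted, children)
with_subclasses = classes_by_slot("id", asserted, children, propagate=True)
```

Here `asserted_only` contains just the class that declares `id`, while `with_subclasses` also includes every descendant, which mirrors the two display options discussed above. The index is built once and reused across slots, so each lookup avoids recomputing induced slots per class.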
changing https://github.com/linkml/linkml/blob/598376ce7f8c11bd3cf31f0ca7e3d5c34770021a/linkml/generators/docgen/slot.md.jinja2#L26 from True to False in my custom template (and other instances in this file) took docgen on biolink down from >2 hours to just under a minute.
see also linkml issues #1214 and #1604
@sujaypatil96 is this replaced by #300?