Optimize implementation of SchemaView `get_classes_by_slot()` method

Open sujaypatil96 opened this issue 2 years ago • 7 comments

The `get_classes_by_slot()` method in SchemaView takes an extremely long time to run on the MIxS schema when generating the Applicable Classes table on slot documentation pages. We are therefore exploring ways to reduce the runtime of `get_classes_by_slot()`.

sujaypatil96 avatar Oct 31 '23 23:10 sujaypatil96

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (b86c1fa) 62.11% compared to head (954c207) 62.10%. Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #281      +/-   ##
==========================================
- Coverage   62.11%   62.10%   -0.01%     
==========================================
  Files          63       63              
  Lines        8459     8463       +4     
  Branches     2169     2170       +1     
==========================================
+ Hits         5254     5256       +2     
  Misses       2599     2599              
- Partials      606      608       +2     
| Files | Coverage Δ |
| --- | --- |
| linkml_runtime/utils/schema_as_dict.py | 91.30% <100.00%> (ø) |
| linkml_runtime/utils/schemaview.py | 87.81% <100.00%> (+0.02%) :arrow_up: |
| linkml_runtime/utils/namespaces.py | 72.51% <43.75%> (-0.86%) :arrow_down: |


codecov[bot] avatar Oct 31 '23 23:10 codecov[bot]

We did some profiling (coincidentally :D) of docgen and found that, for large schemas, the `deepcopy` calls in schemaview were the most time-consuming part:
https://github.com/linkml/linkml-runtime/blob/687fc53338cc791d0f932c3b9a22c0e8313fd99b/linkml_runtime/utils/schemaview.py#L1201
https://github.com/linkml/linkml-runtime/blob/687fc53338cc791d0f932c3b9a22c0e8313fd99b/linkml_runtime/utils/schemaview.py#L1209

sierra-moxon avatar Nov 01 '23 15:11 sierra-moxon
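
For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of how a `deepcopy` hot spot can show up under the standard-library profiler. It does not use linkml-runtime at all: `build_toy_schema`, `copy_heavy_lookup`, and `profile_top_functions` are hypothetical names invented for illustration, and the nested dict only stands in for a large schema.

```python
import copy
import cProfile
import io
import pstats


def build_toy_schema(n_classes=50, n_slots=20):
    """Hypothetical stand-in for a large schema: class name -> slot definitions."""
    return {
        f"Class{i}": {
            f"slot{j}": {"range": "string", "required": False}
            for j in range(n_slots)
        }
        for i in range(n_classes)
    }


def copy_heavy_lookup(schema):
    """Deep-copy every class's slot definitions, mimicking a copy-heavy code path."""
    return {name: copy.deepcopy(slots) for name, slots in schema.items()}


def profile_top_functions(func, *args, limit=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


report = profile_top_functions(copy_heavy_lookup, build_toy_schema())
print(report)
```

In the printed report, `deepcopy` and its helpers from `copy.py` should sit near the top of the cumulative-time column. The same wrapper could in principle be pointed at a real docgen run, but the timings from this toy input say nothing about actual SchemaView performance.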

Oh, that's very good to know, thank you for digging into this @sierra-moxon 😁

sujaypatil96 avatar Nov 02 '23 22:11 sujaypatil96

good sleuthing! we should definitely avoid use of deepcopy when calculating induced slots

but note that induced slots may not be necessary for docgen purposes.

  • In some cases, we only want to show asserted usages. Otherwise we may end up flooding the user (e.g. in biolink, every named thing has id as an induced slot)
  • however, there is also an argument that it's useful to see the slot propagated down to all subclasses. But even here only some simple logic is required; there is no need to do the full inference

cmungall avatar Nov 06 '23 22:11 cmungall
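
The two options above can be sketched without any copying or full slot induction. This is an illustrative toy, not the SchemaView implementation: the `TOY_CLASSES` structure and both function names are hypothetical, and the sketch assumes single `is_a` inheritance only, ignoring mixins, `slot_usage`, and attributes, all of which a real fix would need to consider.

```python
from collections import defaultdict

# Hypothetical toy schema: class name -> (is_a parent, directly asserted slots)
TOY_CLASSES = {
    "NamedThing": (None, ["id", "name"]),
    "Gene": ("NamedThing", ["symbol"]),
    "Protein": ("NamedThing", []),
    "Transcript": ("Gene", []),
}


def asserted_classes_by_slot(classes):
    """Map each slot to the classes that assert it directly."""
    index = defaultdict(set)
    for cls, (_parent, slots) in classes.items():
        for slot in slots:
            index[slot].add(cls)
    return dict(index)


def propagated_classes_by_slot(classes):
    """Map each slot to the asserting classes plus all of their descendants."""
    # Build the child lookup once instead of walking ancestors per class.
    children = defaultdict(list)
    for cls, (parent, _slots) in classes.items():
        if parent is not None:
            children[parent].append(cls)

    result = {}
    for slot, roots in asserted_classes_by_slot(classes).items():
        seen = set()
        stack = list(roots)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(children.get(current, []))
        result[slot] = seen
    return result


asserted = asserted_classes_by_slot(TOY_CLASSES)
propagated = propagated_classes_by_slot(TOY_CLASSES)
print(sorted(asserted["id"]))
print(sorted(propagated["id"]))
```

Here the asserted view lists only `NamedThing` for `id`, while the propagated view lists every class in the toy hierarchy, which is the "flooding" trade-off described above. Both indexes are built in a single pass over the classes and involve no `deepcopy`.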

changing https://github.com/linkml/linkml/blob/598376ce7f8c11bd3cf31f0ca7e3d5c34770021a/linkml/generators/docgen/slot.md.jinja2#L26 from True to False in my custom template (and other instances in this file) took docgen on biolink down from >2 hours to just under a minute.

sierra-moxon avatar Nov 09 '23 00:11 sierra-moxon

see also linkml issues #1214 and #1604

sierra-moxon avatar Nov 09 '23 00:11 sierra-moxon

@sujaypatil96 is this replaced by #300?

cmungall avatar Feb 21 '24 18:02 cmungall