spaCy icon indicating copy to clipboard operation
spaCy copied to clipboard

Lazily load cli with module __getattr__

Open hoodmane opened this issue 1 year ago • 9 comments

This is a reland of #12962. That PR broke accesses to spacy.cli:

import spacy
spacy.cli # NameError

This avoids such a problem by using module __getattr__.

hoodmane avatar Dec 20 '24 12:12 hoodmane

Well this does not work quite yet.

hoodmane avatar Dec 20 '24 13:12 hoodmane

Okay now it's better.

hoodmane avatar Dec 20 '24 13:12 hoodmane

Hi @adrianeboyd, spacy + pyodide still blocked on this and wondering if you could take a look! Thanks (and also to hoodmane)

nro-bot avatar Dec 23 '24 08:12 nro-bot

Would a comment like

Lazily load `cli` submodule (used by `info`) to avoid requiring packages such as `requests` to run SpaCy

be appropriate for __getattr__?

And for curiosity, is __all__ required for lazy loading or added as "this is good practice"? Thanks!

nro-bot avatar Jan 03 '25 16:01 nro-bot

Yes. And __all__ is good practice but it's also needed to avoid dropping cli from the things imported by from mod import *

hoodmane avatar Jan 18 '25 18:01 hoodmane

@adrianeboyd Another ping on this (as it's been two months) :)

nro-bot avatar Feb 13 '25 17:02 nro-bot

@nro-bot Sorry for the lack of attention on this. Adriane no longer works at Explosion (we're no longer venture-backed), and my time for support has been split over a lot of different things.

I'm now getting more on top of it, having just finished the support for Python 3.13.

Could you summarise a little more what the lazy loading is intending to do? Is it that you want to avoid the import of the pipeline at loading, so that you can avoid triggering stuff that doesn't work?

An issue with the lazy-loading is that spaCy heavily depends on entry-points. These end up importing everything to get decorators to run as side-effects. I've refactored this in the patch for 3.13, so we no longer rely on these import side-effects to run decorators. That said, the registration functions (which need to run to do anything much) still need to import pretty much everything. So it's not clear to me what the net impact for your use-case would be.

honnibal avatar May 22 '25 12:05 honnibal

Could you sumarise a little more what the lazy loading is intending to do? Is it that you want to avoid the import of the pipeline at loading, so that you can avoid triggering stuff that doesn't work?

Yeah that's what we're after.

it's not clear to me what the net impact for your use-case would be.

The test here works fine with this patch: https://github.com/pyodide/pyodide/blob/144c3de286ad41575b7d7fcc10438465a1e093fe/packages/spacy/test_spacy.py

hoodmane avatar May 22 '25 13:05 hoodmane

The impact here is the __getattr__ effectively deferre the importing of the submodule and whatever cascade of loading that does until cli or info is called.

This prevents the need for moving import into the code

jakob1379 avatar Jul 31 '25 12:07 jakob1379