Numpy v2.0.0 breaks the ability to download models using spaCy

Open afogel opened this issue 1 year ago • 13 comments

How to reproduce the behaviour

In my Dockerfile, I run these commands:

FROM --platform=linux/amd64 python:3.12.4

RUN pip install --upgrade pip

RUN pip install torch --index-url https://download.pytorch.org/whl/cpu
RUN pip install spacy

RUN python -m spacy download en_core_web_lg

It returns the following error (and stacktrace):

2.519 Traceback (most recent call last):
2.519   File "<frozen runpy>", line 189, in _run_module_as_main
2.519   File "<frozen runpy>", line 148, in _get_module_details
2.519   File "<frozen runpy>", line 112, in _get_module_details
2.519   File "/usr/local/lib/python3.12/site-packages/spacy/__init__.py", line 6, in <module>
2.521     from .errors import setup_default_warnings
2.522   File "/usr/local/lib/python3.12/site-packages/spacy/errors.py", line 3, in <module>
2.522     from .compat import Literal
2.522   File "/usr/local/lib/python3.12/site-packages/spacy/compat.py", line 39, in <module>
2.522     from thinc.api import Optimizer  # noqa: F401
2.522     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2.522   File "/usr/local/lib/python3.12/site-packages/thinc/api.py", line 1, in <module>
2.522     from .backends import (
2.522   File "/usr/local/lib/python3.12/site-packages/thinc/backends/__init__.py", line 17, in <module>
2.522     from .cupy_ops import CupyOps
2.522   File "/usr/local/lib/python3.12/site-packages/thinc/backends/cupy_ops.py", line 16, in <module>
2.522     from .numpy_ops import NumpyOps
2.522   File "thinc/backends/numpy_ops.pyx", line 1, in init thinc.backends.numpy_ops
2.524 ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Locking to the previous version of numpy will resolve this issue:

FROM --platform=linux/amd64 python:3.12.4

RUN pip install --upgrade pip

RUN pip install torch --index-url https://download.pytorch.org/whl/cpu
RUN pip install numpy==1.26.4 spacy

RUN python -m spacy download en_core_web_lg

afogel avatar Jun 16 '24 15:06 afogel

+1

gborodin avatar Jun 17 '24 10:06 gborodin

this solution helped, thank you

rustammdev avatar Jun 17 '24 19:06 rustammdev

+1 I also had this problem. Thanks for posting the solution 👍

supert56 avatar Jun 18 '24 09:06 supert56

Those solutions do work, but I would still like to see a fix in the codebase itself. The issue is that in the project's requirements.txt (just an assumption after a short look at the codebase), the numpy version is specified like this:

numpy>=1.15.0; python_version < "3.9"
numpy>=1.19.0; python_version >= "3.9"

In all of my projects, I am a huge fan of always pinning dependencies, even down to the patch version.

I would suggest a PR that looks like this:

numpy>=1.15.0,<2.0.0; python_version < "3.9"
numpy>=1.19.0,<2.0.0; python_version >= "3.9"

This at least restricts numpy to the current major version, which should always be the case anyway, as a major version can (and most likely always will) contain breaking changes.
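To illustrate what the proposed upper bound changes, here is a minimal Python sketch that checks release numbers against a `>=lower,<upper` range. The helper names (`parse_release`, `in_range`) are hypothetical, and the parsing is deliberately simplified: it assumes plain dotted release numbers with the same number of segments and does not implement the full version grammar that pip follows.

```python
def parse_release(version: str) -> tuple:
    """Turn a plain dotted version such as '1.26.4' into (1, 26, 4).

    Simplifying assumption: no pre-release or local suffixes.
    """
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True if lower <= version < upper, comparing release segments."""
    return parse_release(lower) <= parse_release(version) < parse_release(upper)


# Check a few numpy releases against the suggested ">=1.19.0,<2.0.0" bound.
for candidate in ("1.19.0", "1.26.4", "2.0.0"):
    print(candidate, in_range(candidate, "1.19.0", "2.0.0"))
```

Under this bound, 1.26.4 is accepted and 2.0.0 is rejected, which is the behaviour the suggested change is after.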

nachthammer avatar Jun 18 '24 13:06 nachthammer

@DoctorManhattan123 To clarify, the solution I posted is only meant to be a stopgap.

Ideally, all downstream consumers of numpy (including library maintainers) should complete the migration to leverage numpy 2.0.0. I imagine, given the size of the release, that this will take time.

The pinned version is to tide over people seeking to quickly fix their CI/CD or whatever impacted process is broken until a more robust solution is implemented in the affected codebases.

afogel avatar Jun 18 '24 14:06 afogel

This issue has also been reported for thinc: https://github.com/explosion/thinc/issues/939

bendennescma avatar Jun 18 '24 14:06 bendennescma

It helped. Thanks!

lucas-mdsena avatar Jul 16 '24 12:07 lucas-mdsena

The new release 3.7.6 should resolve this :)

cyriaka90 avatar Aug 30 '24 09:08 cyriaka90

I'm still experiencing the same error on 3.7.6 with both numpy 2.1 and 2.0.0. As a sanity check, it works after downgrading to 1.26.4.

ddayan avatar Sep 01 '24 20:09 ddayan

The issue still persists with the 3.7.6-release as it still depends on thinc<8.3, which is incompatible with numpy>=2.0

CptCaptain avatar Sep 04 '24 08:09 CptCaptain

Yes it appears thinc v8.3.0 itself is the first release that is compatible with numpy>=2.0

The latest release before that (v8.2.5) explicitly pins numpy to <2.0.0
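A quick way to check whether an environment has the combination described above is to read the installed versions at runtime. The following is only a sketch: the helper names (`installed`, `release`, `likely_incompatible`) are hypothetical, and the thresholds (numpy >= 2.0 together with thinc < 8.3) are taken from the comments in this thread, not from verified package metadata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def release(ver: str) -> Tuple[int, ...]:
    """Extract the leading numeric segments, e.g. '2.0.0rc1' -> (2, 0, 0)."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def likely_incompatible(numpy_version: str, thinc_version: str) -> bool:
    """Flag numpy >= 2.0 combined with thinc < 8.3 (thresholds from this thread)."""
    return release(numpy_version) >= (2,) and release(thinc_version) < (8, 3)


if __name__ == "__main__":
    numpy_v, thinc_v = installed("numpy"), installed("thinc")
    if numpy_v is None or thinc_v is None:
        print("numpy and/or thinc is not installed in this environment")
    elif likely_incompatible(numpy_v, thinc_v):
        print(f"numpy {numpy_v} with thinc {thinc_v}: likely binary mismatch")
    else:
        print(f"numpy {numpy_v} with thinc {thinc_v}: no known mismatch")
```

Because the check reads package metadata only and never imports thinc, it can run even in an environment where `import spacy` already fails with the `numpy.dtype size changed` error.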

bendennescma avatar Sep 04 '24 08:09 bendennescma

See also #13607

filbranden avatar Sep 05 '24 18:09 filbranden

Sorry for the delay on this.

I want to release the upgraded numpy pin as version 3.8, because I don't want to drop support for Python 3.8 in a patch release. Upgrading to numpy v2 in a patch release is also questionable.

However, the model artifacts bake the spaCy version into the package. This means I need to retrain the models to do the v3.8 release, and the retraining is taking some time.

honnibal avatar Sep 16 '24 10:09 honnibal

@honnibal I think this was resolved by release 3.8.2, right? If so, can we close?

afogel avatar Oct 29 '24 14:10 afogel

@afogel still happens to me on 3.8.2

yovelcohen avatar Nov 13 '24 13:11 yovelcohen

@yovelcohen so it looks like you need to explicitly lock to the latest thinc version in order to resolve the dependencies using poetry lock.

right now, my pyproject.toml looks like this:

[tool.poetry.dependencies]
python = "3.12.5"
...
spacy = "3.8.2"
thinc = "8.3.3"

afogel avatar Dec 29 '24 11:12 afogel

This should be resolved now. Thanks for your patience.

honnibal avatar May 22 '25 12:05 honnibal

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot] avatar Jun 22 '25 00:06 github-actions[bot]