Numpy v2.0.0 breaks the ability to download models using spaCy
How to reproduce the behaviour
In my dockerfile, I run these commands:
FROM --platform=linux/amd64 python:3.12.4
RUN pip install --upgrade pip
RUN pip install torch --index-url https://download.pytorch.org/whl/cpu
RUN pip install spacy
RUN python -m spacy download en_core_web_lg
It returns the following error (and stacktrace):
2.519 Traceback (most recent call last):
2.519 File "<frozen runpy>", line 189, in _run_module_as_main
2.519 File "<frozen runpy>", line 148, in _get_module_details
2.519 File "<frozen runpy>", line 112, in _get_module_details
2.519 File "/usr/local/lib/python3.12/site-packages/spacy/__init__.py", line 6, in <module>
2.521 from .errors import setup_default_warnings
2.522 File "/usr/local/lib/python3.12/site-packages/spacy/errors.py", line 3, in <module>
2.522 from .compat import Literal
2.522 File "/usr/local/lib/python3.12/site-packages/spacy/compat.py", line 39, in <module>
2.522 from thinc.api import Optimizer # noqa: F401
2.522 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2.522 File "/usr/local/lib/python3.12/site-packages/thinc/api.py", line 1, in <module>
2.522 from .backends import (
2.522 File "/usr/local/lib/python3.12/site-packages/thinc/backends/__init__.py", line 17, in <module>
2.522 from .cupy_ops import CupyOps
2.522 File "/usr/local/lib/python3.12/site-packages/thinc/backends/cupy_ops.py", line 16, in <module>
2.522 from .numpy_ops import NumpyOps
2.522 File "thinc/backends/numpy_ops.pyx", line 1, in init thinc.backends.numpy_ops
2.524 ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
Locking to the previous version of numpy will resolve this issue:
FROM --platform=linux/amd64 python:3.12.4
RUN pip install --upgrade pip
RUN pip install torch --index-url https://download.pytorch.org/whl/cpu
RUN pip install numpy==1.26.4 spacy
RUN python -m spacy download en_core_web_lg
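Because `import spacy` itself fails here, it can help to check which versions pip actually resolved without importing any of the compiled packages. The snippet below is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration and is not part of spaCy or thinc.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent.

    Reads package metadata only, so it works even when importing the
    package would raise the binary-incompatibility ValueError.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Hypothetical usage: print the versions relevant to this report.
    for name, ver in installed_versions(["numpy", "thinc", "spacy"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in the failing image should show whether numpy resolved to a 2.x release alongside an older thinc build.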
+1
this solution helped, thank you
+1 I also had this problem. Thanks for posting the solution 👍
Those solutions do work, but I would still like to see a fix in the codebase itself. The issue is that inside the project's requirements.txt (just an assumption after a short look at the codebase), the version is specified like this:
numpy>=1.15.0; python_version < "3.9"
numpy>=1.19.0; python_version >= "3.9"
I am a huge fan, in all of my projects, of always pinning dependencies even up to the patch version.
I would suggest a PR that looks like this:
numpy>=1.15.0,<2.0.0; python_version < "3.9"
numpy>=1.19.0,<2.0.0; python_version >= "3.9"
This at least caps the version below the next major release, which should always be done anyway, since a major version can (and most likely will) contain breaking changes.
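To illustrate what the proposed upper bound changes, here is a rough sketch of how a constraint such as `>=1.19.0,<2.0.0` treats the two numpy versions mentioned in this thread. It is a deliberately simplified stand-in, not pip's real resolver logic: the `satisfies` helper is hypothetical, handles only `>=` and `<`, and ignores pre-release and other PEP 440 details.

```python
import operator
import re

# Only the two operators used in the proposed requirement lines.
_OPS = {">=": operator.ge, "<": operator.lt}


def _release(text):
    """Turn '1.26.4' into (1, 26, 4), padding short versions with zeros."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))


def satisfies(candidate, spec):
    """Return True if `candidate` meets every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(>=|<)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](_release(candidate), _release(bound)):
            return False
    return True


if __name__ == "__main__":
    for candidate in ("1.26.4", "2.0.0"):
        print(candidate, satisfies(candidate, ">=1.19.0,<2.0.0"))
```

Without the `<2.0.0` clause, both versions satisfy the requirement, which is why a fresh install picks up numpy 2.0.0.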
@DoctorManhattan123 To clarify, the solution I posted is only meant to be a stopgap.
Ideally, all downstream consumers of numpy (including library maintainers) should complete the migration to leverage numpy 2.0.0. I imagine, given the size of the release, that this will take time.
The pinned version is to tide over people seeking to quickly fix their CI/CD or whatever impacted process is broken until a more robust solution is implemented in the affected codebases.
This issue with thinc has been noted at https://github.com/explosion/thinc/issues/939
It helped. Thanks!
The new release 3.7.6 should resolve this :)
I'm still experiencing the same error on 3.7.6 with both numpy 2.1 and 2.0.0. As a sanity check, it works after downgrading to 1.26.4.
The issue still persists with the 3.7.6-release as it still depends on thinc<8.3, which is incompatible with numpy>=2.0
Yes, it appears thinc v8.3.0 is the first release that is compatible with numpy>=2.0.
The latest release before that (v8.2.5) specifically restricts its numpy pin to <2.0.0.
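The rule of thumb reported in this thread (numpy 2.x needs thinc 8.3.0 or later) can be written down as a small check. This is only a sketch built on the version numbers quoted above; the function name `is_known_bad_pair` is invented for illustration, and the 8.3.0 threshold is taken from the comments here rather than verified against thinc's release metadata.

```python
import re


def _release(text):
    """Turn '8.2.5' into (8, 2, 5), padding short versions with zeros."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))


def is_known_bad_pair(numpy_version, thinc_version):
    """True when numpy is 2.x or newer but thinc predates 8.3.0.

    Assumption from this thread: thinc 8.3.0 is the first release
    compatible with numpy>=2.0.
    """
    numpy_is_v2 = _release(numpy_version) >= (2, 0, 0)
    thinc_is_old = _release(thinc_version) < (8, 3, 0)
    return numpy_is_v2 and thinc_is_old


if __name__ == "__main__":
    print(is_known_bad_pair("2.0.0", "8.2.5"))
    print(is_known_bad_pair("1.26.4", "8.2.5"))
    print(is_known_bad_pair("2.1.0", "8.3.0"))
```

If the check flags the installed pair, either downgrading numpy below 2.0 or upgrading thinc to 8.3.0 or later should bring it back into the range described above.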
See also #13607
Sorry for the delay on this.
I want to release the upgraded numpy pin as version 3.8, because I don't want to drop support for Python 3.8 in a patch release. Upgrading to numpy v2 in a patch release is also questionable.
However, the model artifacts bake the spaCy version into the package. This means I need to retrain the models to do the v3.8 release, and the retraining is taking some time.
@honnibal I think this was resolved by release 3.8.2, right? If so, can we close?
@afogel still happens to me on 3.8.2
@yovelcohen It looks like you need to explicitly lock to the latest thinc version in order to resolve the dependencies using poetry lock.
Right now, my pyproject.toml looks like this:
[tool.poetry.dependencies]
python = "3.12.5"
...
spacy = "3.8.2"
thinc = "8.3.3"
This should be resolved now. Thanks for your patience.