PyTorch INTERNAL ASSERTION error when performing SVD on large datasets

Open JanisGeise opened this issue 11 months ago • 3 comments

Hi @AndreWeiner,

When performing an SVD on a large dataset, I get the following error message:

RuntimeError: false INTERNAL ASSERT FAILED at "../aten/src/ATen/native/BatchLinearAlgebra.cpp":1537, please report a bug to PyTorch. linalg.svd: Argument 12 has illegal value. Most certainly there is a bug in the implementation calling the backend library.

This seems to be a known issue and is related to:

PyTorch: issue 93275, issue 102963, issue 68291, issue 51720
SciPy: issue 5401, issue 21837

and several others.

In summary, the reason for this error is the following:

PyTorch and SciPy both rely on LAPACK, which is compiled with 32-bit integer support by default. This can be verified by running e.g. scipy.__config__.show(), which in my case yields:


Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.28 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
    pc file directory: /project
    version: 0.3.28
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.28 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
    pc file directory: /project
    version: 0.3.28

The largest number that can be represented is torch.iinfo(torch.int32).max: 2147483647.
If the size of the data matrix exceeds this value, the SVD can't be computed, since the size of the data_matrix is represented by an int32.
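To make the limit concrete, here is a minimal sketch of such a size check in plain Python. It assumes that "size" means the total number of matrix entries; the quantity that actually overflows inside LAPACK may differ (for example a workspace length). The helper name exceeds_int32 and the example shapes are hypothetical.

```python
import math

# Largest signed 32-bit integer; same value as torch.iinfo(torch.int32).max
INT32_MAX = 2**31 - 1


def exceeds_int32(shape):
    """Return True if the number of entries of a matrix with the given
    shape does not fit into a signed 32-bit integer."""
    return math.prod(shape) > INT32_MAX


# Hypothetical data matrices: (number of points, number of snapshots)
print(exceeds_int32((50_000_000, 50)))   # 2.5e9 entries -> True
print(exceeds_int32((1_000_000, 100)))   # 1.0e8 entries -> False
```

Working on the shape alone means the check can run before the data matrix is even assembled in memory.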

To verify the missing 64-bit integer support (here for SciPy), one can also execute scipy.linalg.get_lapack_funcs("gesdd", ilp64=True), which should yield the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/janis/Daten/Promotion_TUD/Projects/sparseSpatialSampling/venv_scube/lib/python3.12/site-packages/scipy/linalg/blas.py", line 401, in getter
    value = func(names, arrays, dtype, ilp64)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/janis/Daten/Promotion_TUD/Projects/sparseSpatialSampling/venv_scube/lib/python3.12/site-packages/scipy/linalg/lapack.py", line 992, in get_lapack_funcs
    raise RuntimeError("LAPACK ILP64 routine requested, but Scipy "
RuntimeError: LAPACK ILP64 routine requested, but Scipy compiled only with 32-bit BLAS
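The same probe can be wrapped in a small helper that reports the result instead of raising. This is only a sketch: the function name scipy_has_ilp64_lapack is made up, and it relies on get_lapack_funcs raising a RuntimeError for 32-bit-only builds, as in the traceback above.

```python
def scipy_has_ilp64_lapack():
    """Return True/False depending on whether SciPy exposes ILP64 LAPACK
    routines, or None if SciPy is not installed."""
    try:
        from scipy.linalg import get_lapack_funcs
    except ImportError:
        return None
    try:
        # Requesting the 64-bit variant raises RuntimeError on builds
        # that only ship a 32-bit BLAS/LAPACK
        get_lapack_funcs("gesdd", ilp64=True)
    except RuntimeError:
        return False
    return True


print(scipy_has_ilp64_lapack())
```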

Potential Workaround

NumPy, on the other hand, is built with 64-bit integer support by default, which can be verified by executing numpy.__config__.show(), yielding:

Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.27  USE64BITINT DYNAMIC_ARCH NO_AFFINITY
      Haswell MAX_THREADS=64
    pc file directory: /project/.openblas
    version: 0.3.27
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.27  USE64BITINT DYNAMIC_ARCH NO_AFFINITY
      Haswell MAX_THREADS=64
    pc file directory: /project/.openblas
    version: 0.3.27

(indicated by the entry USE64BITINT).

Maybe we can replace the current implementation of the SVD class within flowtorch.data.svd with numpy.linalg.svd() to avoid this issue, or add some check of the data_matrix before computing the SVD.
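A rough sketch of what the NumPy-based variant could look like is given below. It is not the existing flowtorch implementation; the function name svd_via_numpy is hypothetical, and it assumes a NumPy build linked against a 64-bit-integer LAPACK as shown above. The input is converted with numpy.asarray, so a NumPy array or a CPU torch tensor should both be accepted.

```python
import numpy as np


def svd_via_numpy(data_matrix):
    """Compute the economy-size SVD with numpy.linalg.svd.

    Returns U, s, Vh such that data_matrix is approximately
    U @ diag(s) @ Vh.
    """
    A = np.asarray(data_matrix)
    # full_matrices=False avoids allocating the full square U for tall matrices
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return U, s, Vh


# Small hypothetical example: 4 points, 3 snapshots
A = np.arange(12, dtype=np.float64).reshape(4, 3)
U, s, Vh = svd_via_numpy(A)
print(U.shape, s.shape, Vh.shape)
```

If the rest of the code expects torch tensors, the results could be converted back with torch.from_numpy.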

Regards, Janis

Mar 13 '25 12:03 JanisGeise