MONAI OS Error : broken data stream when reading image file

Describe the bug 2D images in .png format cannot always be loaded successfully, due to what appears to be OS/Linux-based instability. The same issue has also been reported here: https://discuss.pytorch.org/t/images-not-read-properly-anymore-after-an-epoch-of-successful-training/92586

Please note that the fix suggested there is:

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

To Reproduce I can share the exact image on which I faced the issue, but it can happen at any time with any image, and it might not occur on a different machine, because the cause is not well understood.
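Because the failure depends on the specific file, one way to locate offending images ahead of a run is to force a full decode of every file up front. Below is a minimal sketch using Pillow only; `find_unreadable_images` is a hypothetical helper name, not part of MONAI or Pillow, and the `*.png` pattern is an assumption about the dataset layout.

```python
from pathlib import Path

from PIL import Image


def find_unreadable_images(root, pattern="*.png"):
    """Return (path, error message) pairs for files Pillow cannot fully decode.

    Hypothetical helper for illustration; not part of MONAI or Pillow.
    """
    failures = []
    for path in sorted(Path(root).rglob(pattern)):
        try:
            with Image.open(path) as img:
                # open() only parses the header; load() forces a full decode.
                img.load()
        except (OSError, SyntaxError) as exc:
            failures.append((path, str(exc)))
    return failures
```

The scan only reports problems while `ImageFile.LOAD_TRUNCATED_IMAGES` is left at its default of `False`; with the flag enabled, truncated files load without raising.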

Expected behavior The expectation is that the image should load, irrespective

Screenshots The error log is provided below.

Environment

Package                   Version
------------------------- -----------
absl-py                   1.2.0
alembic                   1.8.1
attrs                     22.1.0
beautifulsoup4            4.11.1
cachetools                5.2.0
certifi                   2022.6.15
charset-normalizer        2.1.0
click                     8.1.3
cloudpickle               2.1.0
cucim                     22.6.0
cycler                    0.11.0
databricks-cli            0.17.0
docker                    5.0.3
einops                    0.4.1
entrypoints               0.4
filelock                  3.7.1
fire                      0.4.0
Flask                     2.2.2
fonttools                 4.34.4
gdown                     4.5.1
gitdb                     4.0.9
GitPython                 3.1.27
google-auth               2.10.0
google-auth-oauthlib      0.4.6
greenlet                  1.1.2
grpcio                    1.47.0
gunicorn                  20.1.0
h5py                      3.7.0
huggingface-hub           0.8.1
idna                      3.3
imagecodecs               2022.8.8
imageio                   2.21.1
importlib-metadata        4.12.0
importlib-resources       5.9.0
itk                       5.2.1.post1
itk-core                  5.2.1.post1
itk-filtering             5.2.1.post1
itk-io                    5.2.1.post1
itk-numerics              5.2.1.post1
itk-registration          5.2.1.post1
itk-segmentation          5.2.1.post1
itsdangerous              2.1.2
Jinja2                    3.1.2
joblib                    1.1.0
jsonschema                4.9.1
kiwisolver                1.4.4
lmdb                      1.3.0
Mako                      1.2.1
Markdown                  3.4.1
MarkupSafe                2.1.1
matplotlib                3.5.2
mlflow                    1.27.0
monai                     0.9.1
networkx                  2.8.5
nibabel                   4.0.1
numpy                     1.23.1
oauthlib                  3.2.0
openslide-python          1.1.2
packaging                 21.3
pandas                    1.4.3
pickle5                   0.0.11
Pillow                    9.2.0
pip                       22.1.2
pkgutil_resolve_name      1.3.10
prometheus-client         0.14.1
prometheus-flask-exporter 0.20.3
protobuf                  3.19.4
psutil                    5.9.1
pyasn1                    0.4.8
pyasn1-modules            0.2.8
pydicom                   2.3.0
PyJWT                     2.4.0
pynrrd                    0.4.3
pyparsing                 3.0.9
pyrsistent                0.18.1
PySocks                   1.7.1
python-dateutil           2.8.2
pytorch-ignite            0.4.9
pytz                      2022.1
PyWavelets                1.3.0
PyYAML                    6.0
querystring-parser        1.2.4
regex                     2022.7.25
requests                  2.28.1
requests-oauthlib         1.3.1
rsa                       4.9
scikit-image              0.19.3
scikit-learn              1.1.2
scipy                     1.9.0
setuptools                61.2.0
six                       1.16.0
smmap                     5.0.0
soupsieve                 2.3.2.post1
SQLAlchemy                1.4.40
sqlparse                  0.4.2
tabulate                  0.8.10
tensorboard               2.9.1
tensorboard-data-server   0.6.1
tensorboard-plugin-wit    1.8.1
tensorboardX              2.5.1
termcolor                 1.1.0
threadpoolctl             3.1.0
tifffile                  2022.8.8
timm                      0.6.7
tokenizers                0.12.1
torch                     1.12.1
torchvision               0.13.1
tqdm                      4.64.0
transformers              4.21.1
typing_extensions         4.3.0
urllib3                   1.26.11
websocket-client          1.3.3
Werkzeug                  2.2.2
wheel                     0.37.1
zipp                      3.8.1

Ensuring you use the relevant python executable, please paste the output of:

python -c 'import monai; monai.config.print_debug_info()'


Error Log:

Traceback (most recent call last):
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 90, in apply_transform
    return _apply_transform(transform, data, unpack_items)
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 54, in _apply_transform
    return transform(parameters)
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/io/dictionary.py", line 133, in __call__
    data = self._loader(d[key], reader)
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/io/array.py", line 253, in __call__
    img_array, meta_data = reader.get_data(img)
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/data/image_reader.py", line 1182, in get_data
    data = np.moveaxis(np.asarray(i), 0, 1)
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/PIL/Image.py", line 687, in __array_interface__
    new["data"] = self.tobytes()
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/PIL/Image.py", line 729, in tobytes
    self.load()
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/PIL/ImageFile.py", line 276, in load
    raise_oserror(err_code)
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/PIL/ImageFile.py", line 71, in raise_oserror
    raise OSError(message + " when reading image file")
OSError: broken data stream when reading image file

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 90, in apply_transform
    return _apply_transform(transform, data, unpack_items)
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 54, in _apply_transform
    return transform(parameters)
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/compose.py", line 173, in __call__
    input_ = apply_transform(_transform, input_, self.map_items, self.unpack_items, self.log_stats)
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 114, in apply_transform
    raise RuntimeError(f"applying transform {transform}") from e
RuntimeError: applying transform <monai.transforms.io.dictionary.LoadImaged object at 0x7feaa2db66d0>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/vishwesh/Code/rxrx1_gitlab/rxrx1/rxrx1_testing.py", line 212, in <module>
    main(args)
  File "/home/vishwesh/Code/rxrx1_gitlab/rxrx1/rxrx1_testing.py", line 143, in main
    for batch_data in dataset_loader:
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/data/dataset.py", line 97, in __getitem__
    return self._transform(index)
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/data/dataset.py", line 83, in _transform
    return apply_transform(self.transform, data_i) if self.transform is not None else data_i
  File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 114, in apply_transform
    raise RuntimeError(f"applying transform {transform}") from e
RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x7feaa2dbc550>

Aug 22 '22 17:08 finalelement

Thanks for raising the issue. Maybe we could add an option to the PILReader, such as load_truncated_images: bool = False, and then support it with:

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
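Since `ImageFile.LOAD_TRUNCATED_IMAGES` is a module-level global in Pillow, an option like this would probably want to limit how long the flag stays on. A rough sketch of one way to scope it is shown here; `allow_truncated_images` is a hypothetical helper name, and `load_truncated_images` is only the option name proposed in this comment, not an existing PILReader argument.

```python
from contextlib import contextmanager

from PIL import ImageFile


@contextmanager
def allow_truncated_images(enabled=True):
    """Temporarily set Pillow's global LOAD_TRUNCATED_IMAGES flag.

    Hypothetical helper for illustration; not part of MONAI or Pillow.
    """
    previous = ImageFile.LOAD_TRUNCATED_IMAGES
    ImageFile.LOAD_TRUNCATED_IMAGES = enabled
    try:
        yield
    finally:
        # Restore the earlier value even if decoding raised.
        ImageFile.LOAD_TRUNCATED_IMAGES = previous
```

A reader could wrap its decode call in `with allow_truncated_images(load_truncated_images): ...`. The flag is still process-wide while the block is active, so this is not safe against concurrent decoding in other threads.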

@wyli @KumoLiu What do you think?

Thanks in advance.

Aug 23 '22 05:08 Nic-Ma

It looks like the fix has the side effect of changing the image content, as described in the original post https://discuss.pytorch.org/t/images-not-read-properly-anymore-after-an-epoch-of-successful-training/92586

the image then contains many zeros at the end (these zeros are not part of the original image).
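This side effect can be checked without the original file. The sketch below builds a small synthetic PNG in memory, cuts its byte stream in half to imitate a damaged file, and decodes it with the flag enabled. The 64x64 size, the random pixel values, and the halfway cut are arbitrary assumptions for illustration, not properties of the image from the report.

```python
import io
import random

from PIL import Image, ImageFile

# Build a 64x64 grayscale PNG in memory with no zero-valued pixels.
rng = random.Random(0)
pixels = bytes(rng.randrange(1, 256) for _ in range(64 * 64))
buffer = io.BytesIO()
Image.frombytes("L", (64, 64), pixels).save(buffer, format="PNG")
png_bytes = buffer.getvalue()

# Imitate a damaged file by dropping the second half of the byte stream.
truncated = png_bytes[: len(png_bytes) // 2]

previous = ImageFile.LOAD_TRUNCATED_IMAGES
ImageFile.LOAD_TRUNCATED_IMAGES = True
try:
    img = Image.open(io.BytesIO(truncated))
    img.load()  # does not raise: Pillow keeps whatever it could decode
    decoded = img.tobytes()
finally:
    ImageFile.LOAD_TRUNCATED_IMAGES = previous

# Compare the decoded pixels with the original ones.
print("same as original:", decoded == pixels)
print("zero-valued pixels in result:", decoded.count(0))
```

Any zero-valued pixels in the result cannot come from the source data, because the synthetic image was built without zeros.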

could you please provide a minimal example to reproduce the issue? @finalelement

Aug 23 '22 08:08 wyli

I can share the image offline. It fails on my Ubuntu machine, but it will not necessarily fail on yours. I've shared the image via Slack.

Aug 23 '22 17:08 finalelement

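To reproduce the issue, use the snippet below:

import os

from monai.transforms import LoadImage


def main():
    # Replace with the path to the image that triggers the error.
    img_path = os.path.normpath('/put/image/path/here')

    # With default arguments, LoadImage chooses a reader for the file itself.
    load_img = LoadImage()
    # For the affected image, this is the call that fails.
    img_d = load_img(img_path)

    print('Debug here')


if __name__ == "__main__":
    main()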

My observation is that the image in question can be read using the fix from the linked post. However, the image is not read in the right order, which I think will need more investigation.

Overall, it might still be worthwhile to provide this option to users, because a single data sample can break the flow of an entire training/testing run. In the particular case I am dealing with, this is one 2D image out of 264,000 images.

The outcome for this one image does not matter much. Alternatively, a warning could be shown that the image was read incorrectly, although it should be acknowledged that a warning will mostly be ignored.
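As a rough illustration of the warning idea, a loader could be wrapped so that one unreadable file produces a warning instead of stopping the whole run. This is only a sketch: `load_or_warn` is a hypothetical helper, and `loader` stands for any callable that takes a path, such as a `LoadImage` instance. `RuntimeError` is caught alongside `OSError` because the traceback above shows MONAI re-raising the failure as `RuntimeError`.

```python
import warnings


def load_or_warn(loader, path):
    """Call loader(path); on a read failure, warn and return None.

    Hypothetical helper for illustration; not part of MONAI.
    """
    try:
        return loader(path)
    except (OSError, RuntimeError) as exc:
        warnings.warn(f"Skipping unreadable image {path}: {exc}")
        return None
```

Callers would need to drop the `None` entries before batching, and could log the skipped paths so they are not silently lost.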

Aug 23 '22 17:08 finalelement

Closing as inactive. @finalelement, please reopen if this is still not resolved.

Jan 05 '24 14:01 vikashg
