OSError: broken data stream when reading image file | 2D image not being read by PIL
Describe the bug 2D images in .png format cannot always be loaded successfully, apparently because of OS/Linux-related instability. The same issue has also been reported here: https://discuss.pytorch.org/t/images-not-read-properly-anymore-after-an-epoch-of-successful-training/92586
Please note that the fix is:
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
To Reproduce I can share the exact image on which I hit the issue, but it can happen at any time with any image, and it might not occur on a different machine, because the cause is not well understood.
Expected behavior The expectation is that the image should load, irrespective
Screenshots The error log is provided below.
Environment
Package Version
------------------------- -----------
absl-py 1.2.0
alembic 1.8.1
attrs 22.1.0
beautifulsoup4 4.11.1
cachetools 5.2.0
certifi 2022.6.15
charset-normalizer 2.1.0
click 8.1.3
cloudpickle 2.1.0
cucim 22.6.0
cycler 0.11.0
databricks-cli 0.17.0
docker 5.0.3
einops 0.4.1
entrypoints 0.4
filelock 3.7.1
fire 0.4.0
Flask 2.2.2
fonttools 4.34.4
gdown 4.5.1
gitdb 4.0.9
GitPython 3.1.27
google-auth 2.10.0
google-auth-oauthlib 0.4.6
greenlet 1.1.2
grpcio 1.47.0
gunicorn 20.1.0
h5py 3.7.0
huggingface-hub 0.8.1
idna 3.3
imagecodecs 2022.8.8
imageio 2.21.1
importlib-metadata 4.12.0
importlib-resources 5.9.0
itk 5.2.1.post1
itk-core 5.2.1.post1
itk-filtering 5.2.1.post1
itk-io 5.2.1.post1
itk-numerics 5.2.1.post1
itk-registration 5.2.1.post1
itk-segmentation 5.2.1.post1
itsdangerous 2.1.2
Jinja2 3.1.2
joblib 1.1.0
jsonschema 4.9.1
kiwisolver 1.4.4
lmdb 1.3.0
Mako 1.2.1
Markdown 3.4.1
MarkupSafe 2.1.1
matplotlib 3.5.2
mlflow 1.27.0
monai 0.9.1
networkx 2.8.5
nibabel 4.0.1
numpy 1.23.1
oauthlib 3.2.0
openslide-python 1.1.2
packaging 21.3
pandas 1.4.3
pickle5 0.0.11
Pillow 9.2.0
pip 22.1.2
pkgutil_resolve_name 1.3.10
prometheus-client 0.14.1
prometheus-flask-exporter 0.20.3
protobuf 3.19.4
psutil 5.9.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pydicom 2.3.0
PyJWT 2.4.0
pynrrd 0.4.3
pyparsing 3.0.9
pyrsistent 0.18.1
PySocks 1.7.1
python-dateutil 2.8.2
pytorch-ignite 0.4.9
pytz 2022.1
PyWavelets 1.3.0
PyYAML 6.0
querystring-parser 1.2.4
regex 2022.7.25
requests 2.28.1
requests-oauthlib 1.3.1
rsa 4.9
scikit-image 0.19.3
scikit-learn 1.1.2
scipy 1.9.0
setuptools 61.2.0
six 1.16.0
smmap 5.0.0
soupsieve 2.3.2.post1
SQLAlchemy 1.4.40
sqlparse 0.4.2
tabulate 0.8.10
tensorboard 2.9.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorboardX 2.5.1
termcolor 1.1.0
threadpoolctl 3.1.0
tifffile 2022.8.8
timm 0.6.7
tokenizers 0.12.1
torch 1.12.1
torchvision 0.13.1
tqdm 4.64.0
transformers 4.21.1
typing_extensions 4.3.0
urllib3 1.26.11
websocket-client 1.3.3
Werkzeug 2.2.2
wheel 0.37.1
zipp 3.8.1
Additional context
Error Log:
Traceback (most recent call last):
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 90, in apply_transform
return _apply_transform(transform, data, unpack_items)
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 54, in _apply_transform
return transform(parameters)
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/io/dictionary.py", line 133, in __call__
data = self._loader(d[key], reader)
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/io/array.py", line 253, in __call__
img_array, meta_data = reader.get_data(img)
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/data/image_reader.py", line 1182, in get_data
data = np.moveaxis(np.asarray(i), 0, 1)
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/PIL/Image.py", line 687, in __array_interface__
new["data"] = self.tobytes()
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/PIL/Image.py", line 729, in tobytes
self.load()
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/PIL/ImageFile.py", line 276, in load
raise_oserror(err_code)
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/PIL/ImageFile.py", line 71, in raise_oserror
raise OSError(message + " when reading image file")
OSError: broken data stream when reading image file
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 90, in apply_transform
return _apply_transform(transform, data, unpack_items)
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 54, in _apply_transform
return transform(parameters)
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/compose.py", line 173, in __call__
input_ = apply_transform(_transform, input_, self.map_items, self.unpack_items, self.log_stats)
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 114, in apply_transform
raise RuntimeError(f"applying transform {transform}") from e
RuntimeError: applying transform <monai.transforms.io.dictionary.LoadImaged object at 0x7feaa2db66d0>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/vishwesh/Code/rxrx1_gitlab/rxrx1/rxrx1_testing.py", line 212, in <module>
main(args)
File "/home/vishwesh/Code/rxrx1_gitlab/rxrx1/rxrx1_testing.py", line 143, in main
for batch_data in dataset_loader:
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 721, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/data/dataset.py", line 97, in __getitem__
return self._transform(index)
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/data/dataset.py", line 83, in _transform
return apply_transform(self.transform, data_i) if self.transform is not None else data_i
File "/home/vishwesh/anaconda3/envs/py38_monai/lib/python3.8/site-packages/monai/transforms/transform.py", line 114, in apply_transform
raise RuntimeError(f"applying transform {transform}") from e
RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x7feaa2dbc550>
Thanks for raising the issue.
I think we could add an option to the PILReader, for example load_truncated_images: bool = False,
and then support it with:
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
@wyli @KumoLiu What do you think?
Thanks in advance.
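A minimal sketch of how such an option could behave, assuming plain Pillow rather than MONAI's actual `PILReader`. The helper names `truncated_images_allowed` and `read_image_array` are hypothetical. Because `ImageFile.LOAD_TRUNCATED_IMAGES` is a process-wide flag, the sketch sets it only while the image is being decoded and restores the previous value afterwards.

```python
from contextlib import contextmanager

import numpy as np
from PIL import Image, ImageFile


@contextmanager
def truncated_images_allowed(enabled=True):
    """Temporarily set Pillow's global LOAD_TRUNCATED_IMAGES flag (hypothetical helper)."""
    previous = ImageFile.LOAD_TRUNCATED_IMAGES
    ImageFile.LOAD_TRUNCATED_IMAGES = enabled
    try:
        yield
    finally:
        # Restore the old value even if decoding raises.
        ImageFile.LOAD_TRUNCATED_IMAGES = previous


def read_image_array(path, load_truncated_images=False):
    """Decode an image into a NumPy array (hypothetical; not MONAI's PILReader)."""
    with truncated_images_allowed(load_truncated_images):
        with Image.open(path) as img:
            img.load()  # force full decoding while the flag is in effect
            return np.asarray(img)
```

With the default `load_truncated_images=False`, a damaged file still raises `OSError`, so the tolerant behaviour has to be requested explicitly.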
It looks like the workaround in the original post (https://discuss.pytorch.org/t/images-not-read-properly-anymore-after-an-epoch-of-successful-training/92586) has a side effect: it changes the image content.
The loaded image then contains many zeros at the end (these zeros are not part of the original image).
Could you please provide a minimal example to reproduce the issue? @finalelement
I can share the image offline. The issue happens on my Ubuntu machine, but it will not necessarily happen on yours. I've shared the image via Slack.
To reproduce the issue, use the snippet below:
import os

from monai.transforms import LoadImage


def main():
    # Replace with the path of the image that fails to load.
    img_path = os.path.normpath('/put/image/path/here')
    load_img = LoadImage()
    img_d = load_img(img_path)  # the OSError is raised here for the affected image
    print('Debug here')


if __name__ == "__main__":
    main()
My observation is that the image in question can be read using the fix from the linked post. However, the image is not read in the right order; that will need more investigation, I think.
Overall, it might still be worthwhile to offer this option to users, because a single bad data sample can break an entire training or testing run. In the case I am dealing with, this is one 2D image out of 264,000.
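One way to locate such a sample ahead of time is to decode every file once before the run. A minimal sketch, assuming Pillow is the decoder and `ImageFile.LOAD_TRUNCATED_IMAGES` is left at its default of `False`; `find_unreadable_images` is a hypothetical helper, not part of MONAI.

```python
from PIL import Image


def find_unreadable_images(paths):
    """Return (path, error message) pairs for files Pillow fails to decode."""
    failures = []
    for path in paths:
        try:
            with Image.open(path) as img:
                # Image.open is lazy; load() forces the pixel data to be decoded.
                img.load()
        except (OSError, SyntaxError, ValueError) as exc:
            failures.append((path, str(exc)))
    return failures
```

The returned list can then be used to exclude or re-download the affected files before building the dataset.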
The output for this image does not matter that much. Alternatively, a warning could be shown that this image was read incorrectly, although it should be acknowledged that a warning will mostly be ignored.
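The warning-based alternative could look roughly like the sketch below. It is an illustration only: `SkipOnErrorLoader` is a hypothetical wrapper around any loader callable, and it assumes the read failure surfaces as `OSError` (as in the traceback above) or as a `RuntimeError` wrapping it.

```python
import logging

logger = logging.getLogger(__name__)


class SkipOnErrorLoader:
    """Wrap a loader callable; warn and return None when reading fails."""

    def __init__(self, loader):
        self.loader = loader
        self.failed = []  # paths that could not be read, kept for a final summary

    def __call__(self, path):
        try:
            return self.loader(path)
        except (OSError, RuntimeError) as exc:
            logger.warning("Could not read %s: %s", path, exc)
            self.failed.append(path)
            return None
```

Recording the failed paths makes it possible to print a summary at the end of the run, so the warning is harder to overlook. The caller still has to drop the `None` samples before batching.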
Closing as inactive. @finalelement, please reopen if this is still not resolved.