Problem reading ZIP archive

Open vdausmann opened this issue 3 years ago • 6 comments

Hi, the morphocluster features command doesn't work for me with the /examples/objects.zip file that comes with the repo.

(morphocluster) root@f6908fd5e809:/data/example# morphocluster features model_state.pth objects.zip
/opt/conda/envs/morphocluster/lib/python3.7/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 
  warn(f"Failed to load image Python extension: {e}")
Using pretrained model.
Reading archive... Traceback (most recent call last):
  File "/opt/conda/envs/morphocluster/bin/morphocluster", line 33, in <module>
    sys.exit(load_entry_point('morphocluster', 'console_scripts', 'morphocluster')())
  File "/opt/conda/envs/morphocluster/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/morphocluster/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/morphocluster/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/morphocluster/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/morphocluster/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/code/morphocluster/scripts.py", line 148, in features
    extract_features(archive_fn, output_fn, parameters_fn, normalize, batch_size, input_mean, input_std)
  File "/code/morphocluster/processing/extract_features.py", line 436, in extract_features
    dataset = ArchiveDataset(archive_fn, transform)
  File "/code/morphocluster/processing/extract_features.py", line 266, in __init__
    self.archive = zipfile.ZipFile(archive_fn)
  File "/opt/conda/envs/morphocluster/lib/python3.7/zipfile.py", line 1258, in __init__
    self._RealGetContents()
  File "/opt/conda/envs/morphocluster/lib/python3.7/zipfile.py", line 1325, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

To get to this point, I had to:

* update scikit-learn and torchvision==0.12.0
* in api.py, change from sklearn.manifold.isomap import Isomap to from sklearn.manifold import Isomap
* replace('torch.ao.quantization', 'torch.quantization') in torchvision/models/quantization/mobilenetv2.py & mobilenetv3.py

Host is macOS arm64

Cheers Veit

vdausmann avatar May 25 '22 07:05 vdausmann

Hi Veit, Thanks for reporting!

Which version are you running? (What is the output of git describe --tags?) Can you open the zip with another program?

moi90 avatar May 25 '22 16:05 moi90

Version is 0.2.0-26-g6fc6386

I have successfully opened the file by calling zipfile.ZipFile from within the Docker container.

Anyway, it's strange that I had to clear so many dependencies in the Docker container. It was my understanding that's the reason why we use Docker ;). However, it's the first time I'm using Docker, so maybe I made some mistakes.

I guess the error lies somewhere in the click package. I have tried a different version (8.1.3), but no success.

vdausmann avatar May 27 '22 06:05 vdausmann

zipfile.ZipFile is used by MorphoCluster so it seems strange that it works in one place and not in the other...

> Anyway, it's strange that I had to clear so many dependencies in the Docker container. It was my understanding that's the reason why we use Docker ;).

Yes, that is one application case of Docker. Currently, we merely use Docker to provide the services we need in a controlled way. But you are right, I should publish a Docker Image so that it does not have to be built by each user individually.

I have this issue on my list and will investigate as soon as possible.

moi90 avatar May 30 '22 10:05 moi90

Hi Veit,

In the meantime, I got around to creating a repository on hub.docker.com and built a docker image with the default settings:

https://hub.docker.com/repository/docker/morphocluster/morphocluster

If you replace the build section in your docker compose file with the following, you can skip the build step and you should immediately receive a working container:

services:
  morphocluster:
-   build:
-      context: .
-      dockerfile: docker/morphocluster/Dockerfile
+  image: morphocluster/morphocluster:latest

In the future, I might provide pre-built images regularly so that users can skip the whole setup step altogether.

This won't help with your problem, though. I get the same zipfile.BadZipFile: File is not a zip file error. unzip -t objects.zip, however, reports no problems and the error persists even after zip -F.

In a separate interpreter session in the Docker container, I can open the file and flask load-objects example/objects.zip works as well.
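For anyone who wants to repeat that separate-interpreter check, here is a minimal sketch using only the standard library. The helper name check_archive is made up for illustration and is not part of MorphoCluster; it only reports whether Python's zipfile module accepts the file, how many members it lists, and whether any member fails its CRC check.

```python
import zipfile


def check_archive(path):
    """Return a small diagnostic report for the archive at `path`.

    Hypothetical helper for this thread, not a MorphoCluster function.
    """
    report = {
        "is_zipfile": zipfile.is_zipfile(path),  # looks for the ZIP end-of-central-directory record
        "n_members": None,
        "first_bad_member": None,
    }
    if not report["is_zipfile"]:
        return report

    with zipfile.ZipFile(path) as archive:
        report["n_members"] = len(archive.namelist())
        # testzip() reads every member and returns the name of the first
        # one with a bad CRC or header, or None if all members are fine.
        report["first_bad_member"] = archive.testzip()
    return report
```

Calling check_archive("objects.zip") from a plain python session inside the container should show whether the archive itself is readable there. If it is, the failure is more likely tied to how morphocluster features is started than to the file.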

Maybe some loaded modules in morphocluster features somehow interfere...

moi90 avatar Jun 07 '22 11:06 moi90

This could also be related to multiprocessing / multithreading issues: https://github.com/python/cpython/issues/83544

moi90 avatar Jun 07 '22 12:06 moi90

In the meantime, you can replace morphocluster features objects.zip features.h5 with python -c "from morphocluster.scripts import main; main()" features objects.zip features.h5. This works in my setup.

I'd be thankful for an explanation as both commands should be virtually identical.

moi90 avatar Jun 07 '22 12:06 moi90