MONAI icon indicating copy to clipboard operation
MONAI copied to clipboard

Very Slow Loading of NIfTI (.nii.gz) Files Compared to SimpleITK

Open ivezakis opened this issue 1 year ago • 9 comments

Describe the bug There is a significant performance gap between MONAI and SimpleITK when loading NIfTI (.nii.gz) files. In my own benchmark I have found:

  • MONAI average load time: 2.0321 seconds
  • SimpleITK average load time: 0.1135 seconds

Using cProfile I notice that for monai, the bulk of the loading time is consumed by numpy.asanyarray. Results are the following when running a loop with monai.transforms.LoadImage 10 times:

MONAI Profiling Results

ncalls tottime percall cumtime percall filename:lineno(function)
10 0.000 0.000 20.231 2.023 benchmark.py:11(load_with_monai)
10 0.016 0.002 20.212 2.021 monai/transforms/io/array.py:229(call)
10 0.000 0.000 19.959 1.996 monai/data/image_reader.py:927(get_data)
10 0.000 0.000 19.956 1.996 monai/data/image_reader.py:1018(_get_array_data)
60/40 17.903 0.298 19.956 0.499 {built-in method numpy.asanyarray}
10 0.000 0.000 2.053 0.205 nibabel/arrayproxy.py:436(array)
10 0.000 0.000 2.053 0.205 nibabel/arrayproxy.py:413(_get_scaled)
10 0.000 0.000 2.052 0.205 nibabel/arrayproxy.py:389(_get_unscaled)
10 0.041 0.004 2.050 0.205 nibabel/volumeutils.py:392(array_from_file)
10 0.052 0.005 2.008 0.201 {method 'readinto' of '_io._BufferedIOBase' objects}

SimpleITK Profiling Results

ncalls tottime percall cumtime percall filename:lineno(function)
10 0.000 0.000 1.139 0.114 benchmark.py:15(load_with_sitk)
10 0.000 0.000 1.076 0.108 SimpleITK/extra.py:340(ReadImage)
10 0.062 0.006 0.062 0.006 {built-in method numpy.array}
10 0.000 0.000 0.001 0.000 SimpleITK/extra.py:259(GetArrayViewFromImage)
10 0.000 0.000 0.001 0.000 SimpleITK/SimpleITK.py:8443(init)
10 0.001 0.000 0.001 0.000 {built-in method SimpleITK._SimpleITK.new_ImageFileReader}
10 0.000 0.000 0.000 0.000 SimpleITK/extra.py:156(_get_numpy_dtype)

Note: Times are in seconds. 'ncalls' is the number of calls, 'tottime' is the total time spent in the function (excluding time in subfunctions), 'percall' is the average time spent per call, and 'cumtime' is the cumulative time spent in the function and all subfunctions.

As shown in the profiling results, the majority of time in MONAI is spent in numpy operations, particularly numpy.asanyarray, which takes about 19.956 seconds cumulatively. In contrast, SimpleITK's ReadImage function takes only 1.076 seconds.

To Reproduce

import time
import os
import numpy as np
import monai
import SimpleITK as sitk

def load_with_monai(file_path):
    start_time = time.time()
    image = monai.transforms.LoadImage()(file_path)
    end_time = time.time()
    return image, end_time - start_time

def load_with_sitk(file_path):
    start_time = time.time()
    image = sitk.ReadImage(file_path)
    array = sitk.GetArrayFromImage(image)
    end_time = time.time()
    return array, end_time - start_time

def run_benchmark(file_path, num_iterations=10):
    monai_times = []
    sitk_times = []

    for _ in range(num_iterations):
        _, monai_time = load_with_monai(file_path)
        monai_times.append(monai_time)

        _, sitk_time = load_with_sitk(file_path)
        sitk_times.append(sitk_time)

    print(f"MONAI average load time: {np.mean(monai_times):.4f} seconds")
    print(f"SimpleITK average load time: {np.mean(sitk_times):.4f} seconds")

if __name__ == "__main__":
    file_path = "path/to/file.nii.gz"
    if not os.path.exists(file_path):
        print(f"File not found: {file_path}")
    else:
        run_benchmark(file_path)

Environment

MONAI version: 1.3.2
Numpy version: 1.26.4
Pytorch version: 2.4.1
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 59a7211070538586369afd4a01eca0a7fe2e742e
MONAI __file__: C:\Users\<username>\miniconda3\envs\panorama\Lib\site-packages\monai\__init__.py

Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
ITK version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: 5.2.1
scikit-image version: NOT INSTALLED or UNKNOWN VERSION.
scipy version: 1.13.1
Pillow version: 10.4.0
Tensorboard version: NOT INSTALLED or UNKNOWN VERSION.
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.19.1
tqdm version: 4.66.5
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.
pandas version: NOT INSTALLED or UNKNOWN VERSION.
einops version: NOT INSTALLED or UNKNOWN VERSION.
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
pynrrd version: NOT INSTALLED or UNKNOWN VERSION.
clearml version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies


================================
Printing system config...
================================
`psutil` required for `print_system_info`

================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 12.4
cuDNN enabled: True
NVIDIA_TF32_OVERRIDE: None
TORCH_ALLOW_TF32_CUBLAS_OVERRIDE: None
cuDNN version: 90100
Current device: 0
Library compiled for CUDA architectures: ['sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
GPU 0 Name: NVIDIA GeForce RTX 4070
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 46
GPU 0 Total memory (GB): 12.0
GPU 0 CUDA capability (maj.min): 8.9

Additional context Add any other context about the problem here.

ivezakis avatar Sep 24 '24 18:09 ivezakis

import time import os import numpy as np import monai import SimpleITK as sitk import cProfile

Custom LoadImage with prefetching

class FastMONAILoadImage(monai.transforms.LoadImage): def call(self, filename): data = super().call(filename) # Directly extract the image array to bypass unnecessary numpy operations return np.asarray(data[0]) # Only return the image array

def load_with_monai(file_path): start_time = time.time() image = FastMONAILoadImage()(file_path) # Use the optimized loader end_time = time.time() return image, end_time - start_time

def load_with_sitk(file_path): start_time = time.time() image = sitk.ReadImage(file_path) array = sitk.GetArrayFromImage(image) end_time = time.time() return array, end_time - start_time

def run_benchmark(file_path, num_iterations=10): monai_times = [] sitk_times = []

# Profile the MONAI loading process
cProfile.runctx('load_with_monai(file_path)', globals(), locals(), filename="monai_profile.prof")

for _ in range(num_iterations):
    _, monai_time = load_with_monai(file_path)
    monai_times.append(monai_time)

    _, sitk_time = load_with_sitk(file_path)
    sitk_times.append(sitk_time)

print(f"MONAI average load time: {np.mean(monai_times):.4f} seconds")
print(f"SimpleITK average load time: {np.mean(sitk_times):.4f} seconds")

if name == "main": file_path = "path/to/file.nii.gz" if not os.path.exists(file_path): print(f"File not found: {file_path}") else: run_benchmark(file_path)

kartikshastrakar avatar Sep 26 '24 12:09 kartikshastrakar

Could you please explain why does 'ncalls' for numpy.asanyarray in MONAI profiling results is shown as 60/40, what does this indicate?

function2-llx avatar Dec 30 '24 14:12 function2-llx

The ncalls value shown as 60/40 in MONAI profiling results means the function numpy.asanyarray was called 60 times in total. Out of these, 40 were direct calls (where it was explicitly invoked), and the remaining 20 were indirect calls (triggered by other functions). On Mon, 30 Dec 2024 at 7:41 PM, 罗崚骁 (LUO Lingxiao) @.***> wrote:

Could you please explain why does 'ncalls' for numpy.asanyarray in MONAI profiling results is shown as 60/40, what does this indicate?

— Reply to this email directly, view it on GitHub https://github.com/Project-MONAI/MONAI/issues/8107#issuecomment-2565534449, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATOZRTPTAWOHDYKO2ERLVET2IFIB3AVCNFSM6AAAAABOY6FZ7KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNRVGUZTINBUHE . You are receiving this because you commented.Message ID: @.***>

kartikshastrakar avatar Dec 30 '24 15:12 kartikshastrakar

You can closed the Issue by your end

kartikshastrakar avatar Feb 26 '25 04:02 kartikshastrakar

Hi @ericspod and @virginiafdez, Could you please take a look at this issue? It's the one I mentioned in the meeting but couldn't find at the time. Thanks.

KumoLiu avatar Feb 26 '25 05:02 KumoLiu

Yes, I will check.

On Wed, 26 Feb 2025 at 10:37 AM, YunLiu @.***> wrote:

Hi @ericspod https://github.com/ericspod and @virginiafdez https://github.com/virginiafdez, Could you please take a look at this issue? It's the one I mentioned in the meeting but couldn't find at the time. Thanks.

— Reply to this email directly, view it on GitHub https://github.com/Project-MONAI/MONAI/issues/8107#issuecomment-2683906363, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATOZRTIE5442GV4JWGLKSTT2RVDYFAVCNFSM6AAAAABOY6FZ7KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBTHEYDMMZWGM . You are receiving this because you commented.Message ID: @.***> [image: KumoLiu]KumoLiu left a comment (Project-MONAI/MONAI#8107) https://github.com/Project-MONAI/MONAI/issues/8107#issuecomment-2683906363

Hi @ericspod https://github.com/ericspod and @virginiafdez https://github.com/virginiafdez, Could you please take a look at this issue? It's the one I mentioned in the meeting but couldn't find at the time. Thanks.

— Reply to this email directly, view it on GitHub https://github.com/Project-MONAI/MONAI/issues/8107#issuecomment-2683906363, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATOZRTIE5442GV4JWGLKSTT2RVDYFAVCNFSM6AAAAABOY6FZ7KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBTHEYDMMZWGM . You are receiving this because you commented.Message ID: @.***>

kartikshastrakar avatar Feb 26 '25 05:02 kartikshastrakar

The issue was addressed while ago for the similar work can you please the close the call

On Wed, 26 Feb 2025 at 11:04 AM, Kartik Shastrakar < @.***> wrote:

Yes, I will check.

On Wed, 26 Feb 2025 at 10:37 AM, YunLiu @.***> wrote:

Hi @ericspod https://github.com/ericspod and @virginiafdez https://github.com/virginiafdez, Could you please take a look at this issue? It's the one I mentioned in the meeting but couldn't find at the time. Thanks.

— Reply to this email directly, view it on GitHub https://github.com/Project-MONAI/MONAI/issues/8107#issuecomment-2683906363, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATOZRTIE5442GV4JWGLKSTT2RVDYFAVCNFSM6AAAAABOY6FZ7KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBTHEYDMMZWGM . You are receiving this because you commented.Message ID: @.***> [image: KumoLiu]KumoLiu left a comment (Project-MONAI/MONAI#8107) https://github.com/Project-MONAI/MONAI/issues/8107#issuecomment-2683906363

Hi @ericspod https://github.com/ericspod and @virginiafdez https://github.com/virginiafdez, Could you please take a look at this issue? It's the one I mentioned in the meeting but couldn't find at the time. Thanks.

— Reply to this email directly, view it on GitHub https://github.com/Project-MONAI/MONAI/issues/8107#issuecomment-2683906363, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATOZRTIE5442GV4JWGLKSTT2RVDYFAVCNFSM6AAAAABOY6FZ7KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBTHEYDMMZWGM . You are receiving this because you commented.Message ID: @.***>

kartikshastrakar avatar Feb 26 '25 08:02 kartikshastrakar

I think this is still a relevant issue. I'm not seeing such dramatic differences with current versions of libraries, but MONAI still takes twice as long to read a file. There is a difference that LoadImage produces a tensor with the dimensions unchanged from the file, while SITK produces numpy with dimensions changed by default. This doesn't explain the difference seen however so we should see about updating how we do loading to match what the newer Cupy based code does just with Numpy.

ericspod avatar Feb 26 '25 14:02 ericspod

I haven’t tried the latest version as i am focused on the other project so will try to look at it when i get free time

On Wed, 26 Feb 2025 at 8:11 PM, Eric Kerfoot @.***> wrote:

I think this is still a relevant issue. I'm not seeing such dramatic differences with current versions of libraries, but MONAI still takes twice as long to read a file. There is a difference that LoadImage produces a tensor with the dimensions unchanged from the file, while SITK produces numpy with dimensions changed by default. This doesn't explain the difference seen however so we should see about updating how we do loading to match what the newer Cupy based code does just with Numpy.

— Reply to this email directly, view it on GitHub https://github.com/Project-MONAI/MONAI/issues/8107#issuecomment-2685248924, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATOZRTOG3NLCK5B3FC55LXT2RXHBNAVCNFSM6AAAAABOY6FZ7KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBVGI2DQOJSGQ . You are receiving this because you commented.Message ID: @.***> [image: ericspod]ericspod left a comment (Project-MONAI/MONAI#8107) https://github.com/Project-MONAI/MONAI/issues/8107#issuecomment-2685248924

I think this is still a relevant issue. I'm not seeing such dramatic differences with current versions of libraries, but MONAI still takes twice as long to read a file. There is a difference that LoadImage produces a tensor with the dimensions unchanged from the file, while SITK produces numpy with dimensions changed by default. This doesn't explain the difference seen however so we should see about updating how we do loading to match what the newer Cupy based code does just with Numpy.

— Reply to this email directly, view it on GitHub https://github.com/Project-MONAI/MONAI/issues/8107#issuecomment-2685248924, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATOZRTOG3NLCK5B3FC55LXT2RXHBNAVCNFSM6AAAAABOY6FZ7KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBVGI2DQOJSGQ . You are receiving this because you commented.Message ID: @.***>

kartikshastrakar avatar Feb 26 '25 15:02 kartikshastrakar