
OSError: No such device (os error 19)

Open ezone1987 opened this issue 1 year ago • 2 comments

When testing radio_app_sv4d.py, the error appears while loading the model. The model is already downloaded and the device is set to the default 'cpu', but it still raises the OSError.

ezone1987, Sep 02 '24 01:09

Hi, I also got the same problem. Here are the logs for your reference:

python scripts/sampling/simple_video_sample.py --input_path /net/.../hydrant.jpg --version sv3d_p --elevations_deg 10.0
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder # 1: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder # 2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder # 3: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder # 4: ConcatTimestepEmbedderND with 0 params. Trainable: False
Traceback (most recent call last):
  File "/net/work/lau/generative-models/scripts/sampling/simple_video_sample.py", line 349, in <module>
    Fire(sample)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/net/work/lau/generative-models/scripts/sampling/simple_video_sample.py", line 98, in sample
    model, filter = load_model(
  File "/net/work/lau/generative-models/scripts/sampling/simple_video_sample.py", line 340, in load_model
    model = instantiate_from_config(config.model).to(device).eval()
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/sgm/util.py", line 175, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/sgm/models/diffusion.py", line 81, in __init__
    self.init_from_ckpt(ckpt_path)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/sgm/models/diffusion.py", line 92, in init_from_ckpt
    sd = load_safetensors(path)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/safetensors/torch.py", line 313, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
OSError: No such device (os error 19)

janiceylau, Sep 24 '24 09:09

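I was getting the same error with safetensors. It turns out I was using a network-mounted SSD, and safetensors memory-maps the checkpoint file, which doesn't play nicely with that. OS error 19 is ENODEV, which mmap returns when the underlying filesystem does not support memory mapping.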
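If you want to confirm that memory mapping is the culprit on your mount, a small check like the one below may help. It is only a sketch using the Python standard library: `supports_mmap` is a hypothetical helper name, and it assumes the failure shows up as ENODEV (errno 19) when Python's own `mmap` module maps the file, which is what the error message suggests but which I have not verified for every network filesystem.

```python
import errno
import mmap


def supports_mmap(path):
    """Return False if mapping `path` fails with ENODEV, True otherwise.

    Hypothetical helper for diagnosis only. Note that mmap raises
    ValueError for an empty file, so point this at a non-empty file.
    """
    with open(path, "rb") as f:
        try:
            # Length 0 asks mmap to map the whole file, read-only.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ):
                return True
        except OSError as exc:
            if exc.errno == errno.ENODEV:
                # The filesystem refused the mapping ("No such device").
                return False
            # Anything else (permissions, I/O errors) is a different problem.
            raise
```

Running it on the checkpoint path from the traceback should return False if the mount is the problem, and True on a local disk.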

A simple fix is to read the file into memory, and then pass that to safetensors:

import safetensors.torch


def load_model_from_network_storage(checkpoint_path):
    """
    Load a safetensors checkpoint from network storage by reading it fully
    into memory first, which avoids memory-mapping the file.

    Args:
        checkpoint_path: Path to the safetensors file

    Returns:
        Dict mapping tensor names to torch tensors
    """
    print(f"Loading model from: {checkpoint_path}")

    # Read the entire file into memory (needs as much RAM as the file size)
    print("Reading file into memory...")
    with open(checkpoint_path, 'rb') as f:
        file_content = f.read()
    print(f"Read {len(file_content) / (1024 ** 3):.2f}GB into memory")

    # Parse the in-memory bytes; safetensors.torch.load takes only the raw
    # bytes (it has no device argument)
    try:
        print("Loading tensors...")
        tensors = safetensors.torch.load(file_content)
        print(f"Successfully loaded {len(tensors)} tensors")
        return tensors
    except Exception as e:
        print(f"Error loading tensors: {e}")
        raise
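
If you would rather keep the faster memory-mapped path on local disks and only fall back on network mounts, something along these lines might work. This is a sketch under assumptions: `load_with_fallback` is a hypothetical name, the two loaders are passed in as arguments (with safetensors they would be `safetensors.torch.load_file` and `safetensors.torch.load`), and because I am not sure the OSError raised by safetensors carries an `errno` value, the check also matches the "os error 19" text seen in the traceback.

```python
import errno


def load_with_fallback(path, load_file, load_bytes):
    """Try the path-based loader first; on ENODEV, read the file and parse bytes.

    Hypothetical helper. `load_file` takes a path, `load_bytes` takes a
    bytes object. Any other error from `load_file` is re-raised unchanged.
    """
    try:
        return load_file(path)
    except OSError as exc:
        # Assumption: the error may arrive without errno set, so also match
        # the message text from the traceback.
        is_enodev = exc.errno == errno.ENODEV or "os error 19" in str(exc)
        if not is_enodev:
            raise
    # Fallback: a plain read does not require mmap support on the filesystem.
    with open(path, "rb") as f:
        return load_bytes(f.read())
```

The trade-off is the same as in the function above: the fallback holds the whole checkpoint in RAM while it is being parsed.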

naveen-corpusant, Nov 23 '24 05:11