GPTQ error: TypeError: descriptor 'to' for 'torch._C.TensorBase' objects doesn't apply to a 'torch.device' object
Issue Type
Bug
Source
pip (mct-nightly)
MCT Version
PR #1186
OS Platform and Distribution
Linux Ubuntu 22.04
Python version
3.10
Describe the issue
I'm attempting to quantize a YOLOv8n model from the Ultralytics package using MCT GPTQ. However, I encounter this error during the calibration process:
-> 1106 gptq_quant_model, _ = mct.gptq.pytorch_gradient_post_training_quantization(
1107 model=self.model,
1108 representative_data_gen=representative_dataset_gen,
1109 target_resource_utilization=resource_utilization,
1110 gptq_config=gptq_config,
1111 core_config=config,
1112 target_platform_capabilities=tpc)
1114 print('Quantized-GPTQ model is ready')
1116 return f, None
File ~/repos/model_optimization/model_compression_toolkit/gptq/pytorch/quantization_facade.py:196, in pytorch_gradient_post_training_quantization(model, representative_data_gen, target_resource_utilization, core_config, gptq_config, gptq_representative_data_gen, target_platform_capabilities)
191 float_graph = copy.deepcopy(graph)
193 # ---------------------- #
194 # GPTQ Runner
195 # ---------------------- #
--> 196 graph_gptq = gptq_runner(graph,
197 core_config,
198 gptq_config,
199 representative_data_gen,
200 gptq_representative_data_gen if gptq_representative_data_gen else representative_data_gen,
201 DEFAULT_PYTORCH_INFO,
202 fw_impl,
203 tb_w,
204 hessian_info_service=hessian_info_service)
206 if core_config.debug_config.analyze_similarity:
207 analyzer_model_quantization(representative_data_gen,
208 tb_w,
209 float_graph,
210 graph_gptq,
211 fw_impl,
212 DEFAULT_PYTORCH_INFO)
File ~/repos/model_optimization/model_compression_toolkit/gptq/runner.py:115, in gptq_runner(tg, core_config, gptq_config, representative_data_gen, gptq_representative_data_gen, fw_info, fw_impl, tb_w, hessian_info_service)
111 #############################################
112 # Gradient Based Post Training Quantization
113 #############################################
114 Logger.info("Running GPTQ optimization.")
--> 115 tg_gptq = _apply_gptq(gptq_config,
116 gptq_representative_data_gen,
117 tb_w,
118 tg,
119 tg_bias,
120 fw_info,
121 fw_impl,
122 hessian_info_service=hessian_info_service)
124 return tg_gptq
File ~/repos/model_optimization/model_compression_toolkit/gptq/runner.py:62, in _apply_gptq(gptq_config, representative_data_gen, tb_w, tg, tg_bias, fw_info, fw_impl, hessian_info_service)
43 """
44 Apply GPTQ to improve accuracy of quantized model.
45 Build two models from a graph: A teacher network (float model) and a student network (quantized model).
(...)
59
60 """
61 if gptq_config is not None and gptq_config.n_epochs > 0:
---> 62 tg_bias = gptq_training(tg,
63 tg_bias,
64 gptq_config,
65 representative_data_gen,
66 fw_impl,
67 fw_info,
68 hessian_info_service=hessian_info_service)
70 if tb_w is not None:
71 tb_w.add_graph(tg_bias, 'after_gptq')
File ~/repos/model_optimization/model_compression_toolkit/gptq/common/gptq_training.py:287, in gptq_training(graph_float, graph_quant, gptq_config, representative_data_gen, fw_impl, fw_info, hessian_info_service)
278 gptq_trainer = gptq_trainer_obj(graph_float,
279 graph_quant,
280 gptq_config,
(...)
283 representative_data_gen,
284 hessian_info_service=hessian_info_service)
286 # Training process
--> 287 gptq_trainer.train(representative_data_gen)
289 # Update graph
290 graph_quant = gptq_trainer.update_graph()
File ~/repos/model_optimization/model_compression_toolkit/gptq/pytorch/gptq_training.py:193, in PytorchGPTQTrainer.train(self, representative_data_gen)
190 optimizer.add_param_group({'params': params})
192 # Set models mode
--> 193 set_model(self.float_model, False)
194 set_model(self.fxp_model, True)
195 self._set_requires_grad()
File ~/repos/model_optimization/model_compression_toolkit/core/pytorch/utils.py:41, in set_model(model, train_mode)
38 model.eval()
40 device = get_working_device()
---> 41 model.to(device)
TypeError: descriptor 'to' for 'torch._C.TensorBase' objects doesn't apply to a 'torch.device' object
cc: @Idan-BenAmi
Expected behaviour
No response
Code to reproduce the issue
Dependencies:
- Ultralytics package:
pip install git+https://github.com/ultralytics/ultralytics.git@quan
- MCT:
pip install git+https://github.com/ambitious-octopus/model_optimization.git@get-output-fix
Code:
import os
import model_compression_toolkit as mct
from tutorials.mct_model_garden.evaluation_metrics.coco_evaluation import coco_dataset_generator
from tutorials.mct_model_garden.models_pytorch.yolov8.yolov8_preprocess import yolov8_preprocess_chw_transpose
from typing import Iterator, Tuple, List
import wget
import zipfile
import logging
DATASET_ROOT = "./coco"
if not os.path.isdir(DATASET_ROOT):
    logging.info('Downloading COCO dataset')
    os.mkdir(DATASET_ROOT)
    wget.download('http://images.cocodataset.org/annotations/annotations_trainval2017.zip')
    with zipfile.ZipFile("annotations_trainval2017.zip", 'r') as zip_ref:
        zip_ref.extractall(DATASET_ROOT)
    os.remove('annotations_trainval2017.zip')
    wget.download('http://images.cocodataset.org/zips/val2017.zip')
    with zipfile.ZipFile("val2017.zip", 'r') as zip_ref:
        zip_ref.extractall(DATASET_ROOT)
    os.remove('val2017.zip')
from ultralytics import YOLO
from ultralytics.nn.modules import C2f, Detect
model = YOLO("yolov8n.pt").model
for m in model.modules():
    if isinstance(m, C2f):
        m.forward = m.forward_fx
    if isinstance(m, Detect):
        m.export = True
        m.format = "mct"
REPRESENTATIVE_DATASET_FOLDER = f'{DATASET_ROOT}/val2017/'
REPRESENTATIVE_DATASET_ANNOTATION_FILE = f'{DATASET_ROOT}/annotations/instances_val2017.json'
BATCH_SIZE = 4
n_iters = 20
# Load representative dataset
logging.info('Loading representative dataset')
representative_dataset = coco_dataset_generator(dataset_folder=REPRESENTATIVE_DATASET_FOLDER,
annotation_file=REPRESENTATIVE_DATASET_ANNOTATION_FILE,
preprocess=yolov8_preprocess_chw_transpose,
batch_size=BATCH_SIZE)
# Define representative dataset generator
def get_representative_dataset(n_iter: int, dataset_loader: Iterator[Tuple]):
    """
    Creates a representative dataset generator. The generator yields numpy
    arrays of batches of shape: [Batch, H, W, C].
    Args:
        n_iter: number of iterations for MCT to calibrate on
        dataset_loader: an iterator over batches of preprocessed images
    Returns:
        A representative dataset generator
    """
    def representative_dataset() -> Iterator[List]:
        ds_iter = iter(dataset_loader)
        for _ in range(n_iter):
            yield [next(ds_iter)[0]]
    return representative_dataset
logging.info('Creating representative dataset generator')
# Get representative dataset generator
representative_dataset_gen = get_representative_dataset(n_iter=n_iters,
dataset_loader=representative_dataset)
# Set IMX500-v1 TPC
logging.info('Setting target platform capabilities')
tpc = mct.get_target_platform_capabilities(fw_name="pytorch",
target_platform_name='imx500',
target_platform_version='v1')
# Specify the necessary configuration for mixed-precision quantization. To keep things brief,
# we use a small set of images and omit the Hessian metric for mixed-precision calculations.
# Note that this choice may impact the resulting accuracy.
mp_config = mct.core.MixedPrecisionQuantizationConfig(num_of_images=5,
use_hessian_based_scores=False)
config = mct.core.CoreConfig(mixed_precision_config=mp_config,
quantization_config=mct.core.QuantizationConfig(shift_negative_activation_correction=True))
# Define the target resource utilization for mixed-precision weights quantization (75% of 'standard' 8-bit quantization)
resource_utilization_data = mct.core.pytorch_resource_utilization_data(in_model=model,
                                                                       representative_data_gen=representative_dataset_gen,
                                                                       core_config=config,
                                                                       target_platform_capabilities=tpc)
resource_utilization = mct.core.ResourceUtilization(weights_memory=resource_utilization_data.weights_memory * 0.75)
# Specify the necessary configuration for Gradient-Based PTQ.
n_gptq_epochs = 1000
gptq_config = mct.gptq.get_pytorch_gptq_config(n_epochs=n_gptq_epochs, use_hessian_based_weights=False)
# Perform Gradient-Based Post Training Quantization
gptq_quant_model, _ = mct.gptq.pytorch_gradient_post_training_quantization(
model=model,
representative_data_gen=representative_dataset_gen,
target_resource_utilization=resource_utilization,
gptq_config=gptq_config,
core_config=config,
target_platform_capabilities=tpc)
print('Quantized-GPTQ model is ready')
Log output
No response
Hi @ambitious-octopus, once I'm able to reproduce the issue in #1186, I'll start debugging it.
Hi @ambitious-octopus, we have found the root cause of this error. Your model performs operations on constants, such as "to" and "mul", which cause failures in MCT (specifically, the model.to(device) error). More precisely, I believe these operations happen during the anchor preparation in your model.
This issue runs deeper: manipulating constants during model inference can lead to accuracy degradation. Performing these manipulations in advance and using the final constant values instead would improve accuracy and remove unnecessary computation. We therefore recommend removing constant manipulations from the model and using the finalized constant values instead; this approach should also resolve #1189.
Idan
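To illustrate the failure mode described above, here is a minimal sketch (hypothetical modules, not the actual YOLOv8 code) contrasting constant manipulation inside forward() with precomputing the constant as a buffer:

import torch
import torch.nn as nn

# Problematic pattern: a constant is built and cast inside forward(), so torch.fx
# records 'to' and 'mul' nodes that operate on a constant rather than an input.
class AnchorScalingBad(nn.Module):
    def forward(self, x):
        strides = torch.tensor([8., 16., 32.]).to(x.device)  # traced as a 'to' node
        return x * strides.view(1, -1, 1, 1)                 # traced as a 'mul' on a constant

# Recommended pattern: precompute the constant once in __init__ and register it
# as a buffer, so it follows .to()/.cuda() moves with the module and the traced
# graph contains no constant manipulation.
class AnchorScalingGood(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('strides', torch.tensor([8., 16., 32.]).view(1, -1, 1, 1))

    def forward(self, x):
        return x * self.strides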
While avoiding operators like "to" seems correct for this model, we still need to address how to handle such cases in general. During torch FX tracing, node names should be checked to ensure they don't collide with reserved names. A suggestion for handling such cases can be found in #1204.
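A rough sketch of that kind of check (an assumption about the approach, not the actual code in #1204): rename traced FX nodes whose names collide with attributes that nn.Module reserves, such as 'to', 'cuda', or 'eval'.

import torch.fx as fx
import torch.nn as nn

# Names that nn.Module already defines; an attribute named after one of these
# on the model rebuilt from the graph can shadow the nn.Module method (here, 'to').
RESERVED_NAMES = {name for name in dir(nn.Module) if not name.startswith('_')}

def sanitize_node_names(gm: fx.GraphModule) -> fx.GraphModule:
    # Rename colliding nodes; the fx.Node.name setter keeps names unique.
    for node in gm.graph.nodes:
        if node.name in RESERVED_NAMES:
            node.name = f'{node.name}_node'  # e.g. 'to' -> 'to_node'
    gm.recompile()
    return gm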
Hello, does anybody have a solution for this problem, or a temporary workaround to avoid this error?
I found that changing set_model() in model_compression_toolkit/core/pytorch/utils.py as follows avoids this error:

try:
    model.to(device)
except TypeError:
    model = model.cuda()
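For context, the patched function would look roughly like this (a sketch reconstructed from the traceback above, so the surrounding lines in utils.py may differ; get_working_device is already available in that module):

def set_model(model, train_mode: bool = True):
    # Set train/eval mode, then move the model to the working device.
    if train_mode:
        model.train()
    else:
        model.eval()
    device = get_working_device()
    try:
        model.to(device)  # raises TypeError when a graph node named 'to' shadows nn.Module.to
    except TypeError:
        model = model.cuda()  # fallback suggested above; assumes a CUDA device is available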
Hi @CYL0089, thank you for your feedback. Your suggestion is a valid workaround for this error. Another solution was suggested in #1204. Thanks, Idan
Our current recommendation is to avoid using the to() operator within the model's forward method before applying MCT. One problematic scenario occurs when to() is applied to constant values defined directly inside the forward method, rather than being registered as buffers using register_buffer. In such cases, MCT may treat these values as dynamic tensors, which can lead to suboptimal behavior.
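As a quick way to check a model for this pattern before running MCT, one can trace it with torch.fx and look for 'to' calls recorded in the graph (a diagnostic sketch; it assumes the model is fx-traceable):

import torch.fx as fx

def find_to_calls(model):
    # Trace the model and return every 'to' call captured in the graph.
    gm = fx.symbolic_trace(model)
    return [node for node in gm.graph.nodes
            if node.op == 'call_method' and node.target == 'to']

# Any hits point at casts inside forward(); move those constants into __init__
# and register them with register_buffer instead.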