GPTQ error: TypeError: descriptor 'to' for 'torch._C.TensorBase' objects doesn't apply to a 'torch.device' object
Issue Type
Bug
Source
pip (mct-nightly)
MCT Version
PR #1186
OS Platform and Distribution
Linux Ubuntu 22.04
Python version
3.10
Describe the issue
I'm attempting to quantize a YOLOv8n model from the Ultralytics package using MCT GPTQ. However, I encounter this error during the calibration process:
-> 1106 gptq_quant_model, _ = mct.gptq.pytorch_gradient_post_training_quantization(
1107 model=self.model,
1108 representative_data_gen=representative_dataset_gen,
1109 target_resource_utilization=resource_utilization,
1110 gptq_config=gptq_config,
1111 core_config=config,
1112 target_platform_capabilities=tpc)
1114 print('Quantized-GPTQ model is ready')
1116 return f, None
File ~/repos/model_optimization/model_compression_toolkit/gptq/pytorch/quantization_facade.py:196, in pytorch_gradient_post_training_quantization(model, representative_data_gen, target_resource_utilization, core_config, gptq_config, gptq_representative_data_gen, target_platform_capabilities)
191 float_graph = copy.deepcopy(graph)
193 # ---------------------- #
194 # GPTQ Runner
195 # ---------------------- #
--> 196 graph_gptq = gptq_runner(graph,
197 core_config,
198 gptq_config,
199 representative_data_gen,
200 gptq_representative_data_gen if gptq_representative_data_gen else representative_data_gen,
201 DEFAULT_PYTORCH_INFO,
202 fw_impl,
203 tb_w,
204 hessian_info_service=hessian_info_service)
206 if core_config.debug_config.analyze_similarity:
207 analyzer_model_quantization(representative_data_gen,
208 tb_w,
209 float_graph,
210 graph_gptq,
211 fw_impl,
212 DEFAULT_PYTORCH_INFO)
File ~/repos/model_optimization/model_compression_toolkit/gptq/runner.py:115, in gptq_runner(tg, core_config, gptq_config, representative_data_gen, gptq_representative_data_gen, fw_info, fw_impl, tb_w, hessian_info_service)
111 #############################################
112 # Gradient Based Post Training Quantization
113 #############################################
114 Logger.info("Running GPTQ optimization.")
--> 115 tg_gptq = _apply_gptq(gptq_config,
116 gptq_representative_data_gen,
117 tb_w,
118 tg,
119 tg_bias,
120 fw_info,
121 fw_impl,
122 hessian_info_service=hessian_info_service)
124 return tg_gptq
File ~/repos/model_optimization/model_compression_toolkit/gptq/runner.py:62, in _apply_gptq(gptq_config, representative_data_gen, tb_w, tg, tg_bias, fw_info, fw_impl, hessian_info_service)
43 """
44 Apply GPTQ to improve accuracy of quantized model.
45 Build two models from a graph: A teacher network (float model) and a student network (quantized model).
(...)
59
60 """
61 if gptq_config is not None and gptq_config.n_epochs > 0:
---> 62 tg_bias = gptq_training(tg,
63 tg_bias,
64 gptq_config,
65 representative_data_gen,
66 fw_impl,
67 fw_info,
68 hessian_info_service=hessian_info_service)
70 if tb_w is not None:
71 tb_w.add_graph(tg_bias, 'after_gptq')
File ~/repos/model_optimization/model_compression_toolkit/gptq/common/gptq_training.py:287, in gptq_training(graph_float, graph_quant, gptq_config, representative_data_gen, fw_impl, fw_info, hessian_info_service)
278 gptq_trainer = gptq_trainer_obj(graph_float,
279 graph_quant,
280 gptq_config,
(...)
283 representative_data_gen,
284 hessian_info_service=hessian_info_service)
286 # Training process
--> 287 gptq_trainer.train(representative_data_gen)
289 # Update graph
290 graph_quant = gptq_trainer.update_graph()
File ~/repos/model_optimization/model_compression_toolkit/gptq/pytorch/gptq_training.py:193, in PytorchGPTQTrainer.train(self, representative_data_gen)
190 optimizer.add_param_group({'params': params})
192 # Set models mode
--> 193 set_model(self.float_model, False)
194 set_model(self.fxp_model, True)
195 self._set_requires_grad()
File ~/repos/model_optimization/model_compression_toolkit/core/pytorch/utils.py:41, in set_model(model, train_mode)
38 model.eval()
40 device = get_working_device()
---> 41 model.to(device)
TypeError: descriptor 'to' for 'torch._C.TensorBase' objects doesn't apply to a 'torch.device' object
cc: @Idan-BenAmi
Expected behaviour
No response
Code to reproduce the issue
Dependencies:
- Ultralytics package:
pip install git+https://github.com/ultralytics/ultralytics.git@quan
- MCT:
pip install git+https://github.com/ambitious-octopus/model_optimization.git@get-output-fix
Code:
import os
import model_compression_toolkit as mct
from tutorials.mct_model_garden.evaluation_metrics.coco_evaluation import coco_dataset_generator
from tutorials.mct_model_garden.models_pytorch.yolov8.yolov8_preprocess import yolov8_preprocess_chw_transpose
from typing import Iterator, Tuple, List
import wget
import zipfile
import logging
DATASET_ROOT = "./coco"
if not os.path.isdir(DATASET_ROOT):
    logging.info('Downloading COCO dataset')
    os.mkdir(DATASET_ROOT)
    wget.download('http://images.cocodataset.org/annotations/annotations_trainval2017.zip')
    with zipfile.ZipFile("annotations_trainval2017.zip", 'r') as zip_ref:
        zip_ref.extractall(DATASET_ROOT)
    os.remove('annotations_trainval2017.zip')
    wget.download('http://images.cocodataset.org/zips/val2017.zip')
    with zipfile.ZipFile("val2017.zip", 'r') as zip_ref:
        zip_ref.extractall(DATASET_ROOT)
    os.remove('val2017.zip')
from ultralytics import YOLO
from ultralytics.nn.modules import C2f, Detect
model = YOLO("yolov8n.pt").model
for m in model.modules():
    if isinstance(m, C2f):
        m.forward = m.forward_fx
    if isinstance(m, Detect):
        m.export = True
        m.format = "mct"
REPRESENTATIVE_DATASET_FOLDER = f'{DATASET_ROOT}/val2017/'
REPRESENTATIVE_DATASET_ANNOTATION_FILE = f'{DATASET_ROOT}/annotations/instances_val2017.json'
BATCH_SIZE = 4
n_iters = 20
# Load representative dataset
logging.info('Loading representative dataset')
representative_dataset = coco_dataset_generator(dataset_folder=REPRESENTATIVE_DATASET_FOLDER,
annotation_file=REPRESENTATIVE_DATASET_ANNOTATION_FILE,
preprocess=yolov8_preprocess_chw_transpose,
batch_size=BATCH_SIZE)
# Define representative dataset generator
def get_representative_dataset(n_iter: int, dataset_loader: Iterator[Tuple]):
    """
    Creates a representative dataset generator. The generator yields numpy
    arrays of batches of shape: [Batch, H, W, C].
    Args:
        n_iter: number of iterations for MCT to calibrate on
        dataset_loader: an iterator over batches of preprocessed images
    Returns:
        A representative dataset generator
    """
    def representative_dataset() -> Iterator[List]:
        ds_iter = iter(dataset_loader)
        for _ in range(n_iter):
            yield [next(ds_iter)[0]]
    return representative_dataset
logging.info('Creating representative dataset generator')
# Get representative dataset generator
representative_dataset_gen = get_representative_dataset(n_iter=n_iters,
dataset_loader=representative_dataset)
# Set IMX500-v1 TPC
logging.info('Setting target platform capabilities')
tpc = mct.get_target_platform_capabilities(fw_name="pytorch",
target_platform_name='imx500',
target_platform_version='v1')
# Specify the necessary configuration for mixed-precision quantization. To keep things brief,
# we use a small set of images and omit the Hessian metric for mixed-precision calculations.
# Note that this choice may impact the resulting accuracy.
mp_config = mct.core.MixedPrecisionQuantizationConfig(num_of_images=5,
use_hessian_based_scores=False)
config = mct.core.CoreConfig(mixed_precision_config=mp_config,
quantization_config=mct.core.QuantizationConfig(shift_negative_activation_correction=True))
# Define the target resource utilization for mixed-precision weights quantization (75% of 'standard' 8-bit quantization)
resource_utilization_data = mct.core.pytorch_resource_utilization_data(in_model=model,
                                                                       representative_data_gen=representative_dataset_gen,
                                                                       core_config=config,
                                                                       target_platform_capabilities=tpc)
resource_utilization = mct.core.ResourceUtilization(weights_memory=resource_utilization_data.weights_memory * 0.75)
# Specify the necessary configuration for Gradient-Based PTQ.
n_gptq_epochs = 1000
gptq_config = mct.gptq.get_pytorch_gptq_config(n_epochs=n_gptq_epochs, use_hessian_based_weights=False)
# Perform Gradient-Based Post Training Quantization
gptq_quant_model, _ = mct.gptq.pytorch_gradient_post_training_quantization(
model=model,
representative_data_gen=representative_dataset_gen,
target_resource_utilization=resource_utilization,
gptq_config=gptq_config,
core_config=config,
target_platform_capabilities=tpc)
print('Quantized-GPTQ model is ready')
Log output
No response
Hi @ambitious-octopus, once I'm able to reproduce the issue in #1186, I'll start debugging it.
Hi @ambitious-octopus, we have found the root cause of this error. Your model performs operations on constants, such as "to" and "mul", which cause failures in MCT (specifically, the model.to(device) error). More precisely, I believe these operations happen during the anchor preparation in your model.
This issue runs deeper: manipulating constants during model inference can lead to accuracy degradation. Performing these manipulations in advance and using the final constant values instead would improve accuracy and remove unnecessary computation. We therefore recommend removing constant manipulations from the model and using the finalized constant values instead; this approach should also resolve #1189.
Idan
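To illustrate the failure mode described above, here is a minimal sketch (hypothetical modules, not the actual YOLOv8 code) contrasting constant manipulation inside forward() with precomputing the constant as a buffer:

import torch
import torch.nn as nn

# Problematic pattern: a constant is built and cast inside forward(), so torch.fx
# records 'to' and 'mul' nodes that operate on a constant rather than an input.
class AnchorScalingBad(nn.Module):
    def forward(self, x):
        strides = torch.tensor([8., 16., 32.]).to(x.device)  # traced as a 'to' node
        return x * strides.view(1, -1, 1, 1)                 # traced as a 'mul' on a constant

# Recommended pattern: precompute the constant once in __init__ and register it
# as a buffer, so it follows .to()/.cuda() moves with the module and the traced
# graph contains no constant manipulation.
class AnchorScalingGood(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('strides', torch.tensor([8., 16., 32.]).view(1, -1, 1, 1))

    def forward(self, x):
        return x * self.strides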
While avoiding operators like "to" seems correct for this model, we still need to address how to handle such cases in general. During torch FX tracing, node names should be checked to ensure they don't collide with reserved names. A suggestion for handling such cases can be found in #1204.
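A rough sketch of that kind of check (an assumption about the approach, not the actual code in #1204): rename traced FX nodes whose names collide with attributes that nn.Module reserves, such as 'to', 'cuda', or 'eval'.

import torch.fx as fx
import torch.nn as nn

# Names that nn.Module already defines; an attribute named after one of these
# on the model rebuilt from the graph can shadow the nn.Module method (here, 'to').
RESERVED_NAMES = {name for name in dir(nn.Module) if not name.startswith('_')}

def sanitize_node_names(gm: fx.GraphModule) -> fx.GraphModule:
    # Rename colliding nodes; the fx.Node.name setter keeps names unique.
    for node in gm.graph.nodes:
        if node.name in RESERVED_NAMES:
            node.name = f'{node.name}_node'  # e.g. 'to' -> 'to_node'
    gm.recompile()
    return gm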
Hello, does anybody have a solution for this problem, or a temporary workaround to avoid this error?
I found that changing set_model() in model_compression_toolkit/core/pytorch/utils.py as follows avoids this error:

try:
    model.to(device)
except TypeError:
    model = model.cuda()
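For context, the patched function would look roughly like this (a sketch reconstructed from the traceback above, so the surrounding lines in utils.py may differ; get_working_device is already available in that module):

def set_model(model, train_mode: bool = True):
    # Set train/eval mode, then move the model to the working device.
    if train_mode:
        model.train()
    else:
        model.eval()
    device = get_working_device()
    try:
        model.to(device)  # raises TypeError when a graph node named 'to' shadows nn.Module.to
    except TypeError:
        model = model.cuda()  # fallback suggested above; assumes a CUDA device is available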
Hi @CYL0089, thank you for your feedback. Your suggestion is a valid workaround for this error. Another solution was suggested in #1204. Thanks, Idan
Our current recommendation is to avoid using the to() operator within the model's forward method before applying MCT. One problematic scenario occurs when to() is applied to constant values defined directly inside the forward method, rather than being registered as buffers using register_buffer. In such cases, MCT may treat these values as dynamic tensors, which can lead to suboptimal behavior.
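As a quick way to check a model for this pattern before running MCT, one can trace it with torch.fx and look for 'to' calls recorded in the graph (a diagnostic sketch; it assumes the model is fx-traceable):

import torch.fx as fx

def find_to_calls(model):
    # Trace the model and return every 'to' call captured in the graph.
    gm = fx.symbolic_trace(model)
    return [node for node in gm.graph.nodes
            if node.op == 'call_method' and node.target == 'to']

# Any hits point at casts inside forward(); move those constants into __init__
# and register them with register_buffer instead.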