Training loss is zero across all batches
I am trying to configure the .yaml file to fine-tune the model on a publicly available Roboflow dataset (Dental AI), but the training loss is zero. The configured YAML file is as follows:
# @package _global_
defaults:
- _self_
# ============================================================================
# Paths Configuration (Change this to your own paths)
# ============================================================================
paths:
roboflow_vl_100_root: /kaggle/working
experiment_log_dir: /kaggle/working/log
bpe_path: /kaggle/working/sam3/assets/bpe_simple_vocab_16e6.txt.gz # This should be under assets/bpe_simple_vocab_16e6.txt.gz
# Roboflow dataset configuration
roboflow_train:
num_images: 20 # Note: This is the number of images used for training. If null, all images are used.
supercategory: Dental-AI-3 #${all_roboflow_supercategories.${string:${submitit.job_array.task_index}}}
# Training transforms pipeline
train_transforms:
- _target_: sam3.train.transforms.basic_for_api.ComposeAPI
transforms:
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterCrowds
- _target_: sam3.train.transforms.point_sampling.RandomizeInputBbox
box_noise_std: 0.1
box_noise_max: 20
- _target_: sam3.train.transforms.segmentation.DecodeRle
- _target_: sam3.train.transforms.basic_for_api.RandomResizeAPI
sizes:
_target_: sam3.train.transforms.basic.get_random_resize_scales
size: ${scratch.resolution}
min_size: 480
rounded: false
max_size:
_target_: sam3.train.transforms.basic.get_random_resize_max_size
size: ${scratch.resolution}
square: true
consistent_transform: ${scratch.consistent_transform}
- _target_: sam3.train.transforms.basic_for_api.PadToSizeAPI
size: ${scratch.resolution}
consistent_transform: ${scratch.consistent_transform}
- _target_: sam3.train.transforms.basic_for_api.ToTensorAPI
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterEmptyTargets
- _target_: sam3.train.transforms.basic_for_api.NormalizeAPI
mean: ${scratch.train_norm_mean}
std: ${scratch.train_norm_std}
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterEmptyTargets
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterFindQueriesWithTooManyOut
max_num_objects: ${scratch.max_ann_per_img}
# Validation transforms pipeline
val_transforms:
- _target_: sam3.train.transforms.basic_for_api.ComposeAPI
transforms:
- _target_: sam3.train.transforms.basic_for_api.RandomResizeAPI
sizes: ${scratch.resolution}
max_size:
_target_: sam3.train.transforms.basic.get_random_resize_max_size
size: ${scratch.resolution}
square: true
consistent_transform: False
- _target_: sam3.train.transforms.basic_for_api.ToTensorAPI
- _target_: sam3.train.transforms.basic_for_api.NormalizeAPI
mean: ${scratch.train_norm_mean}
std: ${scratch.train_norm_std}
# loss config (no mask loss)
# loss:
# _target_: sam3.train.loss.sam3_loss.Sam3LossWrapper
# matcher: ${scratch.matcher}
# o2m_weight: 2.0
# o2m_matcher:
# _target_: sam3.train.matcher.BinaryOneToManyMatcher
# alpha: 0.3
# threshold: 0.4
# topk: 4
# use_o2m_matcher_on_o2m_aux: false # Another option is true
# loss_fns_find:
# - _target_: sam3.train.loss.loss_fns.Boxes
# weight_dict:
# loss_bbox: 5.0
# loss_giou: 2.0
# - _target_: sam3.train.loss.loss_fns.IABCEMdetr
# weak_loss: False
# weight_dict:
# loss_ce: 20.0 # Another option is 100.0
# presence_loss: 20.0
# pos_weight: 10.0 # Another option is 5.0
# alpha: 0.25
# gamma: 2
# use_presence: True # Change
# pos_focal: false
# pad_n_queries: 200
# pad_scale_pos: 1.0
# loss_fn_semantic_seg: null
# scale_by_find_batch_size: ${scratch.scale_by_find_batch_size}
# NOTE: Loss to be used for training in case of segmentation
loss:
_target_: sam3.train.loss.sam3_loss.Sam3LossWrapper
matcher: ${scratch.matcher}
o2m_weight: 2.0
o2m_matcher:
_target_: sam3.train.matcher.BinaryOneToManyMatcher
alpha: 0.3
threshold: 0.4
topk: 4
use_o2m_matcher_on_o2m_aux: false
loss_fns_find:
- _target_: sam3.train.loss.loss_fns.Boxes
weight_dict:
loss_bbox: 5.0
loss_giou: 2.0
- _target_: sam3.train.loss.loss_fns.IABCEMdetr
weak_loss: False
weight_dict:
loss_ce: 20.0 # Another option is 100.0
presence_loss: 20.0
pos_weight: 10.0 # Another option is 5.0
alpha: 0.25
gamma: 2
use_presence: True # Change
pos_focal: false
pad_n_queries: 200
pad_scale_pos: 1.0
- _target_: sam3.train.loss.loss_fns.Masks
focal_alpha: 0.25
focal_gamma: 2.0
weight_dict:
loss_mask: 200.0
loss_dice: 10.0
compute_aux: false
loss_fn_semantic_seg:
_target_: sam3.train.loss.loss_fns.SemanticSegCriterion
presence_head: True
presence_loss: False # Change
focal: True
focal_alpha: 0.6
focal_gamma: 2.0
downsample: False
weight_dict:
loss_semantic_seg: 20.0
loss_semantic_presence: 1.0
loss_semantic_dice: 30.0
scale_by_find_batch_size: ${scratch.scale_by_find_batch_size}
# ============================================================================
# Different helper parameters and functions
# ============================================================================
scratch:
enable_segmentation: True # NOTE: This is the number of queries used for segmentation
# Model parameters
d_model: 256
pos_embed:
_target_: sam3.model.position_encoding.PositionEmbeddingSine
num_pos_feats: ${scratch.d_model}
normalize: true
scale: null
temperature: 10000
# Box processing
use_presence_eval: True
original_box_postprocessor:
_target_: sam3.eval.postprocessors.PostProcessImage
max_dets_per_img: -1 # infinite detections
use_original_ids: true
use_original_sizes_box: true
use_presence: ${scratch.use_presence_eval}
# Matcher configuration
matcher:
_target_: sam3.train.matcher.BinaryHungarianMatcherV2
focal: true # with `focal: true` it is equivalent to BinaryFocalHungarianMatcher
cost_class: 2.0
cost_bbox: 5.0
cost_giou: 2.0
alpha: 0.25
gamma: 2
stable: False
scale_by_find_batch_size: True
# Image processing parameters
resolution: 1008
consistent_transform: False
max_ann_per_img: 200
# Normalization parameters
train_norm_mean: [0.5, 0.5, 0.5]
train_norm_std: [0.5, 0.5, 0.5]
val_norm_mean: [0.5, 0.5, 0.5]
val_norm_std: [0.5, 0.5, 0.5]
# Training parameters
num_train_workers: 4
num_val_workers: 0
max_data_epochs: 20
target_epoch_size: 1500
hybrid_repeats: 1
context_length: 2
gather_pred_via_filesys: false
# Learning rate and scheduler parameters
lr_scale: 0.1
lr_transformer: ${times:8e-4,${scratch.lr_scale}}
lr_vision_backbone: ${times:2.5e-4,${scratch.lr_scale}}
lr_language_backbone: ${times:5e-5,${scratch.lr_scale}}
lrd_vision_backbone: 0.9
wd: 0.1
scheduler_timescale: 20
scheduler_warmup: 20
scheduler_cooldown: 20
val_batch_size: 1
collate_fn_val:
_target_: sam3.train.data.collator.collate_fn_api
_partial_: true
repeats: ${scratch.hybrid_repeats}
dict_key: roboflow100
with_seg_masks: ${scratch.enable_segmentation} # Note: Set this to true if using segmentation masks!
gradient_accumulation_steps: 1
train_batch_size: 16
collate_fn:
_target_: sam3.train.data.collator.collate_fn_api
_partial_: true
repeats: ${scratch.hybrid_repeats}
dict_key: all
with_seg_masks: ${scratch.enable_segmentation} # Note: Set this to true if using segmentation masks!
# ============================================================================
# Trainer Configuration
# ============================================================================
trainer:
# checkpoint:
# model_weight_initializer:
# # This is the wrapper class, NOT the utility function
# _target_: sam3.train.model_weight_initializer.ModelWeightInitializer
# # It usually takes 'path' or 'weights_path', not 'path_list'
# path: "/kaggle/working/sam3_checkpoints/sam3.pt"
# save_dir: "/kaggle/working/sam3_checkpoints/sam3.pt"
# save_freq: 1 # 0 only last checkpoint is saved.
_target_: sam3.train.trainer.Trainer
skip_saving_ckpts: False
empty_gpu_mem_cache_after_eval: True
skip_first_val: True
max_epochs: 20
accelerator: cuda
seed_value: 123
val_epoch_freq: 10
mode: train
gradient_accumulation_steps: ${scratch.gradient_accumulation_steps}
distributed:
backend: nccl
find_unused_parameters: True
gradient_as_bucket_view: True
loss:
all: ${roboflow_train.loss}
default:
_target_: sam3.train.loss.sam3_loss.DummyLoss
data:
train:
_target_: sam3.train.data.torch_dataset.TorchDataset
dataset:
_target_: sam3.train.data.sam3_image_dataset.Sam3ImageDataset
limit_ids: ${roboflow_train.num_images}
transforms: ${roboflow_train.train_transforms}
load_segmentation: ${scratch.enable_segmentation}
# coco_json_loader:
# _target_: sam3.train.data.coco_json_loaders.COCO_FROM_JSON
# include_negatives: true
# category_chunk_size: 2 # Note: You can increase this based on the memory of your GPU.
# _partial_: true
max_ann_per_img: 500000
multiplier: 1
max_train_queries: 50000
max_val_queries: 50000
training: true
use_caching: False
img_folder: ${paths.roboflow_vl_100_root}/${roboflow_train.supercategory}/train/
ann_file: ${paths.roboflow_vl_100_root}/${roboflow_train.supercategory}/train/_annotations.coco.json #_annotations.coco.json
shuffle: True
batch_size: ${scratch.train_batch_size}
num_workers: ${scratch.num_train_workers}
pin_memory: True
drop_last: True
collate_fn: ${scratch.collate_fn}
val:
_target_: sam3.train.data.torch_dataset.TorchDataset
dataset:
_target_: sam3.train.data.sam3_image_dataset.Sam3ImageDataset
load_segmentation: ${scratch.enable_segmentation}
coco_json_loader:
_target_: sam3.train.data.coco_json_loaders.COCO_FROM_JSON
include_negatives: true
category_chunk_size: 2 # Note: You can increase this based on the memory of your GPU.
_partial_: true
img_folder: ${paths.roboflow_vl_100_root}/${roboflow_train.supercategory}/test/
ann_file: ${paths.roboflow_vl_100_root}/${roboflow_train.supercategory}/test/_annotations.coco.json
transforms: ${roboflow_train.val_transforms}
max_ann_per_img: 100000
multiplier: 1
training: false
shuffle: False
batch_size: ${scratch.val_batch_size}
num_workers: ${scratch.num_val_workers}
pin_memory: True
drop_last: False
collate_fn: ${scratch.collate_fn_val}
model:
_target_: sam3.model_builder.build_sam3_image_model
bpe_path: ${paths.bpe_path}
device: cpus
eval_mode: false
enable_segmentation: ${scratch.enable_segmentation} # Warning: Enable this if using segmentation.
meters:
val:
roboflow100:
detection:
_target_: sam3.eval.coco_writer.PredictionDumper
iou_type: "bbox"
dump_dir: ${launcher.experiment_log_dir}/dumps/roboflow/${roboflow_train.supercategory}
merge_predictions: True
postprocessor: ${scratch.original_box_postprocessor}
gather_pred_via_filesys: ${scratch.gather_pred_via_filesys}
maxdets: 100
pred_file_evaluators:
- _target_: sam3.eval.coco_eval_offline.CocoEvaluatorOfflineWithPredFileEvaluators
gt_path: ${paths.roboflow_vl_100_root}/${roboflow_train.supercategory}/test/_annotations.coco.json
tide: False
iou_type: "bbox"
optim:
amp:
enabled: True
amp_dtype: bfloat16
optimizer:
_target_: torch.optim.AdamW
gradient_clip:
_target_: sam3.train.optim.optimizer.GradientClipper
max_norm: 0.1
norm_type: 2
param_group_modifiers:
- _target_: sam3.train.optim.optimizer.layer_decay_param_modifier
_partial_: True
layer_decay_value: ${scratch.lrd_vision_backbone}
apply_to: 'backbone.vision_backbone.trunk'
overrides:
- pattern: '*pos_embed*'
value: 1.0
options:
lr:
- scheduler: # transformer and class_embed
_target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
base_lr: ${scratch.lr_transformer}
timescale: ${scratch.scheduler_timescale}
warmup_steps: ${scratch.scheduler_warmup}
cooldown_steps: ${scratch.scheduler_cooldown}
- scheduler:
_target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
base_lr: ${scratch.lr_vision_backbone}
timescale: ${scratch.scheduler_timescale}
warmup_steps: ${scratch.scheduler_warmup}
cooldown_steps: ${scratch.scheduler_cooldown}
param_names:
- 'backbone.vision_backbone.*'
- scheduler:
_target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
base_lr: ${scratch.lr_language_backbone}
timescale: ${scratch.scheduler_timescale}
warmup_steps: ${scratch.scheduler_warmup}
cooldown_steps: ${scratch.scheduler_cooldown}
param_names:
- 'backbone.language_backbone.*'
weight_decay:
- scheduler:
_target_: fvcore.common.param_scheduler.ConstantParamScheduler
value: ${scratch.wd}
- scheduler:
_target_: fvcore.common.param_scheduler.ConstantParamScheduler
value: 0.0
param_names:
- '*bias*'
module_cls_names: ['torch.nn.LayerNorm']
checkpoint:
save_dir: ${launcher.experiment_log_dir}/checkpoints
save_freq: 1 # 0 only last checkpoint is saved.
logging:
tensorboard_writer:
_target_: sam3.train.utils.logger.make_tensorboard_logger
log_dir: ${launcher.experiment_log_dir}/tensorboard
flush_secs: 120
should_log: True
wandb_writer: null
log_dir: ${launcher.experiment_log_dir}/logs/${roboflow_train.supercategory}
log_freq: 10
# ============================================================================
# Launcher and Submitit Configuration
# ============================================================================
launcher:
num_nodes: 1
gpus_per_node: 2
experiment_log_dir: ${paths.experiment_log_dir}
multiprocessing_context: forkserver
submitit:
account: null
partition: null
qos: null
timeout_hour: 72
use_cluster: False
cpus_per_task: 10
port_range: [10000, 65000]
constraint: null
# Uncomment for job array configuration
job_array:
num_tasks: 1
task_index: 0
# ============================================================================
# Available Roboflow Supercategories (for reference)
# ============================================================================
all_roboflow_supercategories:
- dentalai
The train log is as follows:
INFO 2025-12-19 10:13:47,482 trainer.py:1031: Estimated time remaining: 00d 00h 00m
INFO 2025-12-19 10:13:47,483 trainer.py: 973: Synchronizing meters
INFO 2025-12-19 10:13:47,483 trainer.py: 890: Losses and meters: {'Losses/train_all_loss': 0, 'Losses/train_default_loss': 0, 'Trainer/where': 0.0, 'Trainer/epoch': 0, 'Trainer/steps_train': 0}
/usr/local/lib/python3.11/dist-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'repr' attribute with value False was provided to the `Field()` function, which has no effect in the context it was used. 'repr' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'frozen' attribute with value True was provided to the `Field()` function, which has no effect in the context it was used. 'frozen' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
warnings.warn(
INFO 2025-12-19 10:14:31,844 trainer.py:1031: Estimated time remaining: 00d 00h 00m
INFO 2025-12-19 10:14:31,846 trainer.py: 973: Synchronizing meters
INFO 2025-12-19 10:14:31,846 trainer.py: 890: Losses and meters: {'Losses/train_all_loss': 0, 'Losses/train_default_loss': 0, 'Trainer/where': 0.0, 'Trainer/epoch': 1, 'Trainer/steps_train': 0}
INFO 2025-12-19 10:15:17,941 trainer.py:1031: Estimated time remaining: 00d 00h 00m
INFO 2025-12-19 10:15:17,942 trainer.py: 973: Synchronizing meters
INFO 2025-12-19 10:15:17,942 trainer.py: 890: Losses and meters: {'Losses/train_all_loss': 0, 'Losses/train_default_loss': 0, 'Trainer/where': 0.0, 'Trainer/epoch': 2, 'Trainer/steps_train': 0}
INFO 2025-12-19 10:15:57,348 trainer.py:1031: Estimated time remaining: 00d 00h 00m
INFO 2025-12-19 10:15:57,349 trainer.py: 973: Synchronizing meters
INFO 2025-12-19 10:15:57,349 trainer.py: 890: Losses and meters: {'Losses/train_all_loss': 0, 'Losses/train_default_loss': 0, 'Trainer/where': 0.0, 'Trainer/epoch': 3, 'Trainer/steps_train': 0}
INFO 2025-12-19 10:16:44,979 trainer.py:1031: Estimated time remaining: 00d 00h 00m
INFO 2025-12-19 10:16:44,979 trainer.py: 973: Synchronizing meters
INFO 2025-12-19 10:16:44,980 trainer.py: 890: Losses and meters: {'Losses/train_all_loss': 0, 'Losses/train_default_loss': 0, 'Trainer/where': 0.0, 'Trainer/epoch': 4, 'Trainer/steps_train': 0}
[rank0]:[W1219 10:16:46.088064293 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank1]:[W1219 10:16:48.438422002 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=69, addr=[localhost]:58610, remote=[localhost]:50826): failed to recv, got 0 bytes
Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:678 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7b8b77d785e8 in /usr/local/lib/python3.11/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x5ba8afe (0x7b8b60c2dafe in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0x5baae40 (0x7b8b60c2fe40 in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x5bab74a (0x7b8b60c3074a in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_cpu.so)
frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x2a9 (0x7b8b60c2a1a9 in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_cpu.so)
frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7b8b1de239a9 in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xdc253 (0x7b8b0ddd8253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #7: <unknown function> + 0x94ac3 (0x7b8b79829ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: clone + 0x44 (0x7b8b798baa04 in /lib/x86_64-linux-gnu/libc.so.6)
[rank1]:[W1219 10:16:48.454251717 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0(default_pg) Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: failed to recv, got 0 bytes
W1219 10:16:48.863000 241 torch/multiprocessing/spawn.py:169] Terminating process 252 via signal SIGTERM
Traceback (most recent call last):
File "/kaggle/working/sam3/sam3/train/train.py", line 339, in <module>
main(args)
File "/kaggle/working/sam3/sam3/train/train.py", line 310, in main
single_node_runner(cfg, main_port)
File "/kaggle/working/sam3/sam3/train/train.py", line 78, in single_node_runner
mp_runner(single_proc_run, args=args, nprocs=num_proc, start_method="spawn")
File "/usr/local/lib/python3.11/dist-packages/torch/multiprocessing/spawn.py", line 296, in start_processes
while not context.join():
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/multiprocessing/spawn.py", line 215, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 965, in save
_save(
File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1266, in _save
zip_file.write_record(name, storage, num_bytes)
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
fn(i, *args)
File "/kaggle/working/sam3/sam3/train/train.py", line 58, in single_proc_run
trainer.run()
File "/kaggle/working/sam3/sam3/train/trainer.py", line 567, in run
self.run_train()
File "/kaggle/working/sam3/sam3/train/trainer.py", line 600, in run_train
self.save_checkpoint(self.epoch + 1)
File "/kaggle/working/sam3/sam3/train/trainer.py", line 379, in save_checkpoint
self._save_checkpoint(checkpoint, checkpoint_path)
File "/kaggle/working/sam3/sam3/train/trainer.py", line 392, in _save_checkpoint
torch.save(checkpoint, f)
File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 964, in save
with _open_zipfile_writer(f) as opened_zipfile:
File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 818, in __exit__
self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:659] . unexpected pos 433058240 vs 433058128
The final error occurs because the drive is almost full.
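A quick way to confirm this before the next run is to check the free space on the output volume. The sketch below uses only the Python standard library; the helper name `has_free_space` and the 2 GiB threshold are illustrative assumptions and are not part of the SAM 3 trainer. The traceback shows the failed write had already reached about 433 MB (`unexpected pos 433058240`), so a checkpoint needs at least that much room.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (in bytes)
    return usage.free >= required_bytes


# Example (hypothetical threshold): require 2 GiB before starting a run.
# The path is the experiment_log_dir from the config; it must already exist.
# if not has_free_space("/kaggle/working/log", 2 * 1024**3):
#     raise SystemExit("Not enough disk space for a checkpoint")
```

If space is tight, deleting older checkpoints from the log directory before restarting is probably the simplest workaround.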
It seems that my model does not receive the masks. I have uncommented the segmentation loss, and there is a discrepancy between Roboflow's coco.json file and the way the coco_json_loaders.py script treats the bounding boxes. In the downloaded dataset, the bboxes are stored as:
[x, y, w, h]
and the mentioned script treats them as:
[x1, y1, x2, y2]
I have changed the following lines in coco_json_loaders.py:
normalized_boxes = convert_boxlist_to_normalized_tensor(
[ann["bbox"]], width, height
)
to:
raw_bbox = ann["bbox"] # [x, y, w, h]
xyxy_bbox = [raw_bbox[0], raw_bbox[1], raw_bbox[0] + raw_bbox[2], raw_bbox[1] + raw_bbox[3]]
normalized_boxes = convert_boxlist_to_normalized_tensor([xyxy_bbox], width, height)
But this did not solve the problem. The commented-out lines for the segmentation loss in the configuration file also contained an error:
loss_fn_semantic_seg:
_target_: sam3.losses.loss_fns.SemanticSegCriterion
This target fails because no losses directory exists. I changed it to:
loss_fn_semantic_seg:
_target_: sam3.train.loss.loss_fns.SemanticSegCriterion
But the result is still the same.
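To narrow this down, it may help to inspect the annotation file directly, independent of the training code. The sketch below is a minimal standard-library check. It assumes the export follows the usual COCO layout (`images`, `annotations`, `bbox` as `[x, y, w, h]`, optional `segmentation`). The function names `summarize_coco` and `summarize_coco_file` are hypothetical and do not come from the sam3 code base.

```python
import json


def summarize_coco(coco: dict) -> dict:
    """Count annotations, those carrying a mask, and boxes that fit the image when read as xywh."""
    sizes = {img["id"]: (img["width"], img["height"]) for img in coco.get("images", [])}
    total = with_mask = xywh_inside_image = 0
    for ann in coco.get("annotations", []):
        total += 1
        if ann.get("segmentation"):  # non-empty polygon list or RLE dict
            with_mask += 1
        x, y, w, h = ann["bbox"]
        width, height = sizes[ann["image_id"]]
        # Allow one pixel of slack for rounding in the export.
        if x + w <= width + 1 and y + h <= height + 1:
            xywh_inside_image += 1
    return {
        "annotations": total,
        "with_mask": with_mask,
        "xywh_inside_image": xywh_inside_image,
    }


def summarize_coco_file(path: str) -> dict:
    """Load a COCO JSON file from `path` and summarize it."""
    with open(path) as f:
        return summarize_coco(json.load(f))
```

If `with_mask` comes back as 0, the export contains no segmentation data and the mask loss would have nothing to train on. If `xywh_inside_image` equals `annotations`, the boxes are consistent with the `[x, y, w, h]` reading.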
I would very much appreciate any solutions.