Joon comments


A problem about the weight λ of Lvlb

In my opinion, the authors define L_{vlb} = L_0 + ... + L_T as the sum over all timesteps, not as a single term L_t. So they may be scaling the single-timestep VLB loss by T (self.num_timestep) to estimate the full sum.
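A minimal sketch of why a factor of T could appear, assuming the timestep is sampled uniformly during training (the comment does not state this). The per-step loss values and the function names below are made up for illustration and are not taken from the repository under discussion.

```python
import random


def full_vlb(per_step_losses):
    """Exact sum of all per-timestep terms."""
    return sum(per_step_losses)


def monte_carlo_vlb(per_step_losses, num_samples, seed=0):
    """Estimate the sum by sampling one timestep uniformly at a time.

    Each sampled term is multiplied by the number of terms, which makes
    the estimate unbiased for the full sum.
    """
    rng = random.Random(seed)
    num_terms = len(per_step_losses)
    total = 0.0
    for _ in range(num_samples):
        t = rng.randrange(num_terms)
        total += num_terms * per_step_losses[t]
    return total / num_samples


# Hypothetical per-timestep losses, for illustration only.
losses = [0.5, 1.0, 1.5, 2.0]
exact = full_vlb(losses)
estimate = monte_carlo_vlb(losses, num_samples=20000)
```

Without the scale factor, the same procedure would estimate the mean of the terms rather than their sum.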

clip_denoised value range.

In my opinion, when you predict x_start at t \approx T with a cosine noise schedule, the bar alphas (cumprod alphas) have very small values compared to a linear noise schedule....
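The comparison can be checked numerically. The sketch below assumes the commonly used schedule definitions (linear betas from 1e-4 to 0.02, cosine schedule with offset s = 0.008 and betas clipped at 0.999, 1000 steps); none of these settings are given in the comment, and the function names are illustrative. It only compares the cumulative products and the factor 1/sqrt(alpha_bar) that multiplies x_t when x_start is recovered from a noise prediction.

```python
import math


def linear_alphas_cumprod(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta) for linearly spaced betas (assumed range)."""
    out, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        out.append(running)
    return out


def cosine_alphas_cumprod(num_steps, s=0.008, max_beta=0.999):
    """Cumulative product of (1 - beta) for a cosine schedule (assumed s and clip)."""

    def alpha_bar(u):
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2

    out, running = [], 1.0
    for i in range(num_steps):
        ratio = alpha_bar((i + 1) / num_steps) / alpha_bar(i / num_steps)
        beta = min(1.0 - ratio, max_beta)
        running *= 1.0 - beta
        out.append(running)
    return out


def x_start_gain(alpha_bar_t):
    """Factor applied to x_t in x_start = (x_t - sqrt(1 - a) * eps) / sqrt(a)."""
    return 1.0 / math.sqrt(alpha_bar_t)


linear = linear_alphas_cumprod(1000)
cosine = cosine_alphas_cumprod(1000)
print(f"last alpha_bar, linear: {linear[-1]:.3e}  gain: {x_start_gain(linear[-1]):.1f}")
print(f"last alpha_bar, cosine: {cosine[-1]:.3e}  gain: {x_start_gain(cosine[-1]):.1f}")
```

Under these assumptions the final cumulative product is smaller for the cosine schedule, so the predicted x_start near t = T is scaled by a larger factor before any clipping is applied.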

Memory Leakage Problem?

When I removed the DistributedSampler from the dataloader, the problem became less severe, but it did not go away.

I followed this post (https://ppwwyyxx.com/blog/2022/Demystify-RAM-Usage-in-Multiprocess-DataLoader/ ). In sam3/train/data/coco_json_loaders.py, we can add "TorchSerializedList" and modify the load_coco_and_group_by_image function.

```python
class TorchSerializedList:
    """
    Alternative implementation using torch.Tensor for spawn/forkserver mode.
    torch.Tensor can be...
```
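The general idea from the linked post can be sketched as follows: store the dataset records as one contiguous serialized buffer plus an offset table, instead of a large list of Python objects whose reference counts are touched in every worker. This is a standard-library illustration only; the class name `PackedList` is hypothetical, and it is not the `TorchSerializedList` from the comment, which keeps the buffer in a torch.Tensor.

```python
import pickle
from array import array


class PackedList:
    """Read-only list that keeps all items in a single bytes buffer."""

    def __init__(self, items):
        blobs = [pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL) for item in items]
        # offsets[i] .. offsets[i + 1] delimits the bytes of item i.
        self._offsets = array("q", [0])
        for blob in blobs:
            self._offsets.append(self._offsets[-1] + len(blob))
        self._buffer = b"".join(blobs)

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        start, end = self._offsets[index], self._offsets[index + 1]
        # Items are deserialized on access, so each call returns a fresh copy.
        return pickle.loads(self._buffer[start:end])


# Hypothetical records standing in for grouped COCO annotations.
records = [{"image_id": i, "annotations": list(range(i))} for i in range(5)]
packed = PackedList(records)
```

The trade-off is a small deserialization cost on every access in exchange for far fewer long-lived Python objects in the dataset.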

Training Fails with AttributeError: 'list' object has no attribute 'popitem' on Custom COCO Dataset

I guess the problem might be "_target_: sam3.train.transforms.segmentation.DecodeRle". According to your JSON example, the segmentation is not in RLE format. You can convert it to RLE format.
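For reference, a minimal sketch of COCO-style uncompressed RLE, assuming a binary mask is already available as nested lists. Whether DecodeRle expects uncompressed or compressed RLE is not stated in the thread, so treat the output format as an assumption; for polygon annotations, `pycocotools.mask.frPyObjects` is the usual conversion route. The function name below is illustrative.

```python
def mask_to_uncompressed_rle(mask):
    """Encode a binary mask (list of rows) as COCO-style uncompressed RLE.

    Pixels are read in column-major order, and the counts alternate
    between runs of zeros and runs of ones, starting with zeros.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    counts = []
    previous = 0  # the first run always counts zeros
    run = 0
    for col in range(width):
        for row in range(height):
            value = 1 if mask[row][col] else 0
            if value != previous:
                counts.append(run)
                run = 0
                previous = value
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


# Tiny 3x2 example mask, for illustration only.
example_mask = [
    [0, 1],
    [1, 1],
    [0, 0],
]
rle = mask_to_uncompressed_rle(example_mask)
```

A mask whose first pixel is foreground produces a leading count of 0, since the first run always refers to background.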

tutorials for finetuning sam3

Perhaps you can manually provide the input points and input bboxes in the COCO_FROM_JSON class of coco_json_loader.py (if you train the model on images). See "loadQueriesAndAnnotationsFromDatapoint" in COCO_FROM_JSON. Or, if...

How to fine-tune on my own image/video datasets?

I guess you can only fine-tune the detector and the shared backbone. According to the current implementation, it appears that training code for the tracker modules is not provided. You...

How to fine-tune on my own image/video datasets?

@svengoluza You can manually turn off the requires_grad setting in the Trainer where the model is instantiated.
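A minimal PyTorch sketch of what turning off requires_grad can look like. The toy model, the submodule names `detector` and `tracker`, and the helper `freeze_by_prefix` are all hypothetical; the actual parameter names in the Trainer's model have to be looked up with `named_parameters()`.

```python
import torch.nn as nn


class ToyModel(nn.Module):
    """Stand-in model with two submodules (names are hypothetical)."""

    def __init__(self):
        super().__init__()
        self.detector = nn.Linear(4, 4)
        self.tracker = nn.Linear(4, 2)


def freeze_by_prefix(model, prefixes):
    """Disable gradients for every parameter whose name starts with a prefix."""
    frozen = []
    for name, param in model.named_parameters():
        if name.startswith(tuple(prefixes)):
            param.requires_grad_(False)
            frozen.append(name)
    return frozen


model = ToyModel()
frozen_names = freeze_by_prefix(model, ["tracker."])
trainable_names = [n for n, p in model.named_parameters() if p.requires_grad]
```

Only parameters that still have requires_grad set should then be passed to the optimizer.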

How to fine-tune on my own image/video datasets?

@svengoluza For me, yes.

How to fine-tune on my own image/video datasets?

I'm sorry, I pre-processed my data incorrectly. The mistake was in converting the data to COCO format. Thank you for the reply!