ml-aim
This repository provides the code and model checkpoints for the research paper *Scalable Pre-training of Large Autoregressive Image Models*.
Thank you so much for sharing your great work! However, it would be super useful if the training code and procedures could be shared as well. This will be helpful...
Hi, thanks for the great work. Are there plans to release the training code for this model? I think there is a great opportunity to SSL fine-tune these...
Hi there. Thanks for the great work. In the paper, you mentioned that ***"We also consider a cross-entropy loss with patches converted to discrete tokens using an offline tokenizer"***. Recently,...
AIM-600M:

```
def aim_600M(img_size: Union[int, Tuple[int, int]] = 224, **kwargs: Any) -> AIM:
    preprocessor, trunk, head = _aim(
        img_size=img_size,
        patch_size=14,
        embed_dim=1536,
        num_blocks=24,
        num_heads=12,
        **kwargs,
    )
    return AIM(preprocessor, trunk, head)
```
...
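For context, a minimal usage sketch of the factory above might look as follows; the import path and the plain-tensor forward call are assumptions inferred from the snippet rather than a confirmed interface:

```
import torch

# Assumed import path; adjust to wherever aim_600M lives in the repo.
from aim.torch.models import aim_600M

# 224x224 input, patch size 14, 1536-dim trunk per the config above.
model = aim_600M()
model.eval()

x = torch.randn(1, 3, 224, 224)  # one dummy RGB image
with torch.no_grad():
    out = model(x)  # exact output structure depends on the AIM head in use
```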
I was wondering if you could upload the weights of the final layers used to minimise the NLL, so that this model could be used as an image probability model.
I would like to do a text-image search. Does AIMv2 have a text encoder like what the CLIP and SigLIP(2) have? Thanks a lot.
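For reference, the image-text aligned `*-lit` checkpoints (e.g. `aimv2-large-patch14-224-lit`) should support a CLIP-style retrieval flow roughly like the sketch below; loading through Hugging Face `AutoProcessor`/`AutoModel` with `trust_remote_code=True` and the `logits_per_image` output attribute are assumptions modeled on CLIP-like interfaces, not confirmed here:

```
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Checkpoint name from the -lit variant; interface details are assumptions.
ckpt = "apple/aimv2-large-patch14-224-lit"
processor = AutoProcessor.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModel.from_pretrained(ckpt, trust_remote_code=True)

image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)  # placeholder URL
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(images=image, text=texts, padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Assumed CLIP-style output: image-to-text similarity logits.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```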
Hi, first of all, thanks for sharing this incredible project! Regarding the implementation of the generative pretraining, I would like to ask for some clarifications. Your paper is very detailed...
Hi, I wanted to do some quick tests using `aimv2-large-patch14-224-lit` with the MLX backend, but I can't seem to find any convenience function to encode text in the repo. Digging through the...
When using AIMv2 as the encoder, unfreezing it and setting the learning rate to 1e-6 leads to the LLaVA model reaching a loss of 0 after 5000 steps. The original...
Will there be 3B native-resolution weights?