ml-aim
This repository provides the code and model checkpoints for the research paper *Scalable Pre-training of Large Autoregressive Image Models*.
Thank you so much for sharing your great work! However, it would be super useful if the training code and procedures could be shared as well. This will be helpful...
Hi, thanks for the great work. Are there plans to release the training code for this model? I think there is a great opportunity to SSL fine-tune these...
Hi there. Thanks for the great work. In the paper, you mentioned that ***"We also consider a cross-entropy loss with patches converted to discrete tokens using an offline tokenizer"***. Recently,...
AIM-600M:

```
def aim_600M(img_size: Union[int, Tuple[int, int]] = 224, **kwargs: Any) -> AIM:
    preprocessor, trunk, head = _aim(
        img_size=img_size,
        patch_size=14,
        embed_dim=1536,
        num_blocks=24,
        num_heads=12,
        **kwargs,
    )
    return AIM(preprocessor, trunk, head)
```
...
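For context, a minimal usage sketch of the factory above might look as follows; the import path and the plain-tensor forward call are assumptions inferred from the snippet rather than a confirmed interface:

```
import torch

# Assumed import path; adjust to wherever aim_600M lives in the repo.
from aim.torch.models import aim_600M

# 224x224 input, patch size 14, 1536-dim trunk per the config above.
model = aim_600M()
model.eval()

x = torch.randn(1, 3, 224, 224)  # one dummy RGB image
with torch.no_grad():
    out = model(x)  # exact output structure depends on the AIM head in use
```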
I was wondering if you could upload the weights of the final layers used to minimise the NLL, so that this model could be used as an image probability model.
I would like to do a text-image search. Does AIMv2 have a text encoder like what the CLIP and SigLIP(2) have? Thanks a lot.
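For reference, the image-text aligned `*-lit` checkpoints (e.g. `aimv2-large-patch14-224-lit`) should support a CLIP-style retrieval flow roughly like the sketch below; loading through Hugging Face `AutoProcessor`/`AutoModel` with `trust_remote_code=True` and the `logits_per_image` output attribute are assumptions modeled on CLIP-like interfaces, not confirmed here:

```
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Checkpoint name from the -lit variant; interface details are assumptions.
ckpt = "apple/aimv2-large-patch14-224-lit"
processor = AutoProcessor.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModel.from_pretrained(ckpt, trust_remote_code=True)

image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)  # placeholder URL
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(images=image, text=texts, padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Assumed CLIP-style output: image-to-text similarity logits.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```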
Hi, first of all, thanks for sharing this incredible project! Regarding the implementation of the generative pretraining, I would like to ask for some clarifications. Your paper is very detailed...
Hi, I wanted to do some quick tests using `aimv2-large-patch14-224-lit` with the MLX backend, but I can't seem to find any convenience function to encode text in the repo. Digging through the...
When using AIMv2 as the encoder, unfreezing it and setting the learning rate to 1e-6 leads to the LLaVA model reaching a loss of 0 after 5000 steps. The original...
Will there be 3B native-resolution weights?