Training sample from custom dataset
I couldn't manage to train on a custom dataset: many parts of the sample training code call an external dataset. Would it be possible to have a sample training script for custom datasets? Using utils_load_dataset didn't work for the training case. From what I understand, the embedding is a list of strings encoded with CLIP using its tokenizer, but much of this is hard to set up. The idea would be a simple training example that can run on a custom dataset; it's something truly missing in many repositories.
Thanks for the work you've done!
Hi - what dataset are you trying to use? For the text embeddings you can use the get_text_encodings function, and the images can just be resized to the appropriate size and saved as a numpy file.
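Something along these lines should work for the preprocessing (rough sketch only; `get_text_encodings` in the repo is the authoritative way to build the text side - the CLIP checkpoint, the image size, and the use of the pooled embedding below are my assumptions, so adjust them to match the repo):

```python
# Rough preprocessing sketch. Assumptions: CLIP checkpoint, target image size,
# and use of the pooled text embedding; get_text_encodings in the repo is the
# authoritative version for the text encodings.
import numpy as np
import torch
from pathlib import Path
from PIL import Image
from transformers import CLIPTokenizer, CLIPTextModel

data_dir = Path("my_dataset")   # hypothetical folder of .png + .txt pairs
img_size = 128                  # assumed target resolution

# Text: encode the captions with CLIP's text encoder
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").eval()

captions = [p.read_text().strip() for p in sorted(data_dir.glob("*.txt"))]
tokens = tokenizer(captions, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    text_emb = text_model(**tokens).pooler_output.numpy()
np.save("text_encodings.npy", text_emb)

# Images: just resize and stack into one array
imgs = np.stack([
    np.asarray(Image.open(p).convert("RGB").resize((img_size, img_size)))
    for p in sorted(data_dir.glob("*.png"))
])
np.save("images.npy", imgs)
```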
Thanks @apapiu for your answer. The main issue I have is with 16_16_latent_embeddings.npy: I have no idea how to reproduce this kind of file, and I'm not sure how to turn images into "latent embeddings". I have a folder of image .png and .txt files and don't know how to convert that into latent embeddings. My attempt so far was to create a dataset that returns a numpy array of the image and calls get_text_encodings for the text in __getitem__, without success.
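Is the idea to run the images through a VAE encoder to get the 16x16 latents? Something like the sketch below is what I imagine (the VAE checkpoint, the 128 px input size and the 0.18215 scaling factor are guesses on my part, not taken from this repo):

```python
# Sketch of how I imagine the 16x16 latents are produced -- is this the right idea?
# Assumes a Stable Diffusion style VAE from diffusers; the exact checkpoint,
# input size and scaling factor used by this repo are guesses.
import numpy as np
import torch
from pathlib import Path
from PIL import Image
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval()

latents = []
for p in sorted(Path("my_dataset").glob("*.png")):
    img = Image.open(p).convert("RGB").resize((128, 128))       # 128 px -> 16x16 latent (factor 8)
    x = torch.from_numpy(np.asarray(img)).float() / 127.5 - 1.0 # scale pixels to [-1, 1]
    x = x.permute(2, 0, 1).unsqueeze(0)                         # HWC -> 1xCxHxW
    with torch.no_grad():
        z = vae.encode(x).latent_dist.sample() * 0.18215        # guessed SD scaling factor
    latents.append(z.squeeze(0).numpy())

np.save("16_16_latent_embeddings.npy", np.stack(latents))
```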