
About visualization of data distributions from different sources


Hi, great work! I have two questions about the visualization of data distributions from different sources in Fig. 1.

Q1: Is the generated data visualized here learned from the corresponding data source? For example, in the first row, is the data from Stable Diffusion learned on LVIS train?

Q2: Based on the idea that generated data can expand the data distribution the model can learn, is it possible for a generative model trained solely on one domain to generate data from another domain (e.g., could a model trained on the training set generate data similar to the testing set)?

Looking forward to your answers!

pILLOW-1 · Sep 28 '24

A follow-up to Q1: in the second row, is the data from DeepFloyd learned on LVIS val?

pILLOW-1 · Sep 28 '24

Hi, @pILLOW-1.

Thank you for your interest in our work!

Regarding Q1: There is no "learned from the corresponding data source" relationship between the two data sources in each row. Each subplot simply shows the embeddings of data from one source after dimension reduction. For example, the LVIS train subplot shows the reduced embeddings of all instances in LVIS train, while the Stable Diffusion subplot shows the reduced embeddings of the data generated by Stable Diffusion.
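For illustration, here is a minimal sketch of how such a row of subplots could be produced. It assumes per-instance embeddings from an image encoder such as CLIP and t-SNE for dimension reduction; both choices, as well as fitting the reduction jointly over the two sources, are assumptions here rather than the paper's confirmed pipeline, and the random arrays are placeholders for real embeddings:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder per-instance embeddings of shape (N, D); in practice these
# would come from an image encoder (e.g. CLIP) applied to LVIS train
# instances and to Stable Diffusion generations, respectively.
lvis_train_emb = np.random.randn(500, 512)
stable_diffusion_emb = np.random.randn(500, 512)

# Fit the reduction on both sources together so the two subplots share
# one 2-D space (an assumption; a per-source fit is also possible).
xy = TSNE(n_components=2, init="pca", random_state=0).fit_transform(
    np.concatenate([lvis_train_emb, stable_diffusion_emb], axis=0)
)
n = len(lvis_train_emb)

# One subplot per data source, as in each row of Fig. 1.
fig, axes = plt.subplots(1, 2, figsize=(8, 4), sharex=True, sharey=True)
for ax, pts, title in [
    (axes[0], xy[:n], "LVIS train"),
    (axes[1], xy[n:], "Stable Diffusion"),
]:
    ax.scatter(pts[:, 0], pts[:, 1], s=4)
    ax.set_title(title)
plt.show()
```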

Regarding Q2: In DiverGen, we did not retrain or fine-tune the pre-trained generative models. We only used the open-source pre-trained weights for data generation. However, we think your suggestion is intriguing, and we may explore it further if time allows.
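As a point of reference for what generating data from open-source pre-trained weights can look like, below is a minimal sketch using the diffusers library; the model id, prompt template, and category name are illustrative assumptions, not DiverGen's actual generation pipeline:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load open-source pre-trained weights without any retraining or
# fine-tuning; the model id here is a common public checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical per-category prompt; real prompts may be more elaborate.
category = "sea lion"
images = pipe(f"a photo of a {category}", num_images_per_prompt=4).images
for i, img in enumerate(images):
    img.save(f"{category.replace(' ', '_')}_{i}.png")
```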

Hope this clears up your confusion, and feel free to reach out if you have any more questions!

leaf1170124460 · Sep 30 '24