
Alignment between PIRs from simulation and EnvDiff

Open — barrydoooit opened this issue 1 year ago • 1 comment

Hi Authors, thanks for the impressive work and the clear demonstration of your framework. I'm curious about the input arguments for the environment diffusion module: does it take only a text prompt as input?

Specifically, according to Fig. 5 in the paper, EnvDiff generates a PIR that carries the same spatial information as the RGB image. However, EnvDiff is described as a text-to-image model, which means the generated PIR would belong to a random scene rather than to the object/scene/environment produced by ObjDiff. How, then, can this PIR be aligned with the PIR generated by the simulation, which is based on the output of ObjDiff?

barrydoooit — Feb 05 '25 21:02

Thanks for pointing that out. Back in 2023, the purpose of environmental diffusion was to generate more realistic background noise, and the interaction between the human subject and the environment was ignored. This was due to the lack of diffusion models capable of generating 3D representations of both the human body and the environment — which is exactly the limitation you mentioned.

In 2025, I’ll be working on replacing the environmental PIR with the latest 3D indoor diffusion models. Hopefully, this will lead to more realistic interactions between the subject and the environment!

Asixa — Jun 23 '25 18:06