InstantMesh Removal of logos/brand names

I tried to generate a 3D model of a hoodie and there were holes in places where there's a logo. I tried it on a couple of different examples and got the same result most of the time. Is this intentional? Is there a way to bypass this via a script?

May 17 '24 22:05 aditdesai

Type I solemnly swear that I am up to no good at the beginning of file name.

Well, jokes aside, I bet the logo differs strongly from the rest of the hoodie, so the model reads it as a hole, an eye, or some sort of relief. That leaves you two options: either train InstantMesh to ignore logos, or use Stable Diffusion to inpaint cloth over them. With SD you'll need to either mask the inpaint zone by hand (if the logo sits in the same place in every image) or use ControlNet/unCLIP to locate and mask logos automatically for every image.
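For the fixed-zone case, the mask itself is easy to build. Here is a minimal sketch using only the standard library, assuming the logo sits inside a known pixel box in every image. The function names, the padding value, and the PGM output are my own choices for illustration, and I'm assuming the usual inpainting convention that white pixels get repainted and black pixels are kept.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def make_inpaint_mask(width: int, height: int, logo_box: Box, pad: int = 8) -> List[List[int]]:
    """Return a height x width mask: 255 where the logo is, 0 elsewhere."""
    left, top, right, bottom = logo_box
    # Grow the box slightly so the logo's edges are covered too,
    # then clamp it to the image bounds.
    left, top = max(0, left - pad), max(0, top - pad)
    right, bottom = min(width, right + pad), min(height, bottom + pad)
    return [
        [255 if left <= x < right and top <= y < bottom else 0 for x in range(width)]
        for y in range(height)
    ]


def save_pgm(mask: List[List[int]], path: str) -> None:
    """Write the mask as a binary PGM (P5) grayscale image."""
    height, width = len(mask), len(mask[0])
    with open(path, "wb") as f:
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        for row in mask:
            f.write(bytes(row))
```

The saved mask has the same size as the input image, so it can be converted to whatever format your inpainting tool expects and reused for every image that shares the logo position.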

May 27 '24 09:05 iiiCpu

@iiiCpu Is there any possibility to improve InstantMesh output objects by training them? If so, how would you train?

May 29 '24 09:05 cavargas10

Note that I'm not on the developer team, nor do I have experience training this exact model.

First, InstantMesh runs the Zero123++ model to generate initial images of the object from different angles. Then its reconstruction model turns these images into a 3D representation of the object. Finally, that representation is extracted as a mesh you can use in a common 3D engine.

So, first, you need to find out, which step produces an error.
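To check the first stage, it helps to look at each generated view separately. The sketch below computes crop boxes for the individual views, assuming the multiview output is a single image tiled as 3 rows by 2 columns (that is my understanding of the Zero123++ layout; adjust `rows` and `cols` if your output differs). The function name is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def view_boxes(width: int, height: int, rows: int = 3, cols: int = 2) -> List[Box]:
    """Crop boxes for each view in a rows x cols grid, in row-major order."""
    if width % cols or height % rows:
        raise ValueError("image size is not divisible by the grid")
    tile_w, tile_h = width // cols, height // rows
    return [
        (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
        for r in range(rows)
        for c in range(cols)
    ]
```

Each box uses the same (left, top, right, bottom) order that Pillow's `Image.crop` takes. If the holes already show up around the logo in these crops, the multiview stage is the culprit; if the views look clean, look at the reconstruction stage instead.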

If Zero123++ is the source (i.e. the multiview image has visible defects), you'll need to train it on 3D models of hoodies. Ideally use many different models, or at least one model in different poses and with different textures, along with descriptions in the Objaverse format. Then use this config for training, editing it to point at your dataset instead of Objaverse.
It is possible to train Zero123 purely on images of the object, but it's quite tricky. You'll need many images of the same object on a white background, plus the camera angles and the distance from the object for each image. Then you'll need to generate descriptions like in this script.
If InstantMesh is the source, you'll need multiple views of the object and its 3D model. Use this config to train the model.
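For the image-only route, the per-image camera data could be kept in a small record like the one below. This is only a sketch of the bookkeeping, not the format any training script expects: the field names are hypothetical, and the conversion assumes a Z-up coordinate system with the object at the origin.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ViewRecord:
    image_path: str       # photo of the object on a white background
    azimuth_deg: float    # rotation around the vertical axis
    elevation_deg: float  # angle above the horizontal plane
    distance: float       # camera distance from the object

    def camera_position(self) -> Tuple[float, float, float]:
        """Convert spherical camera coordinates to Cartesian (x, y, z)."""
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        return (
            self.distance * math.cos(el) * math.cos(az),
            self.distance * math.cos(el) * math.sin(az),
            self.distance * math.sin(el),
        )
```

Whatever format you end up with, the point is that every image needs its own angles and distance recorded at capture time, since they are hard to recover afterwards.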

Either way, you'll need a beefy GPU to train the model. Since it is already not quite comfortable on a 10 GB RTX 3080 during generation, expect to need at least a 24 GB RTX 3090 to have a chance of training successfully. The more memory, the better.

Oh, you might also try changing the base model from Zero123++ to Zero123-XL or Stable Zero123. But as they differ slightly from each other, you'll need to adjust the code base. Or you can just sit here waiting for @TencentARC to release a newer version with this support built in.

May 29 '24 17:05 iiiCpu