
Preprocessing images from the camera, or getting other values from the camera

Open darwinharianto opened this issue 4 years ago • 12 comments

Is your feature request related to a problem? Please describe. I want to take other information besides images from the camera, and possibly preprocess it before feeding it to ML models (crop, rotate, or anything else).

Describe the solution you'd like For the image information, I think a segmentation or depth map would be nice: https://bitbucket.org/Unity-Technologies/ml-imagesynthesis/src/master/ I tried this repository, and it works. How can I combine this as input for the ml-agents model? Where should I start?

Describe alternatives you've considered https://bitbucket.org/Unity-Technologies/ml-imagesynthesis/src/master/


darwinharianto avatar Sep 21 '21 09:09 darwinharianto

Hi @darwinharianto I do not see how cropping or rotating the images would be useful. You can always move the camera around if needed, but cropping and rotating are data augmentation techniques that I have not seen used in RL. I think this falls outside the scope of ml-agents. Segmentation and depth perception are currently logged as task MLA-32; we do not know when we will have it available, but it is possible today to use a custom shader for rendering the cameras. So if you already have a shader that does segmentation, you should be able to use it.

vincentpierre avatar Sep 21 '21 17:09 vincentpierre

Hi @vincentpierre Thanks for the reply. Ah yes, cropping or rotating isn't really my target, but it is close to it. I would like to run a GAN on the image, something like this: https://www.youtube.com/watch?v=P1IcaBn3ej0

I would like to take the segmentation map and feed it into a GAN model, which I hope will give me a realistic image.

Segmentation and depth perception are currently logged as task MLA-32,

For the time being I will try to use https://bitbucket.org/Unity-Technologies/ml-imagesynthesis/src/master/ The repo above gives me a segmentation map and more, but I don't know where to intercept the image and preprocess it before sending it to ml-agents.

darwinharianto avatar Sep 22 '21 01:09 darwinharianto

Reading your comment about augmentation on images made me do some digging on that particular part. It seems some people have tried to use augmentation on images in Gym or Atari environments, and augmentation seems to give better results.

https://arxiv.org/abs/2004.14990 https://openreview.net/pdf?id=GY6-6sTvGaf
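The random-crop augmentation studied in those papers can be sketched in a few lines. This is an illustrative NumPy version under the assumption of channel-last batches, not code from either paper or from ml-agents:

```python
import numpy as np

def random_crop(images: np.ndarray, out_size: int) -> np.ndarray:
    """Randomly crop a batch of images (N, H, W, C) to (N, out_size, out_size, C).

    Minimal sketch of random-crop augmentation; the papers' implementations
    operate on torch tensors and may pad first.
    """
    n, h, w, _ = images.shape
    tops = np.random.randint(0, h - out_size + 1, size=n)
    lefts = np.random.randint(0, w - out_size + 1, size=n)
    # Each image in the batch gets its own independent crop window.
    return np.stack([
        img[t:t + out_size, l:l + out_size]
        for img, t, l in zip(images, tops, lefts)
    ])
```

For example, cropping an 84x84 batch down to 64x64 yields a batch of shape `(N, 64, 64, C)`.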

darwinharianto avatar Sep 22 '21 08:09 darwinharianto

Hi @darwinharianto Very cool stuff. This is the visual encoder we use: https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents/mlagents/trainers/torch/encoders.py#L248 I think the simplest approach would be to insert a pretrained GAN model here. Regarding data augmentation, duplicating and processing trajectories could perhaps be done in the agent_processor script. I am not sure where it would happen in the C# code; maybe by using the RenderTexture sensor and processing the image between updates.
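Inserting a pretrained model in front of an encoder could look roughly like the wrapper below. `PreprocessedEncoder` and its arguments are illustrative, not part of ml-agents; the `preprocess` module stands in for a pretrained GAN generator:

```python
import torch
import torch.nn as nn

class PreprocessedEncoder(nn.Module):
    """Wraps an existing visual encoder with a frozen preprocessing network.

    `preprocess` is assumed to map an image batch to an image batch of the
    same layout (e.g. a pretrained GAN generator); `encoder` is any visual
    encoder taking that image batch.
    """

    def __init__(self, preprocess: nn.Module, encoder: nn.Module):
        super().__init__()
        self.preprocess = preprocess
        self.encoder = encoder
        # Freeze the pretrained model so RL gradients do not update it.
        for p in self.preprocess.parameters():
            p.requires_grad = False

    def forward(self, visual_obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            visual_obs = self.preprocess(visual_obs)
        return self.encoder(visual_obs)
```

With `nn.Identity()` as a placeholder GAN, the wrapper behaves exactly like the wrapped encoder.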

vincentpierre avatar Sep 22 '21 17:09 vincentpierre

Thanks for the reply @vincentpierre

https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents/mlagents/trainers/torch/encoders.py#L248

This is created as a block multiple times in ResNetVisualEncoder. Does that mean I have to add the pretrained GAN model to the forward method of every xxVisualEncoder (ResNetVisualEncoder, NatureVisualEncoder, SimpleVisualEncoder, etc.), given that I only want the GAN to run once on the input? Is this right?

I thought I would just have to get the image data before it is forwarded to the model, but I can't find where that happens. Something like:

image_data, other_data = divide_observed_state(observed_state)
image_data = perform_gan_and_augmentation(image_data)
observed_state = combine_state(image_data, other_data)
model.forward(observed_state)

darwinharianto avatar Sep 24 '21 00:09 darwinharianto

You would only need to modify the encoder you want to use; no need to modify all of them. If you look at the forward method of the encoder, it has an "input_tensor" argument, which is the image. If you need some other data, you will need to modify this API or do the sampling of the "other_data" in the encoder.
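Modifying the API to carry extra data could look roughly like this. The class and layer choices below are illustrative, not the real ml-agents encoder; only the `input_tensor`/`other_data` naming follows the comment above:

```python
import torch
import torch.nn as nn

class EncoderWithExtraInput(nn.Module):
    """Sketch of a visual encoder whose forward accepts extra data alongside
    the image (hypothetical; real ml-agents encoders take only the image)."""

    def __init__(self, channels: int, extra_dim: int, out_dim: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.dense = nn.Linear(8 + extra_dim, out_dim)

    def forward(self, input_tensor: torch.Tensor,
                other_data: torch.Tensor) -> torch.Tensor:
        h = self.pool(torch.relu(self.conv(input_tensor))).flatten(1)
        # Concatenate the extra data with image features before the head.
        return self.dense(torch.cat([h, other_data], dim=1))
```

Every caller of the encoder's forward would then need to supply the extra tensor as well, which is why sampling "other_data" inside the encoder may be the smaller change.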

vincentpierre avatar Sep 24 '21 19:09 vincentpierre

Thanks, modifying the ResNet block gives me what I want. But the resizing method used by the camera breaks the segmentation map; is there a way to make it not interpolate the image?

darwinharianto avatar Sep 27 '21 06:09 darwinharianto

Sorry, can you tell me where the ResNet block gets instantiated and the data passed in? I can't follow what to do if I want to make a new ResNet block that taps into a new "other_data".

darwinharianto avatar Sep 28 '21 01:09 darwinharianto

The ResNets are created with create_input_processors, which is called when initializing an ObservationEncoder. The ObservationEncoder is part of the NetworkBody, which is the core network of both Actors and Critics in all of our algorithms. For example, in the main Policy, the SimpleActor uses the NetworkBody here.
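The composition described above, reduced to runnable stubs. Only the nesting mirrors ml-agents; all class bodies and the factory are placeholders with none of the real constructor arguments:

```python
class ResNetVisualEncoderSketch:
    """Stand-in for the visual encoder that create_input_processors builds."""
    def encode(self, obs):
        return f"features({obs})"

def create_input_processors():
    # In ml-agents this inspects the observation specs; hard-coded here.
    return [ResNetVisualEncoderSketch()]

class ObservationEncoder:
    def __init__(self):
        # One processor per visual observation, created by the factory above.
        self.processors = create_input_processors()

class NetworkBody:
    def __init__(self):
        self.observation_encoder = ObservationEncoder()

class SimpleActor:
    def __init__(self):
        self.network_body = NetworkBody()
```

So to swap in a custom encoder, the relevant hook is the factory that `ObservationEncoder` calls, several layers below the Actor.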

I hope this helps.

vincentpierre avatar Sep 28 '21 18:09 vincentpierre

Thank you for opening this issue! I'm also trying to use the ImageSynthesis tool to obtain semantic segmentation images for my ML agent but have faced the same problem. Did you figure out how to do it in the end, @darwinharianto? May I have your suggestions as well, @vincentpierre? Thank you so much!

pengzhi1998 avatar Aug 16 '22 00:08 pengzhi1998

@pengzhi1998 I ended up changing encoders.py and adding some preprocessing in the forward method of ResNetVisualEncoder there.

To forward the image from ml-imagesynthesis, I don't really remember exactly, but adding a camera sensor and making ImageSynthesis output its pass to camera 0 should be sufficient: SetupCameraWithReplacementShader(capturePasses[0].camera, uberReplacementShader, ReplacelementModes.CatergoryId);

darwinharianto avatar Aug 17 '22 01:08 darwinharianto

Really appreciate your help! I'll try the code you provided first!

pengzhi1998 avatar Aug 17 '22 06:08 pengzhi1998