Training with data generated by Isaac Sim Replicator
Hi, first of all, sorry for the long post. I'm currently experimenting with training DOPE to fully understand the project and its theoretical background. First, I generated synthetic data using BlenderProc, since NVISII is no longer supported by newer Python versions. However, I missed the point that the command below should be run 5 times, as mentioned here: https://github.com/NVlabs/Deep_Object_Pose/issues/361#issue-2322934867. Data generation took so long that I ended up with only ~1600 images.
./run_blenderproc_datagen.py --nb_runs 5 --nb_frames 1000 --path_single_obj /isaac_dope_retrain/Popcorn/google_16k/textured.obj --nb_objects 6 --distractors_folder ../nvisii_data_gen/google_scanned_models --nb_distractors 10 --backgrounds_folder ../dome_hdri_haven/
Then I followed NVIDIA's official documentation on generating a synthetic dataset with Isaac Sim Replicator and generated ~7200 images for the 036_wood_block class from the YCB dataset. I set aside 202 images for testing and ran the debug tool on this split as follows:
python3 debug.py --data ../isaac_data/test/
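As for the 202-image test split itself, below is a minimal sketch of how it can be done. The directory names are placeholders matching the paths above, and it assumes each sample is a matching .png/.json pair:
import random
import shutil
from pathlib import Path

src = Path("../isaac_data/isaac_data")  # full Replicator output (placeholder)
dst = Path("../isaac_data/test")        # held-out test split (placeholder)
dst.mkdir(parents=True, exist_ok=True)

# Pick 202 random image/annotation pairs and move them to the test folder.
stems = sorted(p.stem for p in src.glob("*.png"))
random.seed(0)
for stem in random.sample(stems, 202):
    for ext in (".png", ".json"):
        f = src / (stem + ext)
        if f.exists():
            shutil.move(str(f), str(dst / f.name))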
Here is the debug result:
My first question is: do these debug results look correct for the object? (It looks like there is no connection to the 8th point.)
Then I started DOPE training with the train/train.py script, using a batch size of 16 for 200 epochs:
python3 -m torch.distributed.launch --nproc_per_node=1 train.py --batchsize 16 -e 200 --data ../isaac_data/isaac_data/ --object _36_wood_block
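(Side note: newer PyTorch versions deprecate torch.distributed.launch in favor of torchrun. Assuming the same arguments, the equivalent single-GPU invocation would be the line below, although a small adjustment may be needed depending on whether train.py reads the local rank from a command-line flag or from the LOCAL_RANK environment variable.)
torchrun --nproc_per_node=1 train.py --batchsize 16 -e 200 --data ../isaac_data/isaac_data/ --object _36_wood_block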
Here is my training log, TensorBoard, and belief maps so far:
Loading Model...
ready to train!
Train Epoch: 1 [0/7000 (0%)] Loss: 0.032431039959192 Local Rank: 0
Train Epoch: 1 [1600/7000 (23%)] Loss: 0.005280021112412 Local Rank: 0
Train Epoch: 1 [3200/7000 (46%)] Loss: 0.003795965109020 Local Rank: 0
Train Epoch: 1 [4800/7000 (68%)] Loss: 0.005852747708559 Local Rank: 0
Train Epoch: 1 [6400/7000 (91%)] Loss: 0.005139055196196 Local Rank: 0
NaN or Inf found in input tensor.
Train Epoch: 2 [0/7000 (0%)] Loss: 0.005350864026695 Local Rank: 0
Train Epoch: 2 [1600/7000 (23%)] Loss: 0.004227774683386 Local Rank: 0
Train Epoch: 2 [3200/7000 (46%)] Loss: 0.005079013295472 Local Rank: 0
Train Epoch: 2 [4800/7000 (68%)] Loss: 0.005090656690300 Local Rank: 0
Train Epoch: 2 [6400/7000 (91%)] Loss: 0.004829996265471 Local Rank: 0
NaN or Inf found in input tensor.
Train Epoch: 3 [0/7000 (0%)] Loss: 0.004584960173815 Local Rank: 0
Train Epoch: 3 [1600/7000 (23%)] Loss: 0.005996774882078 Local Rank: 0
Train Epoch: 3 [3200/7000 (46%)] Loss: 0.005501705221832 Local Rank: 0
Train Epoch: 3 [4800/7000 (68%)] Loss: 0.004319729283452 Local Rank: 0
Train Epoch: 3 [6400/7000 (91%)] Loss: 0.003838054370135 Local Rank: 0
NaN or Inf found in input tensor.
Train Epoch: 4 [0/7000 (0%)] Loss: 0.004935909062624 Local Rank: 0
Train Epoch: 4 [1600/7000 (23%)] Loss: 0.005632530432194 Local Rank: 0
Train Epoch: 4 [3200/7000 (46%)] Loss: 0.003991267178208 Local Rank: 0
Train Epoch: 4 [4800/7000 (68%)] Loss: 0.005610936786979 Local Rank: 0
Train Epoch: 4 [6400/7000 (91%)] Loss: 0.005505917128175 Local Rank: 0
NaN or Inf found in input tensor.
Train Epoch: 5 [0/7000 (0%)] Loss: 0.004688881803304 Local Rank: 0
Train Epoch: 5 [1600/7000 (23%)] Loss: 0.004800515249372 Local Rank: 0
Train Epoch: 5 [3200/7000 (46%)] Loss: 0.004549864679575 Local Rank: 0
Train Epoch: 5 [4800/7000 (68%)] Loss: 0.004274855367839 Local Rank: 0
Train Epoch: 5 [6400/7000 (91%)] Loss: 0.004714360460639 Local Rank: 0
.
.
.
NaN or Inf found in input tensor.
Train Epoch: 35 [0/7000 (0%)] Loss: 0.003139317035675 Local Rank: 0
Train Epoch: 35 [1600/7000 (23%)] Loss: 0.003259100019932 Local Rank: 0
Train Epoch: 35 [3200/7000 (46%)] Loss: 0.003899197326973 Local Rank: 0
Train Epoch: 35 [4800/7000 (68%)] Loss: 0.003065461292863 Local Rank: 0
Train Epoch: 35 [6400/7000 (91%)] Loss: 0.002900267252699 Local Rank: 0
NaN or Inf found in input tensor.
Train Epoch: 36 [0/7000 (0%)] Loss: 0.002934988122433 Local Rank: 0
Train Epoch: 36 [1600/7000 (23%)] Loss: 0.002790168859065 Local Rank: 0
Train Epoch: 36 [3200/7000 (46%)] Loss: 0.002845479408279 Local Rank: 0
Train Epoch: 36 [4800/7000 (68%)] Loss: 0.003665541065857 Local Rank: 0
Train Epoch: 36 [6400/7000 (91%)] Loss: 0.003706640098244 Local Rank: 0
NaN or Inf found in input tensor.
Train Epoch: 37 [0/7000 (0%)] Loss: 0.003936767578125 Local Rank: 0
Train Epoch: 37 [1600/7000 (23%)] Loss: 0.003334288951010 Local Rank: 0
Train Epoch: 37 [3200/7000 (46%)] Loss: 0.003531043883413 Local Rank: 0
Train Epoch: 37 [4800/7000 (68%)] Loss: 0.003663829993457 Local Rank: 0
Train Epoch: 37 [6400/7000 (91%)] Loss: 0.003290888853371 Local Rank: 0
NaN or Inf found in input tensor.
Train Epoch: 38 [0/7000 (0%)] Loss: 0.004055183846503 Local Rank: 0
Train Epoch: 38 [1600/7000 (23%)] Loss: 0.003013886744156 Local Rank: 0
Train Epoch: 38 [3200/7000 (46%)] Loss: 0.003433866892010 Local Rank: 0
Train Epoch: 38 [4800/7000 (68%)] Loss: 0.003090128302574 Local Rank: 0
Train Epoch: 38 [6400/7000 (91%)] Loss: 0.003046812955290 Local Rank: 0
NaN or Inf found in input tensor.
Train Epoch: 39 [0/7000 (0%)] Loss: 0.003284086473286 Local Rank: 0
Train Epoch: 39 [1600/7000 (23%)] Loss: 0.003326341975480 Local Rank: 0
Train Epoch: 39 [3200/7000 (46%)] Loss: 0.003423219081014 Local Rank: 0
Train Epoch: 39 [4800/7000 (68%)] Loss: 0.002622122410685 Local Rank: 0
Train Epoch: 39 [6400/7000 (91%)] Loss: 0.003288988722488 Local Rank: 0
NaN or Inf found in input tensor.
Train Epoch: 40 [0/7000 (0%)] Loss: 0.003445768961683 Local Rank: 0
Train Epoch: 40 [1600/7000 (23%)] Loss: 0.002639355603606 Local Rank: 0
Train Epoch: 40 [3200/7000 (46%)] Loss: 0.002910927170888 Local Rank: 0
Train Epoch: 40 [4800/7000 (68%)] Loss: 0.003899353090674 Local Rank: 0
Train Epoch: 40 [6400/7000 (91%)] Loss: 0.003307382576168 Local Rank: 0
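(The repeated "NaN or Inf found in input tensor." line appears to come from the TensorBoard logging side, which prints this warning when a NaN/Inf value is passed to it. As a quick, hedged sanity check, the annotations can be scanned for NaN/Inf values with a sketch like the one below; it assumes a flat folder of .json files with the structure shown further down:)
import json
import math
from pathlib import Path

data_dir = Path("../isaac_data/isaac_data")  # placeholder path

def has_bad_numbers(node):
    """Recursively check a parsed JSON structure for NaN/Inf floats."""
    if isinstance(node, dict):
        return any(has_bad_numbers(v) for v in node.values())
    if isinstance(node, list):
        return any(has_bad_numbers(v) for v in node)
    if isinstance(node, float):
        return math.isnan(node) or math.isinf(node)
    return False

for ann_file in sorted(data_dir.glob("*.json")):
    if has_bad_numbers(json.loads(ann_file.read_text())):
        print("suspicious annotation:", ann_file.name)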
My second question is: do these loss values/TensorBoard curves/belief maps indicate a healthy training process? (I'm asking because training takes a long time on an RTX 4070 Ti, and if something is wrong, I would like to fix it and restart the training.)
My third question is: I have noticed some differences between NVISII/BlenderProc data and Isaac Sim Replicator data. Would these differences affect my training or inference process? Are there any other steps I should apply to the Isaac data before or after training to get the expected results?
Sample data for cracker:
{
"camera_data": {},
"objects": [
{
"class": "cracker",
"visibility": 1.0,
"location": [
6.001536846160889,
-4.993739604949951,
86.97349548339844
],
"quaternion_xyzw": [
0.05684935301542282,
0.32983407378196716,
-0.32128748297691345,
0.8858622908592224
],
"projected_cuboid": [
[
294.026123046875,
258.42864990234375
],
[
407.443115234375,
274.3418884277344
],
[
363.9100646972656,
112.13758850097656
],
[
240.95428466796875,
112.85871887207031
],
[
264.0873718261719,
293.51092529296875
],
[
379.4239501953125,
314.3736877441406
],
[
329.1734924316406,
149.2497100830078
],
[
204.38331604003906,
145.52359008789062
],
[
309.0063171386719,
211.89468383789062
]
]
}
]
}
Wood block data from Isaac Sim Replicator:
{
"camera_data": {
"intrinsics": {
"fx": 768.0,
"fy": 768.0,
"cx": 256.0,
"cy": 256.0
},
"camera_view_matrix": [
[
1.0,
0.0,
0.0,
0.0
],
[
0.0,
-1.0,
-0.0,
0.0
],
[
0.0,
0.0,
-1.0,
0.0
],
[
-0.0,
0.0,
0.0,
1.0
]
],
"camera_projection_matrix": [
[
3.0,
0.0,
0.0,
0.0
],
[
0.0,
3.0,
0.0,
0.0
],
[
0.0,
0.0,
0.0,
-1.0
],
[
0.0,
0.0,
0.01,
0.0
]
]
},
"keypoint_order": [
"Center",
"LDB",
"LDF",
"LUB",
"LUF",
"RDB",
"RDF",
"RUB",
"RUF"
],
"objects": [
{
"prim_path": "/World/_36_wood_block_0/_36_wood_block",
"visibility": 0.973,
"location": [
-0.02757798135280609,
0.12349912524223328,
1.191649079322815
],
"quaternion_xyzw": [
0.2647941019297759,
0.9311522685504322,
-0.13284147100297986,
0.21258570022076217
],
"projected_cuboid": [
[
238,
336
],
[
308,
370
],
[
286,
383
],
[
233,
258
],
[
217,
277
],
[
262,
402
],
[
243,
413
],
[
186,
284
],
[
172,
303
]
],
"truncation_ratio": 0.0,
"class": "_36_wood_block"
}
]
}
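As a quick self-consistency check on the Isaac Sim annotation above, the object's location can be reprojected with the listed intrinsics and compared against the "Center" entry of projected_cuboid. Below is a hedged sketch; it assumes location is given in the camera frame, in meters, with +z pointing out of the camera (the YCB-style convention), and the annotation file name is a placeholder:
import json

with open("000000.json") as f:  # placeholder annotation file name
    ann = json.load(f)

intr = ann["camera_data"]["intrinsics"]
obj = ann["objects"][0]

x, y, z = obj["location"]  # assumed: camera frame, meters, +z out of the camera
u = intr["fx"] * x / z + intr["cx"]
v = intr["fy"] * y / z + intr["cy"]

center = obj["projected_cuboid"][ann["keypoint_order"].index("Center")]
print("reprojected center:", (round(u, 1), round(v, 1)))
print("annotated center:  ", center)
# For the sample above this gives roughly (238.2, 335.6) vs [238, 336].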
Thanks in advance.
@nv-jeff maybe Jeff could help you; I never used Isaac Replicator, sorry.
@TontonTremblay Thanks for your answer. I would really appreciate it if @nv-jeff could also chime in. In the meantime, would you have a comment on the ongoing training (loss/maps, etc.)?
I think you would want a loss around 0.001; I'm not sure, I would need to check the other issue threads. Also, maybe your object has symmetries. Check the section on generating data with symmetries.
Ahh, you are doing the YCB wood block; yeah, there are symmetries on that object. You will have to define them.
I got you, but this object was not generated by NVISII, so where am I supposed to define model_info.json? I just have .pngs and .jsons in my dataset. I don't see any option for it in train.py either. Is this supposed to be in Isaac Sim's data generation config? This is my dope_config.yml file within the Isaac Sim Replicator directory:
---
# Default rendering parameters
CONFIG:
renderer: RayTracedLighting
headless: false
width: 512
height: 512
# prim_type is determined by the usd file.
# To determine, open the usd file in Isaac Sim and see the prim path. If you load it in /World, the path will be /World/<prim_type>
OBJECTS_TO_GENERATE:
#- { part_name: 003_cracker_box, num: 1, prim_type: _03_cracker_box }
#- { part_name: 035_power_drill, num: 1, prim_type: _35_power_drill }
- { part_name: 036_wood_block, num: 1, prim_type: _36_wood_block }
# Maximum force component to apply to objects to keep them in motion
FORCE_RANGE: 30
# Camera Intrinsics
WIDTH: 512
HEIGHT: 512
F_X: 768
F_Y: 768
pixel_size: 0.04 # in mm
# Number of sphere lights added to the scene
NUM_LIGHTS: 6
# Minimum and maximum distances of objects away from the camera (along the optical axis)
# MIN_DISTANCE: 1.0
MIN_DISTANCE: 0.4
# MAX_DISTANCE: 2.0
MAX_DISTANCE: 1.4
# Rotation of camera rig with respect to world frame, expressed as XYZ euler angles
CAMERA_RIG_ROTATION:
- 0
- 0
- 0
# Rotation of camera with respect to camera rig, expressed as XYZ euler angles. Please note that in this example, we
# define poses with respect to the camera rig instead of the camera prim. By using the rig's frame as a surrogate for
# the camera's frame, we effectively change the coordinate system of the camera. When
# CAMERA_RIG_ROTATION = np.array([0, 0, 0]) and CAMERA_ROTATION = np.array([0, 0, 0]), this corresponds to the default
# Isaac-Sim camera coordinate system of -z out the face of the camera, +x to the right, and +y up. When
# CAMERA_RIG_ROTATION = np.array([0, 0, 0]) and CAMERA_ROTATION = np.array([180, 0, 0]), this corresponds to
# the YCB Video Dataset camera coordinate system of +z out the face of the camera, +x to the right, and +y down.
CAMERA_ROTATION:
- 180
- 0
- 0
# Minimum and maximum XYZ euler angles for the part being trained on to be rotated, with respect to the camera rig
MIN_ROTATION_RANGE:
- -180
- -90
- -180
# Minimum and maximum XYZ euler angles for the part being trained on to be rotated, with respect to the camera rig
MAX_ROTATION_RANGE:
- 180
- 90
- 180
# How close the center of the part being trained on is allowed to be to the edge of the screen
FRACTION_TO_SCREEN_EDGE: 0.9
# MESH and DOME datasets
SHAPE_SCALE:
- 0.05
- 0.05
- 0.05
SHAPE_MASS: 1
OBJECT_SCALE:
- 1
- 1
- 1
OBJECT_MASS: 1
TRAIN_PART_SCALE: # Scale for the training objects
- 1
- 1
- 1
# Asset paths
DISTRACTOR_ASSET_PATH: /Isaac/Props/YCB/Axis_Aligned/
TRAIN_ASSET_PATH: /Isaac/Props/YCB/Axis_Aligned/
DOME_TEXTURE_PATH: /NVIDIA/Assets/Skies/
# MESH dataset
NUM_MESH_SHAPES: 400
NUM_MESH_OBJECTS: 150
MESH_FRACTION_GLASS: 0.15
MESH_FILENAMES:
- 002_master_chef_can
- 004_sugar_box
- 005_tomato_soup_can
- 006_mustard_bottle
- 007_tuna_fish_can
- 008_pudding_box
- 009_gelatin_box
- 010_potted_meat_can
- 011_banana
- 019_pitcher_base
- 021_bleach_cleanser
- 024_bowl
- 025_mug
- 035_power_drill
- 036_wood_block
- 037_scissors
- 040_large_marker
- 051_large_clamp
- 052_extra_large_clamp
- 061_foam_brick
# DOME dataset
NUM_DOME_SHAPES: 30
NUM_DOME_OBJECTS: 20
DOME_FRACTION_GLASS: 0.2
DOME_TEXTURES:
- Clear/evening_road_01_4k
- Clear/kloppenheim_02_4k
- Clear/mealie_road_4k
- Clear/noon_grass_4k
- Clear/qwantani_4k
- Clear/signal_hill_sunrise_4k
- Clear/sunflowers_4k
- Clear/syferfontein_18d_clear_4k
- Clear/venice_sunset_4k
- Clear/white_cliff_top_4k
- Cloudy/abandoned_parking_4k
- Cloudy/champagne_castle_1_4k
- Cloudy/evening_road_01_4k
- Cloudy/kloofendal_48d_partly_cloudy_4k
- Cloudy/lakeside_4k
- Cloudy/sunflowers_4k
- Cloudy/table_mountain_1_4k
- Evening/evening_road_01_4k
- Indoor/adams_place_bridge_4k
- Indoor/autoshop_01_4k
- Indoor/bathroom_4k
- Indoor/carpentry_shop_01_4k
- Indoor/en_suite_4k
- Indoor/entrance_hall_4k
- Indoor/hospital_room_4k
- Indoor/hotel_room_4k
- Indoor/lebombo_4k
- Indoor/old_bus_depot_4k
- Indoor/small_empty_house_4k
- Indoor/studio_small_04_4k
- Indoor/surgery_4k
- Indoor/vulture_hide_4k
- Indoor/wooden_lounge_4k
- Night/kloppenheim_02_4k
- Night/moonlit_golf_4k
- Storm/approaching_storm_4k
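One small cross-check that ties this config to the camera_projection_matrix in the annotations above: assuming Replicator writes an OpenGL-style symmetric projection matrix, the pinhole intrinsics can be recovered from it and should match F_X/F_Y here. A hedged sketch:
# Hedged: assumes an OpenGL-style projection matrix and a principal point at the image center.
width, height = 512, 512
p00, p11 = 3.0, 3.0                 # camera_projection_matrix[0][0] and [1][1]
fx = p00 * width / 2.0              # -> 768.0, matches F_X in this config
fy = p11 * height / 2.0             # -> 768.0, matches F_Y in this config
cx, cy = width / 2.0, height / 2.0  # -> 256.0, 256.0, matches the annotation intrinsics
print(fx, fy, cx, cy)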
@TontonTremblay @nv-jeff, my training just completed. Even though the loss value looks okay, I cannot get any inference results with inference/inference.py. What could be the reason for this?
Training output:
TensorBoard:
Inference:
Can you share belief maps? I think it is show_beliefs, something like that.
The only thing I could find related to show_beliefs was the --showbelief flag in train2/inference.py. I tried to run it like this:
python3 inference.py --data ../isaac_data/test/ --showbelief
but it throws:
Traceback (most recent call last):
File "/isaac_dope_retrain/Deep_Object_Pose/train2/inference.py", line 22, in <module>
from cuboid import Cuboid3d
ModuleNotFoundError: No module named 'cuboid'
So I have changed
import sys
sys.path.append("inference")
to
import sys
sys.path.append("../common")
but this time it throws:
I also set the following fields in train2/config_inference/config_pose.yaml:
weights: {
'_36_wood_block':"/isaac_dope_retrain/Deep_Object_Pose/train/output_wood_block/weights/net_epoch_99.pth"
}
architectures: {
'_36_wood_block':"dope"
}
dimensions: {
"_36_wood_block": [9.6024150848388672,19.130100250244141,5.824894905090332]
}
class_ids: {
'_36_wood_block': 37
}
draw_colors: {
"_36_wood_block": [217,12, 232] # magenta
}
How can I get belief map images? Sorry, I could not find any explanation for this in the repo. Thank you for your support.
I see this part in train/train.py, but I can't find any images related to it. It seems like it doesn't save them somehow:
I have a view in TensorBoard for epoch 176. Would it be useful for you, @TontonTremblay?
Yeah, it looks like you have symmetries. See how, for some points, it does not know which one is which? You need to add them to your annotations. Read this: https://github.com/NVlabs/Deep_Object_Pose/blob/master/data_generation/readme.md#handling-objects-with-symmetries
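(For reference, the linked readme describes declaring an object's symmetries in a model_info.json placed alongside the object model. Below is a purely illustrative sketch for a box-like object, assuming the BOP-style convention of listing each discrete symmetry as a flattened 4x4 transform; the exact schema and the correct transforms for 036_wood_block should be taken from the readme and the mesh itself.)
{
  "symmetries_discrete": [
    [-1.0,  0.0, 0.0, 0.0,
      0.0, -1.0, 0.0, 0.0,
      0.0,  0.0, 1.0, 0.0,
      0.0,  0.0, 0.0, 1.0]
  ]
}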
@TontonTremblay thanks for your answer. Before your answer, I wanted to use Ketchup as a reference point to see whether I could get the expected result. I generated ~3900 images using BlenderProc. What do you think about my belief maps at around epoch 165:
These look great to my eye.
I think I'm getting somewhere with this, but my inference results are not good. I was able to run train2/inference.py after small modifications to detector.py at line 496 (the original lines caused an error):
# torch.min/torch.max without a dim argument return a 0-dim tensor, so
# indexing the result with [0] fails on recent PyTorch versions.
#belief -= float(torch.min(belief)[0].data.cpu().numpy())
#belief /= float(torch.max(belief)[0].data.cpu().numpy())
belief -= float(torch.min(belief).data.cpu().numpy())
belief /= float(torch.max(belief).data.cpu().numpy())
Here are a couple of belief maps from the actual train2/inference.py script, along with the pose estimation results:
In some of the images, there is no result at all:
I've already decreased the threshold to 0.1, and still at most one result appears in the output. What might be the issue here?
Can you try it on an image with a single ketchup bottle? Also, I think your cuboid keypoint order looks strange. I forgot whether you used Isaac Sim or NVISII.
I used BlenderProc for this Ketchup dataset. Here is the picture with one Ketchup bottle:
And the belief map:
And this one is without distractors:
OK, I see the problem. It is the order of the cuboid points in BlenderProc that is different from NVISII. I am out this week. Ping nv-jeff so he can look into that or fix one or the other. It might mean adding a flag for where the data comes from.
@TontonTremblay Okay, many thanks for clarifying. @nv-jeff could we please get your help for this issue? Thanks in advance.
Hey :) I think I am experiencing the same issue. Did you manage to generate the right order of cuboid points in Isaac Sim?
I could really use some help with best-practice data generation and training; I'm a bit lost.
Thanks for the reminder, @Shahar-AstroScale . At the time when this issue was originally raised, I had no access to the paper's original datasets, and NVISII wasn't working; so this bug in the BlenderProc data generation code was not obvious. It has now been fixed, and data generated with BlenderProc should have the same cuboid point ordering as the NVISII code.