
About meanshift time consumption

Open cama36 opened this issue 2 years ago • 4 comments

Hello, I noticed that the embedding-features branch uses the MeanShift clustering algorithm, but in my tests this clustering step usually accounts for more than 80% of the total runtime. I also noticed that you provide a GPU version of MeanShift, but it does not seem to work properly. Do you have a better way to reduce this time consumption? Thanks!
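To back up a claim like "clustering accounts for more than 80% of the runtime", a minimal timing harness can be wrapped around the clustering call. This is an illustrative sketch, not the repository's code; the `time.sleep` calls stand in for the clustering step and the rest of the pipeline.

```python
import time
from contextlib import contextmanager

timings = {"cluster": 0.0, "total": 0.0}

@contextmanager
def timed(key):
    # Accumulate wall-clock time spent inside the with-block under `key`.
    t0 = time.perf_counter()
    yield
    timings[key] += time.perf_counter() - t0

with timed("total"):
    with timed("cluster"):
        time.sleep(0.05)   # stands in for MeanShift(...).fit_predict(...)
    time.sleep(0.01)       # stands in for the rest of the inference step

share = timings["cluster"] / timings["total"]
print(f"clustering share of runtime: {share:.0%}")
```

Accumulating per-stage totals across many batches (rather than timing a single call) gives a more reliable picture of where the time actually goes.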

cama36 avatar Sep 25 '23 03:09 cama36


Hi! I think I used the GPU version of MeanShift; what do you mean by "not working properly"? Yes, the clustering step is time-consuming... Are you using your own dataset or ours? If your own, reducing the input radius of the cylinders or using a smaller voxel size would reduce the time.

bxiang233 avatar Oct 02 '23 11:10 bxiang233


I'm having the same issue: MeanShift slows the training loop to an impractical degree. The first 30 epochs took around 12 minutes each; the 31st epoch has now been running for over 6 hours and is still at batch 41/750.

Hardware:

RAM: 64 GB
CPU: AMD Ryzen 5 5600X 6-Core Processor
GPU: NVIDIA GeForce RTX 3060 12GB VRAM

I'm running the treeins training task with the default settings from the repository:

python train.py task=panoptic data=panoptic/treeins_rad8 models=panoptic/area4_ablation_3heads_5 \
    model_name=PointGroup-PAPER training=treeins job_name=test_run_3heads

In your reply, you mention that the GPU version of MeanShift is used, but in meanshift_cluster.cluster_single (line 79 and onward) the tensors are explicitly sent to the CPU. The MeanShift implementation used comes from sklearn, which, as far as I'm aware, has no GPU support. Is the GPU version of MeanShift defined somewhere else in the repository? I'd appreciate any guidance you can provide.
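For reference, the CPU path being described looks roughly like the following sketch. Function names and the bandwidth value are illustrative, not the repository's actual code; in the repo the input would be a torch.Tensor moved to CPU before clustering, since scikit-learn's MeanShift has no GPU support.

```python
import numpy as np
from sklearn.cluster import MeanShift

def cluster_embeddings_cpu(embeddings, bandwidth=1.0):
    # In the repo, the equivalent of tensor.detach().cpu().numpy()
    # happens first: sklearn's MeanShift only runs on the CPU.
    embed = np.asarray(embeddings)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    return ms.fit_predict(embed)  # one cluster label per point

# Two well-separated synthetic "instances" in embedding space:
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.05, (50, 3)),
                 rng.normal(5.0, 0.05, (50, 3))])
labels = cluster_embeddings_cpu(pts, bandwidth=1.0)
print(len(set(labels.tolist())))
```

MeanShift's cost grows quickly with the number of points, which is consistent with clustering dominating the runtime on dense point clouds; `bin_seeding=True` already reduces the number of seed points it has to iterate.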

kasparas-k avatar May 14 '24 13:05 kasparas-k


Hi! Thank you for your interest! Let me try to recall... Basically, I set a maximum number of training epochs, and if training runs to full completion, it takes about a week. You can also check the training curves generated by wandb.

Regarding the GPU version, I remember I tried this repo: https://github.com/masqm/Faster-Mean-Shift-Euc. Basically, you just need to call their clustering function.

However, I later switched to parallel CPU computation for acceleration, because I found it to be faster. If you're interested, you can compare the two yourself. I didn't pursue further acceleration strategies because this is no longer the focus of my current research; I also struggled with the long training times caused by the lengthy clustering step. I believe there is still room for improvement, so if you find better methods, feel free to share them! Thank you.
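One way the "parallel CPU computation" idea can be realized, since each cylinder is clustered independently, is to farm the per-cylinder MeanShift runs out to worker processes. This is a hedged sketch under that assumption; function names and the bandwidth are illustrative, not the repository's code. joblib ships as a scikit-learn dependency.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import MeanShift

def cluster_one(embed, bandwidth=1.0):
    # One MeanShift run per cylinder; each run is independent of the others.
    return MeanShift(bandwidth=bandwidth, bin_seeding=True).fit_predict(embed)

def cluster_cylinders_parallel(cylinders, bandwidth=1.0, n_jobs=2):
    # Dispatch the independent runs to n_jobs worker processes.
    return Parallel(n_jobs=n_jobs)(
        delayed(cluster_one)(c, bandwidth) for c in cylinders)

# Three synthetic cylinders, each containing two separated instances:
rng = np.random.default_rng(1)
cyls = [np.vstack([rng.normal(0.0, 0.05, (30, 3)),
                   rng.normal(4.0, 0.05, (30, 3))])
        for _ in range(3)]
results = cluster_cylinders_parallel(cyls)
print([len(r) for r in results])
```

Whether this beats a GPU implementation depends on point counts per cylinder and core count, which matches the observation above that the parallel CPU route came out faster in practice.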

Best, Binbin

bxiang233 avatar May 15 '24 07:05 bxiang233


Thank you so much for your quick response. I'll try the code you linked; maybe it will run faster in my case.

kasparas-k avatar May 15 '24 14:05 kasparas-k