
[STF] Stackable optimizations

Open caugonnet opened this issue 9 months ago • 6 comments

Description

Experiments built on top of the stackable PR, kept in a separate branch so as not to break code that uses that branch. Do not merge.

closes

Checklist

  • [ ] New or existing tests cover these changes.
  • [ ] The documentation is up to date with these changes.

caugonnet • Apr 26 '25 08:04

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] • Apr 26 '25 08:04

/ok to test 1c42f84

caugonnet • Apr 26 '25 08:04

/ok to test 2e8a1fa

caugonnet • Apr 26 '25 08:04

🟨 CI finished in 31m 33s: Pass: 50%/26 | Total: 4h 53m | Avg: 11m 17s | Max: 31m 33s | Hits: 53%/7194
  • 🟨 cudax: Pass: 50%/26 | Total: 4h 53m | Avg: 11m 17s | Max: 31m 33s | Hits: 53%/7194

    🟨 cxx
      🟩 Clang14            Pass: 100%/2   | Total: 31m 49s | Avg: 15m 54s | Max: 16m 19s | Hits:  50%/1208  
      🟩 Clang15            Pass: 100%/1   | Total: 17m 19s | Avg: 17m 19s | Max: 17m 19s | Hits:  50%/602   
      🟩 Clang16            Pass: 100%/1   | Total: 16m 17s | Avg: 16m 17s | Max: 16m 17s | Hits:  50%/602   
      🟩 Clang17            Pass: 100%/1   | Total: 17m 15s | Avg: 17m 15s | Max: 17m 15s | Hits:  50%/602   
      🟩 Clang18            Pass: 100%/1   | Total: 17m 06s | Avg: 17m 06s | Max: 17m 06s | Hits:  50%/602   
      🟨 Clang19            Pass:  75%/4   | Total: 56m 29s | Avg: 14m 07s | Max: 16m 24s | Hits:  50%/1806  
      🟥 GCC10              Pass:   0%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  5m 34s
      🟥 GCC11              Pass:   0%/1   | Total:  5m 19s | Avg:  5m 19s | Max:  5m 19s
      🟥 GCC12              Pass:   0%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
      🟥 GCC13              Pass:   0%/8   | Total: 29m 14s | Avg:  3m 39s | Max:  5m 57s
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 58s | Avg: 11m 58s | Max: 11m 58s | Hits:  95%/286   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 12m 00s | Avg: 12m 00s | Max: 12m 00s | Hits:  95%/286   
      🟩 NVHPC25.3          Pass: 100%/2   | Total:  1h 01m | Avg: 30m 58s | Max: 31m 33s | Hits:  47%/1200  
    🟨 cxx_family
      🟨 Clang              Pass:  90%/10  | Total:  2h 36m | Avg: 15m 37s | Max: 17m 19s | Hits:  50%/5422  
      🟥 GCC                Pass:   0%/12  | Total: 51m 17s | Avg:  4m 16s | Max:  5m 57s
      🟩 MSVC               Pass: 100%/2   | Total: 23m 58s | Avg: 11m 59s | Max: 12m 00s | Hits:  95%/572   
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 58s | Max: 31m 33s | Hits:  47%/1200  
    🟨 cudacxx_family
      🟨 nvcc               Pass:  50%/26  | Total:  4h 53m | Avg: 11m 17s | Max: 31m 33s | Hits:  53%/7194  
    🟨 cpu
      🟨 amd64              Pass:  50%/22  | Total:  4h 16m | Avg: 11m 38s | Max: 31m 33s | Hits:  54%/5990  
      🟨 arm64              Pass:  50%/4   | Total: 37m 21s | Avg:  9m 20s | Max: 14m 24s | Hits:  50%/1204  
    🟨 ctk
      🟨 12.0               Pass:  66%/3   | Total: 33m 02s | Avg: 11m 00s | Max: 15m 30s | Hits:  65%/890   
      🟨 12.8               Pass:  47%/23  | Total:  4h 20m | Avg: 11m 19s | Max: 31m 33s | Hits:  52%/6304  
    🟨 cudacxx
      🟨 nvcc12.0           Pass:  66%/3   | Total: 33m 02s | Avg: 11m 00s | Max: 15m 30s | Hits:  65%/890   
      🟨 nvcc12.8           Pass:  47%/23  | Total:  4h 20m | Avg: 11m 19s | Max: 31m 33s | Hits:  52%/6304  
    🟨 gpu
      🟥 h100               Pass:   0%/2   | Total:  4m 40s | Avg:  2m 20s | Max:  4m 40s
      🟨 rtx2080            Pass:  54%/24  | Total:  4h 48m | Avg: 12m 01s | Max: 31m 33s | Hits:  53%/7194  
    🟨 jobs
      🟨 Build              Pass:  56%/23  | Total:  4h 41m | Avg: 12m 13s | Max: 31m 33s | Hits:  53%/7194  
      🟥 Test               Pass:   0%/3   | Total: 12m 09s | Avg:  4m 03s | Max: 12m 09s
    🟥 sm
      🟥 90                 Pass:   0%/3   | Total:  9m 15s | Avg:  3m 05s | Max:  4m 40s
      🟥 90a                Pass:   0%/1   | Total:  4m 37s | Avg:  4m 37s | Max:  4m 37s
    🟨 std
      🟨 17                 Pass:  50%/4   | Total: 52m 55s | Avg: 13m 13s | Max: 30m 23s | Hits:  49%/1202  
      🟨 20                 Pass:  50%/22  | Total:  4h 00m | Avg: 10m 55s | Max: 31m 33s | Hits:  54%/5992  
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 26)

# Runner
17 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

github-actions[bot] • Apr 26 '25 09:04

/ok to test ba44f81cc

caugonnet • Apr 26 '25 09:04

🟨 CI finished in 39m 08s: Pass: 88%/26 | Total: 4h 39m | Avg: 10m 45s | Max: 20m 27s | Hits: 78%/13218
  • 🟨 cudax: Pass: 88%/26 | Total: 4h 39m | Avg: 10m 45s | Max: 20m 27s | Hits: 78%/13218

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  86%/22  | Total:  4h 03m | Avg: 11m 04s | Max: 20m 27s | Hits:  78%/10810 
      🟩 arm64              Pass: 100%/4   | Total: 36m 13s | Avg:  9m 03s | Max: 13m 28s | Hits:  77%/2408  
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/3   | Total: 31m 17s | Avg: 10m 25s | Max: 15m 05s | Hits:  80%/1494  
      🔍 12.8               Pass:  86%/23  | Total:  4h 08m | Avg: 10m 48s | Max: 20m 27s | Hits:  78%/11724 
    🔍 cudacxx: nvcc12.8 🔍
      🟩 nvcc12.0           Pass: 100%/3   | Total: 31m 17s | Avg: 10m 25s | Max: 15m 05s | Hits:  80%/1494  
      🔍 nvcc12.8           Pass:  86%/23  | Total:  4h 08m | Avg: 10m 48s | Max: 20m 27s | Hits:  78%/11724 
    🚨 jobs: Test 🚨
      🟩 Build              Pass: 100%/23  | Total:  3h 52m | Avg: 10m 05s | Max: 16m 07s | Hits:  78%/13218 
      🔥 Test               Pass:   0%/3   | Total: 47m 26s | Avg: 15m 48s | Max: 20m 27s
    🔍 sm: 90 🔍
      🔍 90                 Pass:  66%/3   | Total: 37m 19s | Avg: 12m 26s | Max: 14m 51s | Hits:  57%/1204  
      🟩 90a                Pass: 100%/1   | Total: 11m 14s | Avg: 11m 14s | Max: 11m 14s | Hits:  56%/602   
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/4   | Total: 40m 06s | Avg: 10m 01s | Max: 12m 19s | Hits:  76%/2406  
      🔍 20                 Pass:  86%/22  | Total:  3h 59m | Avg: 10m 53s | Max: 20m 27s | Hits:  79%/10812 
    🟨 cxx
      🟩 Clang14            Pass: 100%/2   | Total: 10m 43s | Avg:  5m 21s | Max:  5m 28s | Hits:  97%/1208  
      🟩 Clang15            Pass: 100%/1   | Total:  5m 55s | Avg:  5m 55s | Max:  5m 55s | Hits:  97%/602   
      🟩 Clang16            Pass: 100%/1   | Total:  5m 46s | Avg:  5m 46s | Max:  5m 46s | Hits:  97%/602   
      🟩 Clang17            Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s | Hits:  97%/602   
      🟩 Clang18            Pass: 100%/1   | Total:  5m 33s | Avg:  5m 33s | Max:  5m 33s | Hits:  97%/602   
      🟨 Clang19            Pass:  75%/4   | Total: 28m 40s | Avg:  7m 10s | Max: 12m 08s | Hits:  97%/1806  
      🟩 GCC10              Pass: 100%/2   | Total: 29m 48s | Avg: 14m 54s | Max: 15m 05s | Hits:  57%/1208  
      🟩 GCC11              Pass: 100%/1   | Total: 15m 10s | Avg: 15m 10s | Max: 15m 10s | Hits:  56%/602   
      🟩 GCC12              Pass: 100%/1   | Total: 16m 00s | Avg: 16m 00s | Max: 16m 00s | Hits:  56%/602   
      🟨 GCC13              Pass:  75%/8   | Total:  1h 50m | Avg: 13m 51s | Max: 20m 27s | Hits:  57%/3612  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 44s | Avg: 10m 44s | Max: 10m 44s | Hits:  95%/286   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 10m 30s | Avg: 10m 30s | Max: 10m 30s | Hits:  95%/286   
      🟩 NVHPC25.3          Pass: 100%/2   | Total: 24m 20s | Avg: 12m 10s | Max: 12m 49s | Hits:  94%/1200  
    🟨 cxx_family
      🟨 Clang              Pass:  90%/10  | Total:  1h 02m | Avg:  6m 13s | Max: 12m 08s | Hits:  97%/5422  
      🟨 GCC                Pass:  83%/12  | Total:  2h 51m | Avg: 14m 19s | Max: 20m 27s | Hits:  57%/6024  
      🟩 MSVC               Pass: 100%/2   | Total: 21m 14s | Avg: 10m 37s | Max: 10m 44s | Hits:  95%/572   
      🟩 NVHPC              Pass: 100%/2   | Total: 24m 20s | Avg: 12m 10s | Max: 12m 49s | Hits:  94%/1200  
    🟨 cudacxx_family
      🟨 nvcc               Pass:  88%/26  | Total:  4h 39m | Avg: 10m 45s | Max: 20m 27s | Hits:  78%/13218 
    🟨 gpu
      🟨 h100               Pass:  50%/2   | Total: 26m 15s | Avg: 13m 07s | Max: 14m 51s | Hits:  57%/602   
      🟨 rtx2080            Pass:  91%/24  | Total:  4h 13m | Avg: 10m 33s | Max: 20m 27s | Hits:  79%/12616 
    


github-actions[bot] • Apr 26 '25 10:04

/ok to test bb8b735d5

caugonnet • May 11 '25 20:05

/ok to test bb8b735d5

@caugonnet, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

copy-pr-bot[bot] • May 11 '25 20:05

/ok to test bb8b735d5

caugonnet • May 11 '25 20:05

/ok to test bb8b735d5

@caugonnet, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

copy-pr-bot[bot] • May 11 '25 20:05

/ok to test

caugonnet • May 11 '25 20:05

/ok to test

@caugonnet, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

copy-pr-bot[bot] • May 11 '25 20:05

/ok to test f50b3eb

caugonnet • May 11 '25 20:05

🟨 CI finished in 32m 15s: Pass: 88%/26 | Total: 6h 42m | Avg: 15m 28s | Max: 31m 47s | Hits: 52%/13241
  • 🟨 cudax: Pass: 88%/26 | Total: 6h 42m | Avg: 15m 28s | Max: 31m 47s | Hits: 52%/13241

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  86%/22  | Total:  5h 44m | Avg: 15m 40s | Max: 31m 47s | Hits:  52%/10829 
      🟩 arm64              Pass: 100%/4   | Total: 57m 29s | Avg: 14m 22s | Max: 15m 23s | Hits:  50%/2412  
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/3   | Total: 41m 19s | Avg: 13m 46s | Max: 16m 13s | Hits:  59%/1497  
      🔍 12.8               Pass:  86%/23  | Total:  6h 01m | Avg: 15m 42s | Max: 31m 47s | Hits:  51%/11744 
    🔍 cudacxx: nvcc12.8 🔍
      🟩 nvcc12.0           Pass: 100%/3   | Total: 41m 19s | Avg: 13m 46s | Max: 16m 13s | Hits:  59%/1497  
      🔍 nvcc12.8           Pass:  86%/23  | Total:  6h 01m | Avg: 15m 42s | Max: 31m 47s | Hits:  51%/11744 
    🚨 jobs: Test 🚨
      🟩 Build              Pass: 100%/23  | Total:  6h 18m | Avg: 16m 26s | Max: 31m 47s | Hits:  52%/13241 
      🔥 Test               Pass:   0%/3   | Total: 24m 21s | Avg:  8m 07s | Max:  8m 37s
    🔍 sm: 90 🔍
      🔍 90                 Pass:  66%/3   | Total: 33m 42s | Avg: 11m 14s | Max: 13m 33s | Hits:  50%/1206  
      🟩 90a                Pass: 100%/1   | Total: 12m 42s | Avg: 12m 42s | Max: 12m 42s | Hits:  50%/603   
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/4   | Total:  1h 10m | Avg: 17m 34s | Max: 30m 21s | Hits:  49%/2410  
      🔍 20                 Pass:  86%/22  | Total:  5h 32m | Avg: 15m 05s | Max: 31m 47s | Hits:  52%/10831 
    🟨 cxx
      🟩 Clang14            Pass: 100%/2   | Total: 28m 38s | Avg: 14m 19s | Max: 14m 49s | Hits:  51%/1210  
      🟩 Clang15            Pass: 100%/1   | Total: 16m 16s | Avg: 16m 16s | Max: 16m 16s | Hits:  50%/603   
      🟩 Clang16            Pass: 100%/1   | Total: 17m 05s | Avg: 17m 05s | Max: 17m 05s | Hits:  50%/603   
      🟩 Clang17            Pass: 100%/1   | Total: 15m 40s | Avg: 15m 40s | Max: 15m 40s | Hits:  50%/603   
      🟩 Clang18            Pass: 100%/1   | Total: 16m 27s | Avg: 16m 27s | Max: 16m 27s | Hits:  50%/603   
      🟨 Clang19            Pass:  75%/4   | Total: 53m 18s | Avg: 13m 19s | Max: 17m 13s | Hits:  50%/1809  
      🟩 GCC10              Pass: 100%/2   | Total: 32m 08s | Avg: 16m 04s | Max: 16m 13s | Hits:  50%/1210  
      🟩 GCC11              Pass: 100%/1   | Total: 18m 08s | Avg: 18m 08s | Max: 18m 08s | Hits:  50%/603   
      🟩 GCC12              Pass: 100%/1   | Total: 19m 53s | Avg: 19m 53s | Max: 19m 53s | Hits:  50%/603   
      🟨 GCC13              Pass:  75%/8   | Total:  1h 41m | Avg: 12m 38s | Max: 16m 39s | Hits:  50%/3618  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s | Hits:  95%/287   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 10m 18s | Avg: 10m 18s | Max: 10m 18s | Hits:  95%/287   
      🟩 NVHPC25.3          Pass: 100%/2   | Total:  1h 02m | Avg: 31m 04s | Max: 31m 47s | Hits:  47%/1202  
    🟨 cxx_family
      🟨 Clang              Pass:  90%/10  | Total:  2h 27m | Avg: 14m 44s | Max: 17m 13s | Hits:  50%/5431  
      🟨 GCC                Pass:  83%/12  | Total:  2h 51m | Avg: 14m 16s | Max: 19m 53s | Hits:  50%/6034  
      🟩 MSVC               Pass: 100%/2   | Total: 21m 35s | Avg: 10m 47s | Max: 11m 17s | Hits:  95%/574   
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 04s | Max: 31m 47s | Hits:  47%/1202  
    🟨 cudacxx_family
      🟨 nvcc               Pass:  88%/26  | Total:  6h 42m | Avg: 15m 28s | Max: 31m 47s | Hits:  52%/13241 
    🟨 gpu
      🟨 h100               Pass:  50%/2   | Total: 21m 11s | Avg: 10m 35s | Max: 13m 33s | Hits:  50%/603   
      🟨 rtx2080            Pass:  91%/24  | Total:  6h 21m | Avg: 15m 53s | Max: 31m 47s | Hits:  52%/12638 
    


github-actions[bot] • May 11 '25 21:05