[STF] Stackable optimizations
Description
Experiments on top of the stackable PR, kept separate so as not to break code that uses that branch. Do not merge.
closes
Checklist
- [ ] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.
This pull request requires additional validation before any workflows can run on NVIDIA's runners.
/ok to test 1c42f84
/ok to test 2e8a1fa
🟨 CI finished in 31m 33s: Pass: 50%/26 | Total: 4h 53m | Avg: 11m 17s | Max: 31m 33s | Hits: 53%/7194
- 🟨 cudax: Pass: 50%/26 | Total: 4h 53m | Avg: 11m 17s | Max: 31m 33s | Hits: 53%/7194

| Group | Entry | Pass | Total | Avg | Max | Hits |
|---|---|---|---|---|---|---|
| 🟨 cxx | 🟩 Clang14 | 100%/2 | 31m 49s | 15m 54s | 16m 19s | 50%/1208 |
| 🟨 cxx | 🟩 Clang15 | 100%/1 | 17m 19s | 17m 19s | 17m 19s | 50%/602 |
| 🟨 cxx | 🟩 Clang16 | 100%/1 | 16m 17s | 16m 17s | 16m 17s | 50%/602 |
| 🟨 cxx | 🟩 Clang17 | 100%/1 | 17m 15s | 17m 15s | 17m 15s | 50%/602 |
| 🟨 cxx | 🟩 Clang18 | 100%/1 | 17m 06s | 17m 06s | 17m 06s | 50%/602 |
| 🟨 cxx | 🟨 Clang19 | 75%/4 | 56m 29s | 14m 07s | 16m 24s | 50%/1806 |
| 🟨 cxx | 🟥 GCC10 | 0%/2 | 11m 08s | 5m 34s | 5m 34s | |
| 🟨 cxx | 🟥 GCC11 | 0%/1 | 5m 19s | 5m 19s | 5m 19s | |
| 🟨 cxx | 🟥 GCC12 | 0%/1 | 5m 36s | 5m 36s | 5m 36s | |
| 🟨 cxx | 🟥 GCC13 | 0%/8 | 29m 14s | 3m 39s | 5m 57s | |
| 🟨 cxx | 🟩 MSVC14.39 | 100%/1 | 11m 58s | 11m 58s | 11m 58s | 95%/286 |
| 🟨 cxx | 🟩 MSVC14.42 | 100%/1 | 12m 00s | 12m 00s | 12m 00s | 95%/286 |
| 🟨 cxx | 🟩 NVHPC25.3 | 100%/2 | 1h 01m | 30m 58s | 31m 33s | 47%/1200 |
| 🟨 cxx_family | 🟨 Clang | 90%/10 | 2h 36m | 15m 37s | 17m 19s | 50%/5422 |
| 🟨 cxx_family | 🟥 GCC | 0%/12 | 51m 17s | 4m 16s | 5m 57s | |
| 🟨 cxx_family | 🟩 MSVC | 100%/2 | 23m 58s | 11m 59s | 12m 00s | 95%/572 |
| 🟨 cxx_family | 🟩 NVHPC | 100%/2 | 1h 01m | 30m 58s | 31m 33s | 47%/1200 |
| 🟨 cudacxx_family | 🟨 nvcc | 50%/26 | 4h 53m | 11m 17s | 31m 33s | 53%/7194 |
| 🟨 cpu | 🟨 amd64 | 50%/22 | 4h 16m | 11m 38s | 31m 33s | 54%/5990 |
| 🟨 cpu | 🟨 arm64 | 50%/4 | 37m 21s | 9m 20s | 14m 24s | 50%/1204 |
| 🟨 ctk | 🟨 12.0 | 66%/3 | 33m 02s | 11m 00s | 15m 30s | 65%/890 |
| 🟨 ctk | 🟨 12.8 | 47%/23 | 4h 20m | 11m 19s | 31m 33s | 52%/6304 |
| 🟨 cudacxx | 🟨 nvcc12.0 | 66%/3 | 33m 02s | 11m 00s | 15m 30s | 65%/890 |
| 🟨 cudacxx | 🟨 nvcc12.8 | 47%/23 | 4h 20m | 11m 19s | 31m 33s | 52%/6304 |
| 🟨 gpu | 🟥 h100 | 0%/2 | 4m 40s | 2m 20s | 4m 40s | |
| 🟨 gpu | 🟨 rtx2080 | 54%/24 | 4h 48m | 12m 01s | 31m 33s | 53%/7194 |
| 🟨 jobs | 🟨 Build | 56%/23 | 4h 41m | 12m 13s | 31m 33s | 53%/7194 |
| 🟨 jobs | 🟥 Test | 0%/3 | 12m 09s | 4m 03s | 12m 09s | |
| 🟥 sm | 🟥 90 | 0%/3 | 9m 15s | 3m 05s | 4m 40s | |
| 🟥 sm | 🟥 90a | 0%/1 | 4m 37s | 4m 37s | 4m 37s | |
| 🟨 std | 🟨 17 | 50%/4 | 52m 55s | 13m 13s | 30m 23s | 49%/1202 |
| 🟨 std | 🟨 20 | 50%/22 | 4h 00m | 10m 55s | 31m 33s | 54%/5992 |
👃 Inspect Changes
Modifications in project?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 26)
| # | Runner |
|---|---|
| 17 | linux-amd64-cpu16 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
| 1 | linux-amd64-gpu-h100-latest-1 |
/ok to test ba44f81cc
🟨 CI finished in 39m 08s: Pass: 88%/26 | Total: 4h 39m | Avg: 10m 45s | Max: 20m 27s | Hits: 78%/13218
- 🟨 cudax: Pass: 88%/26 | Total: 4h 39m | Avg: 10m 45s | Max: 20m 27s | Hits: 78%/13218

| Group | Entry | Pass | Total | Avg | Max | Hits |
|---|---|---|---|---|---|---|
| 🔍 cpu: amd64 🔍 | 🔍 amd64 | 86%/22 | 4h 03m | 11m 04s | 20m 27s | 78%/10810 |
| 🔍 cpu: amd64 🔍 | 🟩 arm64 | 100%/4 | 36m 13s | 9m 03s | 13m 28s | 77%/2408 |
| 🔍 ctk: 12.8 🔍 | 🟩 12.0 | 100%/3 | 31m 17s | 10m 25s | 15m 05s | 80%/1494 |
| 🔍 ctk: 12.8 🔍 | 🔍 12.8 | 86%/23 | 4h 08m | 10m 48s | 20m 27s | 78%/11724 |
| 🔍 cudacxx: nvcc12.8 🔍 | 🟩 nvcc12.0 | 100%/3 | 31m 17s | 10m 25s | 15m 05s | 80%/1494 |
| 🔍 cudacxx: nvcc12.8 🔍 | 🔍 nvcc12.8 | 86%/23 | 4h 08m | 10m 48s | 20m 27s | 78%/11724 |
| 🚨 jobs: Test 🚨 | 🟩 Build | 100%/23 | 3h 52m | 10m 05s | 16m 07s | 78%/13218 |
| 🚨 jobs: Test 🚨 | 🔥 Test | 0%/3 | 47m 26s | 15m 48s | 20m 27s | |
| 🔍 sm: 90 🔍 | 🔍 90 | 66%/3 | 37m 19s | 12m 26s | 14m 51s | 57%/1204 |
| 🔍 sm: 90 🔍 | 🟩 90a | 100%/1 | 11m 14s | 11m 14s | 11m 14s | 56%/602 |
| 🔍 std: 20 🔍 | 🟩 17 | 100%/4 | 40m 06s | 10m 01s | 12m 19s | 76%/2406 |
| 🔍 std: 20 🔍 | 🔍 20 | 86%/22 | 3h 59m | 10m 53s | 20m 27s | 79%/10812 |
| 🟨 cxx | 🟩 Clang14 | 100%/2 | 10m 43s | 5m 21s | 5m 28s | 97%/1208 |
| 🟨 cxx | 🟩 Clang15 | 100%/1 | 5m 55s | 5m 55s | 5m 55s | 97%/602 |
| 🟨 cxx | 🟩 Clang16 | 100%/1 | 5m 46s | 5m 46s | 5m 46s | 97%/602 |
| 🟨 cxx | 🟩 Clang17 | 100%/1 | 5m 38s | 5m 38s | 5m 38s | 97%/602 |
| 🟨 cxx | 🟩 Clang18 | 100%/1 | 5m 33s | 5m 33s | 5m 33s | 97%/602 |
| 🟨 cxx | 🟨 Clang19 | 75%/4 | 28m 40s | 7m 10s | 12m 08s | 97%/1806 |
| 🟨 cxx | 🟩 GCC10 | 100%/2 | 29m 48s | 14m 54s | 15m 05s | 57%/1208 |
| 🟨 cxx | 🟩 GCC11 | 100%/1 | 15m 10s | 15m 10s | 15m 10s | 56%/602 |
| 🟨 cxx | 🟩 GCC12 | 100%/1 | 16m 00s | 16m 00s | 16m 00s | 56%/602 |
| 🟨 cxx | 🟨 GCC13 | 75%/8 | 1h 50m | 13m 51s | 20m 27s | 57%/3612 |
| 🟨 cxx | 🟩 MSVC14.39 | 100%/1 | 10m 44s | 10m 44s | 10m 44s | 95%/286 |
| 🟨 cxx | 🟩 MSVC14.42 | 100%/1 | 10m 30s | 10m 30s | 10m 30s | 95%/286 |
| 🟨 cxx | 🟩 NVHPC25.3 | 100%/2 | 24m 20s | 12m 10s | 12m 49s | 94%/1200 |
| 🟨 cxx_family | 🟨 Clang | 90%/10 | 1h 02m | 6m 13s | 12m 08s | 97%/5422 |
| 🟨 cxx_family | 🟨 GCC | 83%/12 | 2h 51m | 14m 19s | 20m 27s | 57%/6024 |
| 🟨 cxx_family | 🟩 MSVC | 100%/2 | 21m 14s | 10m 37s | 10m 44s | 95%/572 |
| 🟨 cxx_family | 🟩 NVHPC | 100%/2 | 24m 20s | 12m 10s | 12m 49s | 94%/1200 |
| 🟨 cudacxx_family | 🟨 nvcc | 88%/26 | 4h 39m | 10m 45s | 20m 27s | 78%/13218 |
| 🟨 gpu | 🟨 h100 | 50%/2 | 26m 15s | 13m 07s | 14m 51s | 57%/602 |
| 🟨 gpu | 🟨 rtx2080 | 91%/24 | 4h 13m | 10m 33s | 20m 27s | 79%/12616 |
👃 Inspect Changes
Modifications in project?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 26)
| # | Runner |
|---|---|
| 17 | linux-amd64-cpu16 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
| 1 | linux-amd64-gpu-h100-latest-1 |
/ok to test bb8b735d5
> /ok to test bb8b735d5
@caugonnet, there was an error processing your request: E2
See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
/ok to test bb8b735d5
> /ok to test bb8b735d5
@caugonnet, there was an error processing your request: E2
See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
/ok to test
> /ok to test
@caugonnet, there was an error processing your request: E1
See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
/ok to test f50b3eb
🟨 CI finished in 32m 15s: Pass: 88%/26 | Total: 6h 42m | Avg: 15m 28s | Max: 31m 47s | Hits: 52%/13241
- 🟨 cudax: Pass: 88%/26 | Total: 6h 42m | Avg: 15m 28s | Max: 31m 47s | Hits: 52%/13241

| Group | Entry | Pass | Total | Avg | Max | Hits |
|---|---|---|---|---|---|---|
| 🔍 cpu: amd64 🔍 | 🔍 amd64 | 86%/22 | 5h 44m | 15m 40s | 31m 47s | 52%/10829 |
| 🔍 cpu: amd64 🔍 | 🟩 arm64 | 100%/4 | 57m 29s | 14m 22s | 15m 23s | 50%/2412 |
| 🔍 ctk: 12.8 🔍 | 🟩 12.0 | 100%/3 | 41m 19s | 13m 46s | 16m 13s | 59%/1497 |
| 🔍 ctk: 12.8 🔍 | 🔍 12.8 | 86%/23 | 6h 01m | 15m 42s | 31m 47s | 51%/11744 |
| 🔍 cudacxx: nvcc12.8 🔍 | 🟩 nvcc12.0 | 100%/3 | 41m 19s | 13m 46s | 16m 13s | 59%/1497 |
| 🔍 cudacxx: nvcc12.8 🔍 | 🔍 nvcc12.8 | 86%/23 | 6h 01m | 15m 42s | 31m 47s | 51%/11744 |
| 🚨 jobs: Test 🚨 | 🟩 Build | 100%/23 | 6h 18m | 16m 26s | 31m 47s | 52%/13241 |
| 🚨 jobs: Test 🚨 | 🔥 Test | 0%/3 | 24m 21s | 8m 07s | 8m 37s | |
| 🔍 sm: 90 🔍 | 🔍 90 | 66%/3 | 33m 42s | 11m 14s | 13m 33s | 50%/1206 |
| 🔍 sm: 90 🔍 | 🟩 90a | 100%/1 | 12m 42s | 12m 42s | 12m 42s | 50%/603 |
| 🔍 std: 20 🔍 | 🟩 17 | 100%/4 | 1h 10m | 17m 34s | 30m 21s | 49%/2410 |
| 🔍 std: 20 🔍 | 🔍 20 | 86%/22 | 5h 32m | 15m 05s | 31m 47s | 52%/10831 |
| 🟨 cxx | 🟩 Clang14 | 100%/2 | 28m 38s | 14m 19s | 14m 49s | 51%/1210 |
| 🟨 cxx | 🟩 Clang15 | 100%/1 | 16m 16s | 16m 16s | 16m 16s | 50%/603 |
| 🟨 cxx | 🟩 Clang16 | 100%/1 | 17m 05s | 17m 05s | 17m 05s | 50%/603 |
| 🟨 cxx | 🟩 Clang17 | 100%/1 | 15m 40s | 15m 40s | 15m 40s | 50%/603 |
| 🟨 cxx | 🟩 Clang18 | 100%/1 | 16m 27s | 16m 27s | 16m 27s | 50%/603 |
| 🟨 cxx | 🟨 Clang19 | 75%/4 | 53m 18s | 13m 19s | 17m 13s | 50%/1809 |
| 🟨 cxx | 🟩 GCC10 | 100%/2 | 32m 08s | 16m 04s | 16m 13s | 50%/1210 |
| 🟨 cxx | 🟩 GCC11 | 100%/1 | 18m 08s | 18m 08s | 18m 08s | 50%/603 |
| 🟨 cxx | 🟩 GCC12 | 100%/1 | 19m 53s | 19m 53s | 19m 53s | 50%/603 |
| 🟨 cxx | 🟨 GCC13 | 75%/8 | 1h 41m | 12m 38s | 16m 39s | 50%/3618 |
| 🟨 cxx | 🟩 MSVC14.39 | 100%/1 | 11m 17s | 11m 17s | 11m 17s | 95%/287 |
| 🟨 cxx | 🟩 MSVC14.42 | 100%/1 | 10m 18s | 10m 18s | 10m 18s | 95%/287 |
| 🟨 cxx | 🟩 NVHPC25.3 | 100%/2 | 1h 02m | 31m 04s | 31m 47s | 47%/1202 |
| 🟨 cxx_family | 🟨 Clang | 90%/10 | 2h 27m | 14m 44s | 17m 13s | 50%/5431 |
| 🟨 cxx_family | 🟨 GCC | 83%/12 | 2h 51m | 14m 16s | 19m 53s | 50%/6034 |
| 🟨 cxx_family | 🟩 MSVC | 100%/2 | 21m 35s | 10m 47s | 11m 17s | 95%/574 |
| 🟨 cxx_family | 🟩 NVHPC | 100%/2 | 1h 02m | 31m 04s | 31m 47s | 47%/1202 |
| 🟨 cudacxx_family | 🟨 nvcc | 88%/26 | 6h 42m | 15m 28s | 31m 47s | 52%/13241 |
| 🟨 gpu | 🟨 h100 | 50%/2 | 21m 11s | 10m 35s | 13m 33s | 50%/603 |
| 🟨 gpu | 🟨 rtx2080 | 91%/24 | 6h 21m | 15m 53s | 31m 47s | 52%/12638 |
👃 Inspect Changes
Modifications in project?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 26)
| # | Runner |
|---|---|
| 17 | linux-amd64-cpu16 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
| 1 | linux-amd64-gpu-h100-latest-1 |