HMatrices.jl

Cholesky factorization

Open maltezfaria opened this issue 1 year ago • 2 comments

This PR adds support for efficiently representing and manipulating Hermitian kernels. When a Kernel is wrapped in a Hermitian type, only the upper triangular part of the HMatrix is assembled automatically, which should in principle cut the assembly time roughly in half.

It also implements the Cholesky factorization for Hermitian hierarchical matrices.
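
As a dense analogue of the idea, the sketch below uses only Julia's LinearAlgebra standard library and no HMatrices.jl API. The kernel `exp(-abs(x - y))`, the point set, and the helper `assemble_upper` are assumptions made for illustration. It fills only the upper triangle of the matrix, wraps it as Hermitian, and factors it with cholesky.

```julia
using LinearAlgebra

# Hypothetical stand-in for a Hermitian kernel: the exponential kernel,
# which yields a symmetric positive definite matrix on distinct points.
kernel(x, y) = exp(-abs(x - y))

# Evaluate the kernel only for i <= j and count the evaluations.
function assemble_upper(kernel, pts)
    n = length(pts)
    A = zeros(n, n)
    nevals = 0
    for j in 1:n, i in 1:j
        A[i, j] = kernel(pts[i], pts[j])
        nevals += 1
    end
    # Hermitian(A, :U) reads only the upper triangle of A.
    return Hermitian(A, :U), nevals
end

pts = range(0, 1; length = 8)
n = length(pts)
H, nevals = assemble_upper(kernel, pts)

F = cholesky(H)        # H == F.U' * F.U
x = F \ ones(n)        # solve H * x = ones(n) using the factorization
```

Here 36 kernel evaluations replace the 64 of a full assembly. The saving approaches a factor of two as the matrix grows, since n(n+1)/2 entries are computed instead of n².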

If merged, should close #51

TODO

  • [ ] Documentation for Cholesky and Hermitian

Other modifications/improvements

  • [x] Simplify the logic behind wrappers of HMatrix (e.g. Adjoint, Hermitian).
  • [x] Use atomic operations to coordinate hmatrix-vector multiplication
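
The PR text does not say how the atomic operations are used. One common pattern, sketched here purely as an assumption, is an atomic counter that hands out blocks to worker tasks dynamically. The block layout (tuples of row range, column range, dense block), the function name `blockwise_mul!`, and the lock around the accumulation are all hypothetical and are not taken from HMatrices.jl.

```julia
using Base.Threads

# blocks: vector of (rows, cols, B) tuples, each a dense block of the full matrix.
function blockwise_mul!(y, blocks, x)
    next = Atomic{Int}(1)      # index of the next unprocessed block
    lk = ReentrantLock()       # guards accumulation into y
    @sync for _ in 1:nthreads()
        @spawn begin
            while true
                k = atomic_add!(next, 1)   # returns the value before the increment
                k > length(blocks) && break
                rows, cols, B = blocks[k]
                tmp = B * view(x, cols)    # local product, no shared writes
                lock(lk) do
                    y[rows] .+= tmp
                end
            end
        end
    end
    return y
end

# Usage on a 6x6 matrix split into four 3x3 blocks.
A = reshape(collect(1.0:36.0), 6, 6)
x = collect(1.0:6.0)
blocks = [(r, c, A[r, c]) for r in (1:3, 4:6) for c in (1:3, 4:6)]
y = blockwise_mul!(zeros(6), blocks, x)
```

Because each task claims its next block with a single atomic increment, no block is processed twice and idle tasks pick up remaining work without a fixed partition.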

maltezfaria (Apr 29 '24 14:04)

Benchmark result

Judge result

Benchmark Report for /home/lfaria/runner-hmatrices/_work/HMatrices.jl/HMatrices.jl

Job Properties

  • Time of benchmarks:
    • Target: 29 Apr 2024 - 17:03
    • Baseline: 29 Apr 2024 - 17:06
  • Package commits:
    • Target: b086fe
    • Baseline: 3f0d3f
  • Julia commits:
    • Target: bd47ec
    • Baseline: bd47ec
  • Julia command flags:
    • Target: -O3
    • Baseline: -O3
  • Environment variables:
    • Target: OPENBLAS_NUM_THREADS => 1 JULIA_NUM_THREADS => 8
    • Baseline: OPENBLAS_NUM_THREADS => 1 JULIA_NUM_THREADS => 8

Results

A ratio greater than 1.0 denotes a possible regression (marked with :x:), while a ratio less than 1.0 denotes a possible improvement (marked with :white_check_mark:). Only significant results - results that indicate possible regressions or improvements - are shown below (thus, an empty table means that all benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["Laplace vectorized", "LU threads=true"] | 1.11 (5%) :x: | 1.00 (1%) |
| ["Laplace vectorized", "assemble threads=false"] | 1.20 (5%) :x: | 1.00 (1%) |
| ["Laplace vectorized", "gemv threads=false"] | 1.10 (5%) :x: | 1.00 (1%) |
| ["Laplace", "assemble threads=false"] | 1.06 (5%) :x: | 1.00 (1%) |
| ["Laplace", "assemble threads=true"] | 1.09 (5%) :x: | 1.00 (1%) |
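
The time ratios can be recomputed from the target and baseline timings reported further down in the same comment. The helper `classify` below is a hypothetical simplification written for illustration, not the BenchmarkTools API. It assumes a ratio counts as significant only when it differs from 1.0 by more than the stated 5% tolerance.

```julia
# Classify a target/baseline ratio against a relative tolerance.
function classify(target, baseline; tolerance = 0.05)
    r = target / baseline
    r > 1 + tolerance && return (ratio = r, verdict = :regression)
    r < 1 - tolerance && return (ratio = r, verdict = :improvement)
    return (ratio = r, verdict = :invariant)
end

# Times in seconds, copied from the target and baseline result tables.
assemble = classify(1.669, 1.386)        # ["Laplace vectorized", "assemble threads=false"]
gemv = classify(0.063729, 0.062605)      # ["Laplace vectorized", "gemv threads=true"]

println(round(assemble.ratio; digits = 2), " ", assemble.verdict)
println(round(gemv.ratio; digits = 2), " ", gemv.verdict)
```

The first case reproduces the 1.20 ratio flagged in the table. The second stays within the tolerance, which is consistent with that benchmark not being listed.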

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Compressors"]
  • ["Laplace permuted"]
  • ["Laplace vectorized"]
  • ["Laplace"]

Julia versioninfo

Target

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 20.04.6 LTS
  uname: Linux 5.15.0-100-generic #110~20.04.1-Ubuntu SMP Tue Feb 13 14:25:03 UTC 2024 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz: 
                 speed         user         nice          sys         idle          irq
       #1-40   828 MHz    1445881 s       9134 s     295338 s  1660616204 s          0 s
  Memory: 31.01314926147461 GB (24768.0078125 MB free)
  Uptime: 4.15657919e6 sec
  Load Avg:  2.01  1.06  0.46
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 8 default, 0 interactive, 4 GC (on 40 virtual cores)

Baseline

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 20.04.6 LTS
  uname: Linux 5.15.0-100-generic #110~20.04.1-Ubuntu SMP Tue Feb 13 14:25:03 UTC 2024 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz: 
                 speed         user         nice          sys         idle          irq
       #1-40  2200 MHz    1448504 s       9134 s     295569 s  1660682988 s          0 s
  Memory: 31.01314926147461 GB (24794.9921875 MB free)
  Uptime: 4.15675335e6 sec
  Load Avg:  1.28  1.1  0.59
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 8 default, 0 interactive, 4 GC (on 40 virtual cores)

Target result

Benchmark Report for /home/lfaria/runner-hmatrices/_work/HMatrices.jl/HMatrices.jl

Job Properties

  • Time of benchmark: 29 Apr 2024 - 17:03
  • Package commit: b086fe
  • Julia commit: bd47ec
  • Julia command flags: -O3
  • Environment variables: OPENBLAS_NUM_THREADS => 1 JULIA_NUM_THREADS => 8

Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Compressors", "PartialACA(0.0, 9223372036854775807, 1.0e-6)"] | 783.370 μs (5%) | | 582.45 KiB (1%) | 24 |
| ["Compressors", "TSVD(0.0, 9223372036854775807, 1.0e-6)"] | 622.933 ms (5%) | 652.500 μs | 46.04 MiB (1%) | 16 |
| ["Laplace permuted", "assemble threads=false"] | 3.700 s (5%) | 92.588 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace permuted", "assemble threads=true"] | 458.752 ms (5%) | | 1.46 GiB (1%) | 45691 |
| ["Laplace vectorized", "LU threads=false"] | 33.185 s (5%) | 480.963 ms | 3.02 GiB (1%) | 1348089 |
| ["Laplace vectorized", "LU threads=true"] | 37.924 s (5%) | 580.504 ms | 3.12 GiB (1%) | 3020456 |
| ["Laplace vectorized", "assemble threads=false"] | 1.669 s (5%) | 8.168 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace vectorized", "assemble threads=true"] | 316.066 ms (5%) | 148.665 ms | 1.46 GiB (1%) | 45683 |
| ["Laplace vectorized", "gemv threads=false"] | 163.210 ms (5%) | | 1.43 MiB (1%) | 2236 |
| ["Laplace vectorized", "gemv threads=true"] | 63.729 ms (5%) | | 5.05 MiB (1%) | 11012 |
| ["Laplace", "assemble threads=false"] | 4.286 s (5%) | 80.390 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace", "assemble threads=true"] | 627.106 ms (5%) | | 1.46 GiB (1%) | 45687 |

Baseline result

Benchmark Report for /home/lfaria/runner-hmatrices/_work/HMatrices.jl/HMatrices.jl

Job Properties

  • Time of benchmark: 29 Apr 2024 - 17:06
  • Package commit: 3f0d3f
  • Julia commit: bd47ec
  • Julia command flags: -O3
  • Environment variables: OPENBLAS_NUM_THREADS => 1 JULIA_NUM_THREADS => 8

Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Compressors", "PartialACA(0.0, 9223372036854775807, 1.0e-6)"] | 795.367 μs (5%) | | 582.45 KiB (1%) | 24 |
| ["Compressors", "TSVD(0.0, 9223372036854775807, 1.0e-6)"] | 638.151 ms (5%) | 647.959 μs | 46.04 MiB (1%) | 16 |
| ["Laplace permuted", "assemble threads=false"] | 3.556 s (5%) | 88.011 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace permuted", "assemble threads=true"] | 481.423 ms (5%) | | 1.46 GiB (1%) | 51078 |
| ["Laplace vectorized", "LU threads=false"] | 31.938 s (5%) | 290.628 ms | 3.03 GiB (1%) | 1828456 |
| ["Laplace vectorized", "LU threads=true"] | 34.220 s (5%) | 257.325 ms | 3.13 GiB (1%) | 3493942 |
| ["Laplace vectorized", "assemble threads=false"] | 1.386 s (5%) | 5.683 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace vectorized", "assemble threads=true"] | 305.698 ms (5%) | | 1.47 GiB (1%) | 51088 |
| ["Laplace vectorized", "gemv threads=false"] | 148.068 ms (5%) | | 1.43 MiB (1%) | 2236 |
| ["Laplace vectorized", "gemv threads=true"] | 62.605 ms (5%) | | 5.09 MiB (1%) | 20847 |
| ["Laplace", "assemble threads=false"] | 4.034 s (5%) | 11.084 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace", "assemble threads=true"] | 576.986 ms (5%) | | 1.46 GiB (1%) | 51076 |

Runtime information

| Runtime Info | |
|---|---|
| BLAS #threads | 20 |
| BLAS.vendor() | lbt |
| Sys.CPU_THREADS | 40 |

lscpu output:

Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      46 bits physical, 48 bits virtual
CPU(s):                             40
On-line CPU(s) list:                0-39
Thread(s) per core:                 2
Core(s) per socket:                 10
Socket(s):                          2
NUMA node(s):                       2
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              85
Model name:                         Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Stepping:                           4
CPU MHz:                            2200.000
CPU max MHz:                        3000,0000
CPU min MHz:                        800,0000
BogoMIPS:                           4400.00
Virtualization:                     VT-x
L1d cache:                          640 KiB
L1i cache:                          640 KiB
L2 cache:                           20 MiB
L3 cache:                           27,5 MiB
NUMA node0 CPU(s):                  0-9,20-29
NUMA node1 CPU(s):                  10-19,30-39
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit:        KVM: Mitigation: VMX disabled
Vulnerability L1tf:                 Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:                  Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:             Mitigation; PTI
Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:             Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Mitigation; Clear CPU buffers; SMT vulnerable
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities
| Cpu Property | Value |
|---|---|
| Brand | Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz |
| Vendor | :Intel |
| Architecture | :Skylake |
| Model | Family: 0x06, Model: 0x55, Stepping: 0x04, Type: 0x00 |
| Cores | 10 physical cores, 20 logical cores (on executing CPU) |
| | Hyperthreading hardware capability detected |
| Clock Frequencies | 2200 / 3000 MHz (base/max), 100 MHz bus |
| Data Cache | Level 1:3 : (32, 1024, 14080) kbytes |
| | 64 byte cache line size |
| Address Size | 48 bits virtual, 46 bits physical |
| SIMD | 512 bit = 64 byte max. SIMD vector size |
| Time Stamp Counter | TSC is accessible via rdtsc |
| | TSC runs at constant rate (invariant from clock frequency) |
| Perf. Monitoring | Performance Monitoring Counters (PMC) revision 4 |
| | Available hardware counters per logical core: |
| | 3 fixed-function counters of 48 bit width |
| | 4 general-purpose counters of 48 bit width |
| Hypervisor | No |

github-actions[bot] (Apr 29 '24 15:04)

Benchmark result

Judge result

Benchmark Report for /home/lfaria/runner-hmatrices/_work/HMatrices.jl/HMatrices.jl

Job Properties

  • Time of benchmarks:
    • Target: 30 Apr 2024 - 19:48
    • Baseline: 30 Apr 2024 - 19:51
  • Package commits:
    • Target: 9b90fb
    • Baseline: 3f0d3f
  • Julia commits:
    • Target: 0b4590
    • Baseline: 0b4590
  • Julia command flags:
    • Target: -O3
    • Baseline: -O3
  • Environment variables:
    • Target: OPENBLAS_NUM_THREADS => 1 JULIA_NUM_THREADS => 8
    • Baseline: OPENBLAS_NUM_THREADS => 1 JULIA_NUM_THREADS => 8

Results

A ratio greater than 1.0 denotes a possible regression (marked with :x:), while a ratio less than 1.0 denotes a possible improvement (marked with :white_check_mark:). Only significant results - results that indicate possible regressions or improvements - are shown below (thus, an empty table means that all benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["Compressors", "PartialACA(0.0, 9223372036854775807, 1.0e-6)"] | 1.11 (5%) :x: | 1.03 (1%) :x: |
| ["Laplace permuted", "assemble threads=true"] | 1.05 (5%) :x: | 1.00 (1%) |
| ["Laplace vectorized", "LU threads=true"] | 1.12 (5%) :x: | 1.00 (1%) |
| ["Laplace vectorized", "assemble threads=true"] | 0.87 (5%) :white_check_mark: | 1.00 (1%) |
| ["Laplace vectorized", "gemv threads=false"] | 1.24 (5%) :x: | 1.00 (1%) |
| ["Laplace", "assemble threads=false"] | 1.16 (5%) :x: | 1.00 (1%) |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Compressors"]
  • ["Laplace permuted"]
  • ["Laplace vectorized"]
  • ["Laplace"]

Julia versioninfo

Target

Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 20.04.6 LTS
  uname: Linux 5.15.0-100-generic #110~20.04.1-Ubuntu SMP Tue Feb 13 14:25:03 UTC 2024 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz: 
                 speed         user         nice          sys         idle          irq
       #1-40  2200 MHz    1511508 s       9298 s     306692 s  1699052441 s          0 s
  Memory: 31.01314926147461 GB (24225.421875 MB free)
  Uptime: 4.25287935e6 sec
  Load Avg:  2.15  1.52  0.78
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 8 default, 0 interactive, 4 GC (on 40 virtual cores)

Baseline

Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 20.04.6 LTS
  uname: Linux 5.15.0-100-generic #110~20.04.1-Ubuntu SMP Tue Feb 13 14:25:03 UTC 2024 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz: 
                 speed         user         nice          sys         idle          irq
       #1-40  2200 MHz    1514458 s       9298 s     306908 s  1699126928 s          0 s
  Memory: 31.01314926147461 GB (24702.25390625 MB free)
  Uptime: 4.25307356e6 sec
  Load Avg:  1.79  1.49  0.91
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 8 default, 0 interactive, 4 GC (on 40 virtual cores)

Target result

Benchmark Report for /home/lfaria/runner-hmatrices/_work/HMatrices.jl/HMatrices.jl

Job Properties

  • Time of benchmark: 30 Apr 2024 - 19:48
  • Package commit: 9b90fb
  • Julia commit: 0b4590
  • Julia command flags: -O3
  • Environment variables: OPENBLAS_NUM_THREADS => 1 JULIA_NUM_THREADS => 8

Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Compressors", "PartialACA(0.0, 9223372036854775807, 1.0e-6)"] | 928.260 μs (5%) | | 613.70 KiB (1%) | 24 |
| ["Compressors", "TSVD(0.0, 9223372036854775807, 1.0e-6)"] | 623.807 ms (5%) | 648.373 μs | 46.04 MiB (1%) | 16 |
| ["Laplace permuted", "assemble threads=false"] | 3.147 s (5%) | 30.535 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace permuted", "assemble threads=true"] | 441.069 ms (5%) | | 1.46 GiB (1%) | 45691 |
| ["Laplace vectorized", "LU threads=false"] | 33.237 s (5%) | 485.804 ms | 3.02 GiB (1%) | 1348087 |
| ["Laplace vectorized", "LU threads=true"] | 37.594 s (5%) | 565.013 ms | 3.12 GiB (1%) | 3020254 |
| ["Laplace vectorized", "assemble threads=false"] | 1.498 s (5%) | 6.014 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace vectorized", "assemble threads=true"] | 184.352 ms (5%) | | 1.46 GiB (1%) | 45689 |
| ["Laplace vectorized", "gemv threads=false"] | 181.095 ms (5%) | | 1.43 MiB (1%) | 2236 |
| ["Laplace vectorized", "gemv threads=true"] | 64.323 ms (5%) | | 5.05 MiB (1%) | 11012 |
| ["Laplace", "assemble threads=false"] | 4.033 s (5%) | 5.663 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace", "assemble threads=true"] | 522.534 ms (5%) | | 1.46 GiB (1%) | 45691 |

Baseline result

Benchmark Report for /home/lfaria/runner-hmatrices/_work/HMatrices.jl/HMatrices.jl

Job Properties

  • Time of benchmark: 30 Apr 2024 - 19:51
  • Package commit: 3f0d3f
  • Julia commit: 0b4590
  • Julia command flags: -O3
  • Environment variables: OPENBLAS_NUM_THREADS => 1 JULIA_NUM_THREADS => 8

Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Compressors", "PartialACA(0.0, 9223372036854775807, 1.0e-6)"] | 836.639 μs (5%) | | 598.08 KiB (1%) | 24 |
| ["Compressors", "TSVD(0.0, 9223372036854775807, 1.0e-6)"] | 635.452 ms (5%) | 610.518 μs | 46.04 MiB (1%) | 16 |
| ["Laplace permuted", "assemble threads=false"] | 3.115 s (5%) | 9.120 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace permuted", "assemble threads=true"] | 419.269 ms (5%) | | 1.46 GiB (1%) | 51060 |
| ["Laplace vectorized", "LU threads=false"] | 31.928 s (5%) | 288.194 ms | 3.03 GiB (1%) | 1828452 |
| ["Laplace vectorized", "LU threads=true"] | 33.542 s (5%) | 262.986 ms | 3.13 GiB (1%) | 3493709 |
| ["Laplace vectorized", "assemble threads=false"] | 1.464 s (5%) | 5.558 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace vectorized", "assemble threads=true"] | 211.380 ms (5%) | | 1.46 GiB (1%) | 51066 |
| ["Laplace vectorized", "gemv threads=false"] | 145.809 ms (5%) | | 1.43 MiB (1%) | 2236 |
| ["Laplace vectorized", "gemv threads=true"] | 62.394 ms (5%) | | 5.09 MiB (1%) | 20864 |
| ["Laplace", "assemble threads=false"] | 3.469 s (5%) | 20.241 ms | 1.44 GiB (1%) | 38992 |
| ["Laplace", "assemble threads=true"] | 534.971 ms (5%) | | 1.46 GiB (1%) | 51080 |

github-actions[bot] (Apr 30 '24 17:04)