zig icon indicating copy to clipboard operation
zig copied to clipboard

codegen/llvm: improve loadTruncate() to use a simple load when possible

Open xxxbxxx opened this issue 2 years ago • 1 comments

forcing llvm to truncate the padding bits on load prevents some optimizations.

fixes 370662c565ba541157e3c69d12625c28787d642e closes https://github.com/ziglang/zig/issues/17768

xxxbxxx avatar Nov 26 '23 21:11 xxxbxxx

The idea of this change is to assume that if there is a local variable in it's own alloca, then the padding bits are clean. Now,

  • I don't know if it's true (I expect it is possible to write garbage to the bits with some intToPtr / ptrCast. but maybe it can happen when coping from an other variable? )
  • I don't know if it the proper way to implement this...

however, it does get the job done:

Benchmark 1 (81 runs): ./benchmark_baseline 1 1000/10000 281483566841860
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           249ms ± 13.0ms     225ms …  298ms          4 ( 5%)        0%
  peak_rss            880KB ± 7.17KB     819KB …  881KB          2 ( 2%)        0%
  cpu_cycles          836M  ± 6.80M      828M  …  859M           2 ( 2%)        0%
  instructions       2.06G  ± 67.9      2.06G  … 2.06G          10 (12%)        0%
  cache_references    759K  ±  142K      521K  … 1.21M           1 ( 1%)        0%
  cache_misses       17.2K  ± 9.76K     1.56K  … 53.8K           1 ( 1%)        0%
  branch_misses      5.77M  ± 16.6K     5.73M  … 5.80M           0 ( 0%)        0%
Benchmark 2 (70 runs): ./benchmark_master 1 1000/10000 281483566841860
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           286ms ± 7.91ms     273ms …  306ms          0 ( 0%)        💩+ 15.0% ±  1.4%
  peak_rss            909KB ±    0       909KB …  909KB          0 ( 0%)        💩+  3.4% ±  0.2%
  cpu_cycles         1.00G  ± 8.34M      987M  … 1.02G           0 ( 0%)        💩+ 19.9% ±  0.3%
  instructions       2.39G  ± 69.2      2.39G  … 2.39G           5 ( 7%)        💩+ 16.0% ±  0.0%
  cache_references    943K  ±  697K      539K  … 6.51M           3 ( 4%)        💩+ 24.3% ± 20.5%
  cache_misses       28.9K  ± 11.8K     6.86K  … 65.4K           1 ( 1%)        💩+ 68.4% ± 20.0%
  branch_misses      5.89M  ± 13.8K     5.87M  … 5.92M           0 ( 0%)        💩+  2.1% ±  0.1%
Benchmark 3 (79 runs): ./benchmark_patched 1 1000/10000 281483566841860
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           253ms ± 8.16ms     237ms …  285ms          6 ( 8%)          +  1.7% ±  1.4%
  peak_rss            909KB ±    0       909KB …  909KB          0 ( 0%)        💩+  3.4% ±  0.2%
  cpu_cycles          888M  ± 8.69M      874M  …  918M           1 ( 1%)        💩+  6.2% ±  0.3%
  instructions       2.04G  ± 59.2      2.04G  … 2.04G          14 (18%)        ⚡-  1.1% ±  0.0%
  cache_references    978K  ± 1.56M      469K  … 14.5M          10 (13%)          + 29.0% ± 44.9%
  cache_misses       25.8K  ± 16.7K     1.10K  … 70.9K           6 ( 8%)        💩+ 50.0% ± 24.6%
  branch_misses      5.80M  ± 16.7K     5.76M  … 5.84M          13 (16%)          +  0.5% ±  0.1%

xxxbxxx avatar Nov 26 '23 21:11 xxxbxxx

I don't know if it the proper way to implement this...

given this, combined with the fact that it's failing CI checks, needs a rebase, I'm going to close it. I don't know if it's the proper way to implement it either, unless I dive in and implement the whole thing myself, in which case I might as well close this and start over. sorry for your wasted efforts. feel free to reopen to start the discussion anew.

andrewrk avatar Aug 24 '24 05:08 andrewrk

yeah I'm guessing it should be done by some generic "provenance" framework, maybe in relation to alias analysis / result location semantics..

for the record, to leave some breadcrumb for future implementer:

  • I've kept the commit up to date and active my local branch and the compiler works fine with it, never stumbled on a miscompilation since it's last year.
  • I've updated the branch xxxbxxx:load-truncate-optim with commit rebased on the current master branch (but I don't know if it's possible to update the closed pullrequest)

xxxbxxx avatar Aug 24 '24 11:08 xxxbxxx