ruby icon indicating copy to clipboard operation
ruby copied to clipboard

Improve `String#casecmp?` and `Symbol#casecmp?` performance

Open koic opened this issue 5 years ago • 4 comments

Background

String#casecmp? method was suggested to RuboCop Performance to use as a fast method. https://github.com/rubocop-hq/rubocop-performance/issues/100

However String#casecmp? is always slower than String#casecmp("arg").zero?.

Summary

This PR improves String#casecmp? and Symbol#casecmp? performance when comparing ASCII characters. OTOH, in the case of Non-ASCII characters, performance is slightly degraded by added condition.

And I found #1668 which is essentially the same.

Benchmark

The following is a benchmark case for String#casecmp? with similar cases.

require 'benchmark/ips'

puts '* ASCII'

Benchmark.ips do |x|
  x.report('upcase')        { 'String'.upcase == 'string' }
  x.report('downcase')      { 'String'.downcase == 'string' }
  x.report('casecmp.zero?') { 'String'.casecmp('string').zero? }
  x.report('casecmp?')      { 'String'.casecmp?('string') }
  x.compare!
end

puts '* Non-ASCII'

Benchmark.ips do |x|
  x.report('upcase')        { 'äö '.upcase == ('ÄÖÜ') }
  x.report('downcase')      { 'äö '.downcase == ('ÄÖÜ') }
  x.report('casecmp.zero?') { 'äö '.casecmp('ÄÖÜ').zero? }
  x.report('casecmp?')      { 'äö '.casecmp?('ÄÖÜ') }
  x.compare!
end

Before

* ASCII
Warming up --------------------------------------
              upcase   200.428k i/100ms
            downcase   197.503k i/100ms
       casecmp.zero?   216.244k i/100ms
            casecmp?   171.257k i/100ms
Calculating -------------------------------------
              upcase      4.326M (± 2.9%) i/s -     21.646M in       5.008511s
            downcase      4.350M (± 2.3%) i/s -     21.923M in       5.042694s
       casecmp.zero?      4.998M (± 3.7%) i/s -     25.084M in       5.025253s
            casecmp?      3.090M (± 2.9%) i/s -     15.584M in       5.047497s

Comparison:
       casecmp.zero?:  4998357.4 i/s
            downcase:  4349766.0 i/s - 1.15x  slower
              upcase:  4325765.3 i/s - 1.16x  slower
            casecmp?:  3090194.2 i/s - 1.62x  slower

* Non-ASCII
Warming up --------------------------------------
              upcase   137.352k i/100ms
            downcase   136.735k i/100ms
       casecmp.zero?   206.341k i/100ms
            casecmp?    96.326k i/100ms
Calculating -------------------------------------
              upcase      2.215M (± 2.7%) i/s -     11.126M in       5.027582s
            downcase      2.199M (± 2.5%) i/s -     11.076M in       5.038781s
       casecmp.zero?      4.709M (± 1.8%) i/s -     23.729M in       5.041362s
            casecmp?      1.315M (± 1.6%) i/s -      6.646M in       5.054479s

Comparison:
       casecmp.zero?:  4708502.7 i/s
              upcase:  2214590.5 i/s - 2.13x  slower
            downcase:  2199478.9 i/s - 2.14x  slower
            casecmp?:  1315326.8 i/s - 3.58x  slower

After

casecmp? performance for ASCII characters will be improved.

* ASCII
Warming up --------------------------------------
              upcase   206.368k i/100ms
            downcase   205.971k i/100ms
       casecmp.zero?   225.636k i/100ms
            casecmp?   232.030k i/100ms
Calculating -------------------------------------
              upcase      4.337M (± 1.3%) i/s -     21.875M in   5.045187s
            downcase      4.297M (± 1.5%) i/s -     21.627M in   5.033750s
       casecmp.zero?      5.240M (± 1.4%) i/s -     26.399M in   5.038613s
            casecmp?      5.548M (± 1.5%) i/s -     27.844M in   5.019895s

Comparison:
            casecmp?:  5547954.3 i/s
       casecmp.zero?:  5240466.7 i/s - 1.06x  slower
              upcase:  4336522.3 i/s - 1.28x  slower
            downcase:  4297370.8 i/s - 1.29x  slower

* Non-ASCII
Warming up --------------------------------------
              upcase   141.841k i/100ms
            downcase   141.233k i/100ms
       casecmp.zero?   215.401k i/100ms
            casecmp?    94.625k i/100ms
Calculating -------------------------------------
              upcase      2.238M (± 2.1%) i/s -     11.205M in   5.010247s
            downcase      2.243M (± 1.3%) i/s -     11.299M in   5.038967s
       casecmp.zero?      4.611M (± 2.6%) i/s -     23.048M in   5.001970s
            casecmp?      1.255M (± 2.4%) i/s -      6.340M in   5.055611s

Comparison:
       casecmp.zero?:  4610838.3 i/s
            downcase:  2242621.9 i/s - 2.06x  slower
              upcase:  2237553.1 i/s - 2.06x  slower
            casecmp?:  1254739.1 i/s - 3.67x  slower

Symbol#casecmp? have similar results.

Other Information

This is an off-topic for this PR. I inform RuboCop Performance that String#casecmp and String#casecmp? behave differently when using Non-ASCII characters.

'äöü'.casecmp('ÄÖÜ').zero? #=> false
'äöü'.casecmp?('ÄÖÜ')      #=> true

koic avatar Feb 28 '20 15:02 koic

You can add a benchmark script under benchmark/, like 5e897227ff3d37a36be96bb2c082370d437058ea, instead of writing in a commit log, and -omarkdown options makes a markdown table.

Iteration per second (i/s)

compare-ruby built-ruby
casecmp_p-1 2.245M 11.624M
casecmp_p-10 1.436M 1.955M
casecmp_p-100 283.525k 214.075k
casecmp_p-1000 30.240k 21.550k
casecmp_p-nonascii1 669.295k 656.785k
casecmp_p-nonascii10 112.033k 111.158k
casecmp_p-nonascii100 12.075k 11.985k
casecmp_p-nonascii1000 1.201k 1.192k

nobu avatar Feb 29 '20 06:02 nobu

Related bug tracker issue https://bugs.ruby-lang.org/issues/13750

the-spectator avatar Jun 07 '20 17:06 the-spectator

Rebased this PR, and run with adding a benchmark as the following,

diff --git i/benchmark/string_casecmp_p.yml w/benchmark/string_casecmp_p.yml
index a790ce7d55..8bb581aed7 100644
--- i/benchmark/string_casecmp_p.yml
+++ w/benchmark/string_casecmp_p.yml
@@ -1,2 +1,4 @@
 prelude: |
+  cstr = 'String'
+  lstr = 'string'
   lstr1 = [*"a".."z",*"0".."9"].join("")
@@ -18,2 +20,3 @@ prelude: |
 benchmark:
+  casecmp_p-ascii: cstr.casecmp?(lstr)
   casecmp_p-1: lstr1.casecmp?(ustr1)

but can't see significant differences now...

Iteration per second (i/s)

compare-ruby built-ruby
casecmp_p-ascii 7.607M 7.601M
1.00x -
casecmp_p-1 2.238M 2.281M
- 1.02x
casecmp_p-10 1.399M 1.414M
- 1.01x
casecmp_p-100 271.704k 274.164k
- 1.01x
casecmp_p-1000 29.003k 30.158k
- 1.04x
casecmp_p-nonascii1 655.054k 672.244k
- 1.03x
casecmp_p-nonascii10 108.929k 111.451k
- 1.02x
casecmp_p-nonascii100 11.656k 11.808k
- 1.01x
casecmp_p-nonascii1000 1.163k 1.180k
- 1.01x

nobu avatar Jun 08 '20 01:06 nobu

It's still would be useful for me. Pretty common case, especially when File::FNM_CASEFOLD is broken for Dir.glob in Ruby 3.1 on Linux.

AlexWayfer avatar Feb 23 '22 18:02 AlexWayfer