dalli icon indicating copy to clipboard operation
dalli copied to clipboard

Performance loss when upgrading from Dalli 2.7.10 to 3.2.2

Open matthewzinke opened this issue 3 years ago • 5 comments

We attempted to upgrade our Dalli gem from 2.7.10 to 3.2.2 recently, and noticed a performance loss associated with the upgrade. Out memcached latency went up by just under 10% and our sidekiq processes started spending 2-3x as much time on memcached. We suspect this has something to do with the change from send_multiget to pipelined_get.

We are wondering if anyone has also seen performance loss associated with the upgrade from 2 to 3, and/or if there are any configuration recommendations to preserve performance characteristics.

matthewzinke avatar Sep 08 '22 21:09 matthewzinke

send_multiget looks like it was removed in https://github.com/petergoldstein/dalli/pull/852/files?diff=unified&w=1#diff-a326ee7acf56937d92cdeeb4b920547a89162170aab2ad32391200ad1b4c3638L457

was this removed just to simplify the code to always use PipelinedGetter @petergoldstein ?

jasonpenny avatar Sep 19 '22 15:09 jasonpenny

@jasonpenny The send_multiget method was simply renamed to pipelined_get in that commit to get some naming consistency. The latter method is still there, and is now invoked by the PipelinedGetter.

It is certainly possible that there's a performance loss here (possibly in the ResponseBuffer), but I'd need to get some more detail to start debugging.

petergoldstein avatar Sep 20 '22 01:09 petergoldstein

ah, I see.

I see a slowdown with a benchmark using ActiveSupport::Cache::MemCacheStore:

root@d6ba62960da5:/# DALLI_VERSION=2.7.10 ruby bench.rb
                                            user     system      total        real
2.7.10 read_multi                       8.122310   2.830021  10.952331 ( 12.855987)
root@d6ba62960da5:/# DALLI_VERSION=2.7.10 ruby bench.rb
                                            user     system      total        real
2.7.10 read_multi                       8.037037   2.876596  10.913633 ( 12.790800)

root@d6ba62960da5:/# DALLI_VERSION=3.2.2 ruby bench.rb
                                            user     system      total        real
3.2.2 read_multi                        9.198624   2.890378  12.089002 ( 13.673892)
root@d6ba62960da5:/# DALLI_VERSION=3.2.2 ruby bench.rb
                                            user     system      total        real
3.2.2 read_multi                        9.372887   2.795796  12.168683 ( 13.765607)
benchmark setup in a docker container

docker run -it --rm "ruby:2.7.3" bash

apt-get update -qq
apt-get install -y sudo
useradd test

git clone --depth 1 --branch main https://github.com/petergoldstein/dalli dalli

MEMCACHED_VERSION=1.6.10 bash dalli/scripts/install_memcached.sh

echo "
# frozen_string_literal: true
require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'activesupport', '6.1.6.1'
  gem 'dalli', ENV['DALLI_VERSION']
end

require 'benchmark'
require 'active_support/all'

Benchmark.bm(37) do |x|
  x.report(%Q{#{ENV['DALLI_VERSION']} read_multi}) do
    c = ActiveSupport::Cache::MemCacheStore.new(['127.0.0.1:23417'])

    keys = ['One', 'Two' * 8, 'Three' * 40]
    keys.each { c.write(_1, rand.to_s * _1.length) }

    100_000.times do
      c.read_multi(*keys)
    end
  end
end
" > bench.rb

memcached -p 23417 -u test &

DALLI_VERSION=2.7.10 ruby bench.rb
DALLI_VERSION=3.2.2 ruby bench.rb

jasonpenny avatar Sep 20 '22 15:09 jasonpenny

I am no expert on profiling ruby, but I tried a few tools and did not find any obvious hotspots.

Something I did notice was that the callgraph of dalli v3 is larger, and I wonder if calling extra methods/more objects would be the cause of slowness here since we're calling it so often.

capture callgraph

# frozen_string_literal: true

require 'fileutils'
require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'activesupport', '6.1.6.1'
  gem 'stackprof'
  gem 'dalli', path: ENV['DALLI_PATH']
  gem 'byebug'
  gem 'ruby-prof'
end

puts "Dalli version: #{Dalli::VERSION}"

dc = Dalli::Client.new('127.0.0.1:11211')
dc.flush_all
keys = ['One', 'Two' * 8, 'Three' * 40]
keys.each { dc.set(_1, rand.to_s * _1.length) }
dc = nil

RubyProf.start
c = Dalli::Client.new('127.0.0.1:11211', socket_timeout: 5)
values = nil
10.times do
  values = c.get_multi(*keys)
end
results = RubyProf.stop
puts values

output_dir = "./#{ENV['DALLI_PATH']}_calltree" 
FileUtils.mkdir_p(output_dir)
RubyProf::CallTreePrinter.new(results).print({ path: output_dir })

Dalli 2

dalli2

Dalli 3

dalli3

callgrind profiles: https://gist.github.com/jasonpenny/fed81267c8e0c987e63839a135f9db4a

jasonpenny avatar Dec 08 '22 15:12 jasonpenny

We also saw performance degradation. We had to disable compress to get performance matching 2.7.x while using :mem_cache_store in ruby 3.1.3, rails 6.1.4, and dalli 3.2.x.

It would be nice to re-enable compress and still be able to upgrade to 3.2.x in the future. Have not been able to root cause exactly why, but hopefully, this gives a little clue towards a fix.

taiyyu avatar Feb 28 '23 01:02 taiyyu