Performance loss when upgrading from Dalli 2.7.10 to 3.2.2
We attempted to upgrade our Dalli gem from 2.7.10 to 3.2.2 recently, and noticed a performance loss associated with the upgrade. Out memcached latency went up by just under 10% and our sidekiq processes started spending 2-3x as much time on memcached. We suspect this has something to do with the change from send_multiget to pipelined_get.
We are wondering if anyone has also seen performance loss associated with the upgrade from 2 to 3, and/or if there are any configuration recommendations to preserve performance characteristics.
send_multiget looks like it was removed in https://github.com/petergoldstein/dalli/pull/852/files?diff=unified&w=1#diff-a326ee7acf56937d92cdeeb4b920547a89162170aab2ad32391200ad1b4c3638L457
was this removed just to simplify the code to always use PipelinedGetter @petergoldstein ?
@jasonpenny The send_multiget method was simply renamed to pipelined_get in that commit to get some naming consistency. The latter method is still there, and is now invoked by the PipelinedGetter.
It is certainly possible that there's a performance loss here (possibly in the ResponseBuffer), but I'd need to get some more detail to start debugging.
ah, I see.
I see a slowdown with a benchmark using ActiveSupport::Cache::MemCacheStore:
root@d6ba62960da5:/# DALLI_VERSION=2.7.10 ruby bench.rb
user system total real
2.7.10 read_multi 8.122310 2.830021 10.952331 ( 12.855987)
root@d6ba62960da5:/# DALLI_VERSION=2.7.10 ruby bench.rb
user system total real
2.7.10 read_multi 8.037037 2.876596 10.913633 ( 12.790800)
root@d6ba62960da5:/# DALLI_VERSION=3.2.2 ruby bench.rb
user system total real
3.2.2 read_multi 9.198624 2.890378 12.089002 ( 13.673892)
root@d6ba62960da5:/# DALLI_VERSION=3.2.2 ruby bench.rb
user system total real
3.2.2 read_multi 9.372887 2.795796 12.168683 ( 13.765607)
benchmark setup in a docker container
docker run -it --rm "ruby:2.7.3" bash
apt-get update -qq
apt-get install -y sudo
useradd test
git clone --depth 1 --branch main https://github.com/petergoldstein/dalli dalli
MEMCACHED_VERSION=1.6.10 bash dalli/scripts/install_memcached.sh
echo "
# frozen_string_literal: true
require 'bundler/inline'
gemfile do
source 'https://rubygems.org'
gem 'activesupport', '6.1.6.1'
gem 'dalli', ENV['DALLI_VERSION']
end
require 'benchmark'
require 'active_support/all'
Benchmark.bm(37) do |x|
x.report(%Q{#{ENV['DALLI_VERSION']} read_multi}) do
c = ActiveSupport::Cache::MemCacheStore.new(['127.0.0.1:23417'])
keys = ['One', 'Two' * 8, 'Three' * 40]
keys.each { c.write(_1, rand.to_s * _1.length) }
100_000.times do
c.read_multi(*keys)
end
end
end
" > bench.rb
memcached -p 23417 -u test &
DALLI_VERSION=2.7.10 ruby bench.rb
DALLI_VERSION=3.2.2 ruby bench.rb
I am no expert on profiling ruby, but I tried a few tools and did not find any obvious hotspots.
Something I did notice was that the callgraph of dalli v3 is larger, and I wonder if calling extra methods/more objects would be the cause of slowness here since we're calling it so often.
capture callgraph
# frozen_string_literal: true
require 'fileutils'
require 'bundler/inline'
gemfile do
source 'https://rubygems.org'
gem 'activesupport', '6.1.6.1'
gem 'stackprof'
gem 'dalli', path: ENV['DALLI_PATH']
gem 'byebug'
gem 'ruby-prof'
end
puts "Dalli version: #{Dalli::VERSION}"
dc = Dalli::Client.new('127.0.0.1:11211')
dc.flush_all
keys = ['One', 'Two' * 8, 'Three' * 40]
keys.each { dc.set(_1, rand.to_s * _1.length) }
dc = nil
RubyProf.start
c = Dalli::Client.new('127.0.0.1:11211', socket_timeout: 5)
values = nil
10.times do
values = c.get_multi(*keys)
end
results = RubyProf.stop
puts values
output_dir = "./#{ENV['DALLI_PATH']}_calltree"
FileUtils.mkdir_p(output_dir)
RubyProf::CallTreePrinter.new(results).print({ path: output_dir })
Dalli 2
Dalli 3
callgrind profiles: https://gist.github.com/jasonpenny/fed81267c8e0c987e63839a135f9db4a
We also saw performance degradation. We had to disable compress to get performance matching 2.7.x while using :mem_cache_store in ruby 3.1.3, rails 6.1.4, and dalli 3.2.x.
It would be nice to re-enable compress and still be able to upgrade to 3.2.x in the future. Have not been able to root cause exactly why, but hopefully, this gives a little clue towards a fix.