fix: Safe-navigate to session flusher
Sentry.session_flusher is nilable, and under some circumstances (which I still can't figure out) it ends up nil in specs, causing our spec suite to fail randomly (and, as of recently, pretty much all the time).
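The shape of the fix, sketched against `Hub#end_session` where the backtrace below points (abbreviated, not a verbatim excerpt of hub.rb):

```ruby
# Sketch of the change, abbreviated from Sentry::Hub#end_session
# (not a verbatim copy of hub.rb):
def end_session
  scope = current_scope
  session = scope.session
  scope.set_session(nil)

  return unless session

  session.close
  # Before: Sentry.session_flusher.add_session(session)
  # After: safe-navigate, since Sentry.session_flusher can be nil:
  Sentry.session_flusher&.add_session(session)
end
```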
Example failure
```
Failure/Error: post(path, env:, headers:, xhr: true)

NoMethodError:
  undefined method `add_session' for nil
# /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry/hub.rb:251:in `end_session'
# /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry/hub.rb:260:in `with_session_tracking'
# /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry-ruby.rb:411:in `with_session_tracking'
# /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry/rack/capture_exceptions.rb:21:in `block in call'
# /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry/hub.rb:59:in `with_scope'
# /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry-ruby.rb:391:in `with_scope'
# /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry/rack/capture_exceptions.rb:20:in `call'
# /home/ixti/.gem/ruby/3.3.4/gems/rails_semantic_logger-4.17.0/lib/rails_semantic_logger/rack/logger.rb:45:in `call_app'
# /home/ixti/.gem/ruby/3.3.4/gems/rails_semantic_logger-4.17.0/lib/rails_semantic_logger/rack/logger.rb:26:in `block in call'
# /home/ixti/.gem/ruby/3.3.4/bundler/gems/semantic_logger-093c0258b053/lib/semantic_logger/base.rb:190:in `block in tagged'
# /home/ixti/.gem/ruby/3.3.4/bundler/gems/semantic_logger-093c0258b053/lib/semantic_logger/semantic_logger.rb:395:in `named_tagged'
# /home/ixti/.gem/ruby/3.3.4/bundler/gems/semantic_logger-093c0258b053/lib/semantic_logger/base.rb:197:in `tagged'
# /home/ixti/.gem/ruby/3.3.4/gems/rails_semantic_logger-4.17.0/lib/rails_semantic_logger/rack/logger.rb:26:in `call'
# /home/ixti/.gem/ruby/3.3.4/gems/rack-3.1.7/lib/rack/runtime.rb:24:in `call'
# /home/ixti/.gem/ruby/3.3.4/gems/rack-3.1.7/lib/rack/sendfile.rb:114:in `call'
# /home/ixti/.gem/ruby/3.3.4/gems/rack-cors-2.0.2/lib/rack/cors.rb:102:in `call'
# /home/ixti/.gem/ruby/3.3.4/gems/railties-7.2.1/lib/rails/engine.rb:535:in `call'
# /home/ixti/.gem/ruby/3.3.4/gems/rack-test-2.1.0/lib/rack/test.rb:360:in `process_request'
# /home/ixti/.gem/ruby/3.3.4/gems/rack-test-2.1.0/lib/rack/test.rb:153:in `request'
# ./spec/support/helpers/protobufable_test_helper.rb:76:in `rpc'
# ./spec/requests/asgard/auth_controller_spec.rb:220:in `block (4 levels) in <top (required)>'
# /home/ixti/.gem/ruby/3.3.4/gems/webmock-3.23.1/lib/webmock/rspec.rb:39:in `block (2 levels) in <top (required)>'
```
RSpec Config (Sentry related)

```ruby
require "sentry/test_helper"

RSpec.configure do |config|
  config.include Sentry::TestHelper

  config.before { setup_sentry_test }
  config.after  { teardown_sentry_test }
end
```
Workaround

As a workaround, we just disabled session tracking:

```ruby
Sentry.init do |config|
  # ...

  # See: https://github.com/getsentry/sentry-ruby/pull/2337
  config.excluded_exceptions << "Sidekiq::JobRetry::Handled"

  # See: https://github.com/getsentry/sentry-ruby/pull/2396
  config.auto_session_tracking = false

  # ...
end
```
Resolves: https://github.com/getsentry/sentry-ruby/issues/2378
We tried the following and it still fails:

```ruby
require "sentry/test_helper"

RSpec.configure do |config|
  config.include Sentry::TestHelper

  config.before do
    setup_sentry_test { _1.auto_session_tracking = true }
  end

  config.after { teardown_sentry_test }
end
```
Thanks for reporting the issue. It does look very puzzling as the only method that could nullify the session flusher is Sentry.close. Can you do a quick search to see if it's called anywhere in the test suite or codebase?
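For context, the teardown looks roughly like this (a simplified sketch, not the verbatim SDK source):

```ruby
# Simplified sketch (not the verbatim SDK source) of why Sentry.close
# matters here: it tears down the flusher and leaves
# Sentry.session_flusher returning nil, which Hub#end_session then
# calls #add_session on.
module Sentry
  def self.close
    @session_flusher&.kill
    @session_flusher = nil
    # ... client/background worker teardown elided ...
  end
end
```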
I am experiencing the same issue in production and hence reported #2378. Currently the issue appears rather frequently for us, but not on every request.
Is there any chance I could help get this merged and released? Without a fix, each occurrence ends up as a 500 Internal Server Error and hides actual errors. I would rather miss session tracking than have those errors.
In our app, we never call Sentry.close ourselves.
> Thanks for reporting the issue. It does look very puzzling as the only method that could nullify the session flusher is Sentry.close. Can you do a quick search to see if it's called anywhere in the test suite or codebase?

We don't call Sentry.close anywhere.
Could you grab sentry from solnic/make-closing-thread-safe branch and see if the crash is gone?
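In case it helps, pointing Bundler at the branch looks roughly like this; treat it as a sketch, and note the `glob:` option is an assumption based on the monorepo layout:

```ruby
# Gemfile sketch: pull sentry-ruby from the fix branch of the monorepo.
# The glob: option is an assumption, since sentry-ruby's gemspec lives
# in a subdirectory of the repository.
gem "sentry-ruby",
    github: "getsentry/sentry-ruby",
    branch: "solnic/make-closing-thread-safe",
    glob: "sentry-ruby/*.gemspec"
```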
> Could you grab sentry from solnic/make-closing-thread-safe branch and see if the crash is gone?

I tried the branch, but since we've recently modified the codebase (where this issue normally occurred) quite a lot, I wasn't able to confirm the fix is working. Sorry for not being able to help out here. Maybe it's still worth merging this PR with the safe navigation operator?
We also worked around this by disabling session tracking completely. And even if we revert that, the bug is too sporadic for us to be able to tell whether the patch in solnic/make-closing-thread-safe fixes it or not.
Thanks folks. Would it be possible for you to disable your workarounds after we release a new version and see if it helped? The proposed fix here comes with a risk: it may silence actual failures where the session_flusher really should be there. It's kinda hard to tell which option would be worse, but given that the fix addresses an edge case by making a change that may mask more common cases (an actual misconfiguration of Sentry), it's probably better not to apply it.
As written above, I already tested the branch and partially reverted some of the code changes I considered to have the most impact on this issue.
While I understand your concern, this issue (and especially the one discussed in #2428) may hide actual bugs in our code base from us as Sentry customers; that happened at least for us. We would normally trust Sentry's error reporting, but when this bug occurs we lack any notification of and insight into actual defects in our software. Hence, from our perspective, it is far less critical to miss some sessions than to miss the actual bug that occurred.
> As written above, I already tested the branch and partially reverted some of the code changes I considered to have the most impact on this issue.

Gotcha, thank you!
> While I understand your concern, this issue (and especially the one discussed in #2428) may hide actual bugs in our code base from us as Sentry customers; that happened at least for us.

Alright, given that Hub is going away eventually (see https://github.com/getsentry/sentry-ruby/issues/1495#issuecomment-1881400829), we can just rely on the workarounds here and in your PR for the time being. Thanks for all your input and help with this, btw!
> Would it be possible for you to disable your workarounds after we release a new version and see if it helped?

Sure, we will give it a try!
Hey guys. So we've been running into the NoMethodError (`add_session` for nil) for quite some time now, but the errors were intermittent.
Recently we were able to reproduce it with one seed and 3 specs. I haven't tried to reproduce it in a blank app/script yet, but the general problem I'm seeing is this: if setup_sentry_test runs before any request spec has run, then every request spec that follows fails with this error. If any request spec runs before the spec that calls setup_sentry_test, we don't get the error at all. That explains why it was intermittent for us. It also tells me the issue is probably not with threads (cc @solnic), since it always happens when the specs run in the order described above; see the sketch below.
I have no idea whether there's any connection between when Sentry is initialized and request specs, or whether the connection is unique to our project.
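To make the ordering concrete, a hypothetical minimal repro along those lines (the spec descriptions and the /health route are made up for illustration):

```ruby
# Hypothetical repro sketch: the ordering below is what triggers the
# error for us; the route and descriptions are illustrative, not
# copied from our app.
RSpec.describe "Sentry session tracking repro", type: :request do
  it "A: calls setup_sentry_test before any request spec has run" do
    setup_sentry_test
    teardown_sentry_test
  end

  it "B: any request spec that follows then blows up" do
    get "/health" # => NoMethodError: undefined method `add_session' for nil
  end
end
```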
As a workaround, this:

```ruby
before do
  setup_sentry_test { |it| it.auto_session_tracking = false }
end
```

works, but meh.
I should also mention that the issue persists on the master branch.
I will try to create a simple reproduction and/or a fix, but just wanted to provide some info in case it can help anyone 👍
Let me know if I can be of any further assistance here
edit: another bit of info: it seems that if we remove `config.enabled_environments = %w[production]` from sentry.rb, then all is well somehow 🤔
edit2: 2 specs are enough too
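For context on the first edit, that line sits in our Sentry initializer, roughly like this (a sketch; only the relevant lines are shown, and the DSN env var name is illustrative):

```ruby
# config/initializers/sentry.rb (sketch; only the relevant lines shown)
Sentry.init do |config|
  config.dsn = ENV["SENTRY_DSN"] # illustrative name
  # Removing this line made the add_session-for-nil failures disappear:
  config.enabled_environments = %w[production]
end
```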
@vegab oh these are great insights, thank you! I'll look into this again
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 98.17%. Comparing base (ceba0da) to head (7bf8c9e). Report is 12 commits behind head on master.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #2396      +/-   ##
==========================================
+ Coverage   98.13%   98.17%   +0.04%
==========================================
  Files         126      128       +2
  Lines        4761     4829      +68
==========================================
+ Hits         4672     4741      +69
+ Misses         89       88       -1
```
| Components | Coverage Δ | |
|---|---|---|
| sentry-ruby | 98.57% <100.00%> (+0.05%) | :arrow_up: |
| sentry-rails | 97.08% <ø> (+0.02%) | :arrow_up: |
| sentry-sidekiq | 96.96% <ø> (ø) | |
| sentry-resque | 92.85% <ø> (ø) | |
| sentry-delayed_job | 95.65% <ø> (ø) | |
| sentry-opentelemetry | 99.31% <ø> (ø) | |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| sentry-ruby/lib/sentry/hub.rb | 98.87% <100.00%> (ø) | |