Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent starting Vernier in nested transactions #2528

Merged

Conversation

solnic
Copy link
Collaborator

@solnic solnic commented Jan 28, 2025

Fixes #2525

I suspect this will also help with #2457

@solnic solnic linked an issue Jan 28, 2025 that may be closed by this pull request
@solnic solnic marked this pull request as draft January 28, 2025 14:02
@solnic solnic force-pushed the solnic/2525-nomethoderror-undefined-method-_stack_table-for-nil branch from f4e835c to 2b53dd7 Compare January 28, 2025 14:03

unless @hub.profiler_running?
@profiler = @configuration.profiler_class.new(@configuration)
end
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thing right here is the heart of the fix, if I remove this conditional the spec I added easily reproduces the crash from the issue:

Failures:

  1) Sentry transactions / profiler starts profiler just once inside nested transactions
     Failure/Error: @stack_table_hash ||= profile._stack_table.to_h
     
     NoMethodError:
       undefined method `_stack_table' for nil
     # ./lib/sentry/vernier/output.rb:85:in `stack_table_hash'
     # ./lib/sentry/vernier/output.rb:57:in `frames'
     # ./lib/sentry/vernier/output.rb:22:in `to_h'
     # ./lib/sentry/vernier/profiler.rb:100:in `to_hash'
     # ./lib/sentry/transaction_event.rb:65:in `populate_profile'
     # ./lib/sentry/transaction_event.rb:40:in `initialize'
     # ./lib/sentry/client.rb:174:in `new'
     # ./lib/sentry/client.rb:174:in `event_from_transaction'
     # ./lib/sentry/transaction.rb:267:in `finish'
     # ./spec/sentry/transactions/profiler_spec.rb:35:in `block (3 levels) in <top (required)>'
     # ./spec/sentry/transactions/profiler_spec.rb:18:in `block (2 levels) in <top (required)>'
     # /usr/local/rvm/gems/default/gems/rspec-retry-0.6.2/lib/rspec/retry.rb:124:in `block in run'
     # /usr/local/rvm/gems/default/gems/rspec-retry-0.6.2/lib/rspec/retry.rb:110:in `run'
     # /usr/local/rvm/gems/default/gems/rspec-retry-0.6.2/lib/rspec_ext/rspec_ext.rb:12:in `run_with_retry'
     # /usr/local/rvm/gems/default/gems/rspec-retry-0.6.2/lib/rspec/retry.rb:37:in `block (2 levels) in setup'

Finished in 0.02769 seconds (files took 0.20007 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/sentry/transactions/profiler_spec.rb:17 # Sentry transactions / profiler starts profiler just once inside nested transactions

Copy link

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.

Project coverage is 64.11%. Comparing base (ca7d386) to head (58cfa27).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
sentry-ruby/lib/sentry/vernier/profiler.rb 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2528      +/-   ##
==========================================
- Coverage   64.45%   64.11%   -0.34%     
==========================================
  Files         123      123              
  Lines        4707     4724      +17     
==========================================
- Hits         3034     3029       -5     
- Misses       1673     1695      +22     
Components Coverage Δ
sentry-ruby 59.84% <90.90%> (-0.43%) ⬇️
sentry-rails 59.22% <ø> (ø)
sentry-sidekiq 97.16% <ø> (ø)
sentry-resque 92.85% <ø> (ø)
sentry-delayed_job 95.65% <ø> (ø)
sentry-opentelemetry 99.31% <ø> (ø)
Files with missing lines Coverage Δ
sentry-ruby/lib/sentry/hub.rb 51.30% <100.00%> (+3.55%) ⬆️
sentry-ruby/lib/sentry/transaction.rb 72.32% <100.00%> (+0.35%) ⬆️
sentry-ruby/lib/sentry/transaction_event.rb 79.48% <100.00%> (+0.53%) ⬆️
sentry-ruby/lib/sentry/vernier/profiler.rb 26.76% <0.00%> (-0.39%) ⬇️

... and 4 files with indirect coverage changes

@@ -257,7 +261,7 @@ def finish(hub: nil, end_timestamp: nil)
@name = UNLABELD_NAME
end

@profiler.stop
@hub.stop_profiler!(self)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the hub is responsible for starting a profiler, making this change actually felt like an improvement. Transactions are coupled to hub anyway.

@solnic solnic marked this pull request as ready for review January 28, 2025 14:38
@solnic solnic requested review from sl0thentr0py and st0012 January 28, 2025 14:38
@st0012
Copy link
Collaborator

st0012 commented Jan 28, 2025

I think more of profiler related logic can actually be moved out of Transaction into Hub. I'll suggest some changes later.

@st0012
Copy link
Collaborator

st0012 commented Jan 28, 2025

Something like this?

diff --git a/sentry-ruby/lib/sentry/hub.rb b/sentry-ruby/lib/sentry/hub.rb
index 788e8902..a1343831 100644
--- a/sentry-ruby/lib/sentry/hub.rb
+++ b/sentry-ruby/lib/sentry/hub.rb
@@ -23,8 +23,17 @@ module Sentry
 
     def start_profiler!(transaction)
       MUTEX.synchronize do
-        transaction.start_profiler!
-        @current_profiler[transaction.__id__] = transaction.profiler
+        return if !@current_profiler.empty?
+        profiler = configuration.profiler_class.new(configuration)
+        profiler.set_initial_sample_decision(transaction.sampled)
+        profiler.start
+        @current_profiler[transaction.__id__] = profiler
+      end
+    end
+
+    def profiler_for(transaction)
+      MUTEX.synchronize do
+        @current_profiler[transaction.__id__]
       end
     end
 
@@ -34,12 +43,6 @@ module Sentry
       end
     end
 
-    def profiler_running?
-      MUTEX.synchronize do
-        !@current_profiler.empty?
-      end
-    end
-
     def new_from_top
       Hub.new(current_client, current_scope)
     end
diff --git a/sentry-ruby/lib/sentry/transaction.rb b/sentry-ruby/lib/sentry/transaction.rb
index 6fc2524b..8fc077e5 100644
--- a/sentry-ruby/lib/sentry/transaction.rb
+++ b/sentry-ruby/lib/sentry/transaction.rb
@@ -56,10 +56,6 @@ module Sentry
     # @return [Hash]
     attr_reader :contexts
 
-    # The Profiler instance for this transaction.
-    # @return [Profiler]
-    attr_reader :profiler
-
     def initialize(
       hub:,
       name: nil,
@@ -86,10 +82,6 @@ module Sentry
       @contexts = {}
       @measurements = {}
 
-      unless @hub.profiler_running?
-        @profiler = @configuration.profiler_class.new(@configuration)
-      end
-
       init_span_recorder
     end
 
@@ -300,13 +292,16 @@ module Sentry
       @contexts[key] = value
     end
 
+    # The Profiler instance for this transaction.
+    # @return [Profiler]
+    def profiler
+      @hub.profiler_for(self)
+    end
+
     # Start the profiler.
     # @return [void]
     def start_profiler!
-      return unless profiler
-
-      profiler.set_initial_sample_decision(sampled)
-      profiler.start
+      @hub.start_profiler!(self)
     end
 
     # These are high cardinality and thus bad
diff --git a/sentry-ruby/spec/sentry/transactions/profiler_spec.rb b/sentry-ruby/spec/sentry/transactions/profiler_spec.rb
index 78bb385b..fb1efd7b 100644
--- a/sentry-ruby/spec/sentry/transactions/profiler_spec.rb
+++ b/sentry-ruby/spec/sentry/transactions/profiler_spec.rb
@@ -21,8 +21,6 @@ RSpec.describe Sentry, 'transactions / profiler', when: [:vernier_installed?, :r
 
       ProfilerTest::Bar.bar
 
-      expect(Sentry.get_current_hub.profiler_running?).to be(true)
-
       expect(parent_transaction.profiler).to_not be_nil
       expect(nested_transaction.profiler).to be_nil

@sl0thentr0py
Copy link
Member

I think more of profiler related logic can actually be moved out of Transaction into Hub. I'll suggest some changes later.

not for now since the profile is currently linked to the transaction.
Later we will have something called 'Continuous Profiling' so we can make changes then.

@solnic solnic force-pushed the solnic/2525-nomethoderror-undefined-method-_stack_table-for-nil branch from df9f4b3 to 4d00641 Compare January 29, 2025 11:51
@solnic
Copy link
Collaborator Author

solnic commented Jan 29, 2025

I think more of profiler related logic can actually be moved out of Transaction into Hub. I'll suggest some changes later.

I'd prefer to not make any more changes than the minimum to get the issue fixed, especially that hub doesn't expose a method for finishing a transaction, and all code that finishes transactions uses Transaction#finish, so it's the best place to stop profiling.

@solnic solnic force-pushed the solnic/2525-nomethoderror-undefined-method-_stack_table-for-nil branch from 4d00641 to b359bfd Compare January 29, 2025 12:27
@st0012
Copy link
Collaborator

st0012 commented Jan 29, 2025

not for now since the profile is currently linked to the transaction.

and all code that finishes transactions uses Transaction#finish, so it's the best place to stop profiling.

I agree with both but I don't think my patch would change them? We can also just pass down the profiler to the transaction through a keyword argument if we want to actually retain a reference to it there.

Also, under the current implementation there's a gap between Hub#profiler_running? (thus the creation of the profiler) and Hub#start_profiler! which means that we can still have 2 profilers running at the same time?

If we only create the profiler in Hub#start_profiler! like my patch suggests, we'd avoid that problem.

@sl0thentr0py
Copy link
Member

ok let's take a step back, is this

  • only about nested transactions
  • or more generally about not starting Vernier again when it's already running (globally)
    • if this is the problem, isn't it possible to ask Vernier somehow if it's currently recording? I would prefer that over adding more state into the hub to track this stuff

@solnic
Copy link
Collaborator Author

solnic commented Feb 3, 2025

  • if this is the problem, isn't it possible to ask Vernier somehow if it's currently recording? I would prefer that over adding more state into the hub to track this stuff

@sl0thentr0py so, I've been grinding my teeth when trying to make it work by checking Vernier. It stores an instance variable called @collector, there's no public API to access it, accessing it via instance_variable_get seems to be prone to race conditions. I tried with an approach more or less like this:

diff --git sentry-ruby/lib/sentry/hub.rb sentry-ruby/lib/sentry/hub.rb
index 788e8902..8f9ed9e0 100644
--- sentry-ruby/lib/sentry/hub.rb
+++ sentry-ruby/lib/sentry/hub.rb
@@ -18,26 +18,6 @@ module Sentry
       first_layer = Layer.new(client, scope)
       @stack = [first_layer]
       @last_event_id = nil
-      @current_profiler = {}
-    end
-
-    def start_profiler!(transaction)
-      MUTEX.synchronize do
-        transaction.start_profiler!
-        @current_profiler[transaction.__id__] = transaction.profiler
-      end
-    end
-
-    def stop_profiler!(transaction)
-      MUTEX.synchronize do
-        @current_profiler.delete(transaction.__id__)&.stop
-      end
-    end
-
-    def profiler_running?
-      MUTEX.synchronize do
-        !@current_profiler.empty?
-      end
     end
 
     def new_from_top
@@ -120,7 +100,7 @@ module Sentry
       sampling_context.merge!(custom_sampling_context)
       transaction.set_initial_sample_decision(sampling_context: sampling_context)
 
-      start_profiler!(transaction)
+      transaction.start_profiler!
 
       transaction
     end
diff --git sentry-ruby/lib/sentry/transaction.rb sentry-ruby/lib/sentry/transaction.rb
index 6fc2524b..591c116c 100644
--- sentry-ruby/lib/sentry/transaction.rb
+++ sentry-ruby/lib/sentry/transaction.rb
@@ -85,10 +85,7 @@ module Sentry
       @effective_sample_rate = nil
       @contexts = {}
       @measurements = {}
-
-      unless @hub.profiler_running?
-        @profiler = @configuration.profiler_class.new(@configuration)
-      end
+      @profiler = @configuration.profiler_class.new(@configuration)
 
       init_span_recorder
     end
@@ -261,7 +258,7 @@ module Sentry
         @name = UNLABELD_NAME
       end
 
-      @hub.stop_profiler!(self)
+      @profiler.stop
 
       if @sampled
         event = hub.current_client.event_from_transaction(self)
diff --git sentry-ruby/lib/sentry/vernier/profiler.rb sentry-ruby/lib/sentry/vernier/profiler.rb
index 2818ae8d..4a2c77ec 100644
--- sentry-ruby/lib/sentry/vernier/profiler.rb
+++ sentry-ruby/lib/sentry/vernier/profiler.rb
@@ -8,13 +8,14 @@ module Sentry
   module Vernier
     class Profiler
       EMPTY_RESULT = {}.freeze
+      MUTEX = Mutex.new
 
-      attr_reader :started, :event_id, :result
+      attr_reader :event_id, :result
 
       def initialize(configuration)
         @event_id = SecureRandom.uuid.delete("-")
 
-        @started = false
+        @vernier_collector = nil
         @sampled = nil
 
         @profiling_enabled = defined?(Vernier) && configuration.profiling_enabled?
@@ -51,38 +52,53 @@ module Sentry
         log("Discarding profile due to sampling decision") unless @sampled
       end
 
-      def start
-        return unless @sampled
-        return if @started
+      def other_vernier_running?
+        MUTEX.synchronize do
+          @vernier_collector && vernier_collector_id != @vernier_collector.__id__
+        end
+      end
 
-        @started = ::Vernier.start_profile
+      def this_vernier_running?
+        MUTEX.synchronize do
+          @vernier_collector && vernier_collector_id == @vernier_collector.__id__
+        end
+      end
 
-        log("Started")
+      def vernier_collector_id
+        Vernier.instance_variable_get(:@collector).__id__ if Vernier.instance_variable_get(:@collector)
+      end
 
-        @started
-      rescue RuntimeError => e
-        # TODO: once Vernier raises something more dedicated, we should catch that instead
-        if e.message.include?("Profile already started")
-          log("Not started since running elsewhere")
-        else
-          log("Failed to start: #{e.message}")
-        end
+      def start
+        return if !@sampled || other_vernier_running?
+
+        @vernier_collector = ::Vernier.start_profile
+
+        log("Started with Vernier collector id: #{vernier_collector_id}")
+        # rescue RuntimeError => e
+        #   # TODO: once Vernier raises something more dedicated, we should catch that instead
+        #   if e.message.include?("Profile already started")
+        #     log("Not started since running elsewhere")
+        #   else
+        #     log("Failed to start: #{e.message}")
+        #   end
+      rescue
+        debugger
       end
 
       def stop
         return unless @sampled
-        return unless @started
 
-        @result = ::Vernier.stop_profile
-        @started = false
+        @result = ::Vernier.stop_profile if this_vernier_running?
 
         log("Stopped")
-      rescue RuntimeError => e
-        if e.message.include?("Profile not started")
-          log("Not stopped since not started")
-        else
-          log("Failed to stop Vernier: #{e.message}")
-        end
+        # rescue RuntimeError => e
+        # if e.message.include?("Profile not started")
+        #   log("Not stopped since not started")
+        # else
+        #   log("Failed to stop Vernier: #{e.message}")
+        # end
+      ensure
+        @vernier_collector = nil
       end
 
       def active_thread_id
diff --git sentry-ruby/spec/sentry/transactions/profiler_spec.rb sentry-ruby/spec/sentry/transactions/profiler_spec.rb
index 78bb385b..5311fac6 100644
--- sentry-ruby/spec/sentry/transactions/profiler_spec.rb
+++ sentry-ruby/spec/sentry/transactions/profiler_spec.rb
@@ -15,19 +15,16 @@ RSpec.describe Sentry, 'transactions / profiler', when: [:vernier_installed?, :r
   end
 
   it 'starts profiler just once inside nested transactions' do
-    10.times do
+    10000.times do
       parent_transaction = Sentry.start_transaction(name: "parent")
       nested_transaction = Sentry.start_transaction(name: "nested")
 
-      ProfilerTest::Bar.bar
-
-      expect(Sentry.get_current_hub.profiler_running?).to be(true)
-
-      expect(parent_transaction.profiler).to_not be_nil
-      expect(nested_transaction.profiler).to be_nil
+      ProfilerTest::Bar::Foo.foo(10)
 
       nested_transaction.finish
       parent_transaction.finish
+
+      expect(parent_transaction.profiler.to_hash).to_not be_empty
     end
   end
 end
diff --git sentry-ruby/spec/support/profiler.rb sentry-ruby/spec/support/profiler.rb
index ee83cc4d..2a8803a1 100644
--- sentry-ruby/spec/support/profiler.rb
+++ sentry-ruby/spec/support/profiler.rb
@@ -3,8 +3,8 @@
 module ProfilerTest
   module Bar
     module Foo
-      def self.foo
-        1e6.to_i.times { 2**2 }
+      def self.foo(count = 1e6.to_i)
+        count.times { 2**2 }
       end
     end

...and it just doesn't work. Once we have more than one profile running and they try to interact with Vernier, things become unstable and we end up with errors.

It passes really consistently when it's handled higher-up in the Hub. I also think it's good that we don't create a profiler when it's not going to be used.

It seems to be the simplest thing that can possibly work. Can we just stick to this solution?

@sl0thentr0py
Copy link
Member

could we ask the Vernier folks what the best way to do this is? Maybe they can expose it officially?

@solnic
Copy link
Collaborator Author

solnic commented Feb 3, 2025

hey @jhawthorn any chance you could chime in and help us out with this? Long story short is that we need to prevent our transactions from starting Vernier more than once, and it turned out to be a challenge. I put some guards in this PR that ensure that we do not create more than one Profiler instance, but we'd prefer if we could handle it in the profiler itself. I noticed there's @collection set when Vernier is running but based on my tests it looks like checking for that variable is not thread-safe/stable.

@solnic solnic force-pushed the solnic/2525-nomethoderror-undefined-method-_stack_table-for-nil branch 2 times, most recently from 5b6c609 to c0d9002 Compare February 14, 2025 11:50
@solnic
Copy link
Collaborator Author

solnic commented Feb 14, 2025

@sl0thentr0py @st0012 I reverted my attempts at solving it at the transaction level and I still think having hub orchestrate this is the simplest thing to do and it works, unlike every other attempt at checking if Vernier is running.

Could you revisit this please? Without fixing this Vernier profiler is not stable so I'd love to get the fix done.

@solnic solnic force-pushed the solnic/2525-nomethoderror-undefined-method-_stack_table-for-nil branch from c0d9002 to b94fb82 Compare February 14, 2025 11:59
@sl0thentr0py
Copy link
Member

and let's please just merge this minimal change, a bigger refactor is not necessary for this very edge case scenario. Hub/profiler etc will change in the future anyway so I don't want us to spend time on refactors.

@solnic solnic merged commit ef2ea49 into master Feb 14, 2025
150 of 151 checks passed
@solnic solnic deleted the solnic/2525-nomethoderror-undefined-method-_stack_table-for-nil branch February 14, 2025 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NoMethodError: undefined method '_stack_table' for nil
3 participants