improve average performance of long task timer for out-of-order stopping #5591

fogninid · 2024-10-14T13:17:14Z

The current implementation of DefaultLongTaskTimer optimizes for O(1) task starting, but performs poorly when stopping tasks that are not at the beginning of its internal queue (the oldest ones).
In the worst case, when stop is called immediately after start, the stop call currently requires O(N) operations, where N is the number of active tasks.
Depending on the distribution of task lifetimes, the average case is O(1) only for applications that stop tasks in exactly the same order as they were started; applications whose tasks complete out of order, with no bias in their lifetimes, would see O(N) on average.

Stopping a task should not be intrinsically different from starting one: both actions are expected to be performed on application threads, and for a well-functioning application (one that is not leaking or piling up tasks) every call to start is matched by exactly one call to stop.
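To make the complexity argument concrete, here is a minimal sketch that stops tasks newest-first against two JDK collections. It assumes the internal queue behaves like a ConcurrentLinkedDeque (an assumption, the description only says "internal queue") and that the replacement behaves like a ConcurrentSkipListSet (the skip list is mentioned later in this thread). The class and method names are hypothetical, and plain Long values stand in for samples.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ConcurrentSkipListSet;

class OutOfOrderStopSketch {

    /** Starts n tasks in a linked deque, then stops them newest-first; returns how many were removed. */
    static int stopNewestFirstWithDeque(int n) {
        ConcurrentLinkedDeque<Long> active = new ConcurrentLinkedDeque<>();
        for (long startTime = 0; startTime < n; startTime++) {
            active.add(startTime); // O(1) append at the tail
        }
        int removed = 0;
        for (long startTime = n - 1; startTime >= 0; startTime--) {
            // remove(Object) scans from the head, so removing the newest entry walks the whole deque
            if (active.remove(Long.valueOf(startTime))) {
                removed++;
            }
        }
        return removed;
    }

    /** Same workload against a skip list set, where add and remove are both O(log N) on average. */
    static int stopNewestFirstWithSkipList(int n) {
        ConcurrentSkipListSet<Long> active = new ConcurrentSkipListSet<>();
        for (long startTime = 0; startTime < n; startTime++) {
            active.add(startTime);
        }
        int removed = 0;
        for (long startTime = n - 1; startTime >= 0; startTime--) {
            if (active.remove(Long.valueOf(startTime))) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        int n = 10_000;
        long t0 = System.nanoTime();
        stopNewestFirstWithDeque(n);
        long t1 = System.nanoTime();
        stopNewestFirstWithSkipList(n);
        long t2 = System.nanoTime();
        // Timings are indicative only; a JMH benchmark is the right tool for real numbers.
        System.out.println("deque:     " + (t1 - t0) / 1_000 + " us");
        System.out.println("skip list: " + (t2 - t1) / 1_000 + " us");
    }
}
```

This is not a benchmark of DefaultLongTaskTimer itself; it only illustrates why newest-first removal is the worst case for a head-scanned queue.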

pivotal-cla · 2024-10-14T13:17:17Z

@fogninid Please sign the Contributor License Agreement!

Click here to manually synchronize the status of this Pull Request.

See the FAQ for frequently asked questions.

pivotal-cla · 2024-10-14T13:18:12Z

@fogninid Thank you for signing the Contributor License Agreement!

shakuzen

Thanks for the pull request. What you wrote makes sense. Still, I wanted to verify with some JMH benchmarks so we can put some numbers behind it and have them in place for checking any future changes that would affect performance around this. I made #5595 to add JMH benchmarks.

shakuzen · 2024-10-16T05:23:49Z

micrometer-core/src/main/java/io/micrometer/core/instrument/internal/DefaultLongTaskTimer.java

@@ -92,7 +96,9 @@ public DefaultLongTaskTimer(Id id, Clock clock, TimeUnit baseTimeUnit,
    @Override
    public Sample start() {
        SampleImpl sample = new SampleImpl();
-        activeTasks.add(sample);
+        if (!activeTasks.add(sample)) {
+            throw new IllegalStateException();


We shouldn't throw exceptions when recording metrics, generally. In what case would this happen?

fogninid · 2024-10-16

It is a leftover from testing. See the implementation of SampleImpl#compareTo.
I was a bit surprised that the skip list treats a comparator result of 0 as a sufficient test for equality and does not check Object#equals.
I am intentionally leaving equality as reference identity only (obviously two samples are never the same), but the naive comparator cannot distinguish two samples created at exactly the same startTime. Adding hashCode to the comparison makes it "almost always" a total order, but it can theoretically still fail on hash collisions.

You can try removing the hashCode check from the comparator and see that some unit tests fail.

A fail-proof solution would be to use a strictly increasing counter (for example an AtomicLong) instead of the time-based one, but it does not seem worth increasing the heap memory usage of SampleImpl.

I tend to think it is better to remove the exception that you mention here and accept that some samples can be lost (not accounted for) if both the monotonic time does not increase and there are hash collisions.
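The comparator-based equality described above can be reproduced outside Micrometer. The following is a minimal sketch with a hypothetical Sample class (not the real SampleImpl): a ConcurrentSkipListSet ordered only by start time silently rejects a second sample with the same timestamp, while a tie-breaker on a strictly increasing id (the AtomicLong option mentioned above, at the cost of an extra field per sample) keeps both.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

class ComparatorEqualitySketch {

    private static final AtomicLong IDS = new AtomicLong();

    /** Hypothetical stand-in for a sample; equals/hashCode stay as reference identity. */
    static final class Sample {
        final long startTime;
        final long id = IDS.incrementAndGet(); // extra per-sample memory, used only by the second comparator

        Sample(long startTime) {
            this.startTime = startTime;
        }
    }

    /** Naive order: two distinct samples with the same start time compare as 0. */
    static final Comparator<Sample> BY_START_TIME =
            Comparator.comparingLong((Sample s) -> s.startTime);

    /** Total order: ties on start time are broken by the unique id. */
    static final Comparator<Sample> BY_START_TIME_THEN_ID =
            BY_START_TIME.thenComparingLong(s -> s.id);

    /** Adds two distinct samples with the same start time and returns the resulting set size. */
    static int addTwoWithSameStartTime(Comparator<Sample> order) {
        ConcurrentSkipListSet<Sample> active = new ConcurrentSkipListSet<>(order);
        active.add(new Sample(42L));
        active.add(new Sample(42L)); // returns false under BY_START_TIME: the set sees a duplicate
        return active.size();
    }

    public static void main(String[] args) {
        System.out.println(addTwoWithSameStartTime(BY_START_TIME));         // 1
        System.out.println(addTwoWithSameStartTime(BY_START_TIME_THEN_ID)); // 2
    }
}
```

The set never consults Object#equals, so the only way to keep both samples is for the ordering itself to distinguish them.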

fogninid · 2024-11-06

@shakuzen how do you suggest I proceed?

Should I change the DefaultLongTaskTimer.SampleImpl class to use more heap memory and store an AtomicInteger so that it is always correct, even in those very rare corner cases?

Or do you have another suggestion?

shakuzen · 2024-11-12

Sorry for the late response. Now that 1.14 GA is released, I'll get back to reviewing this so we can hopefully get it merged. I'll take a look tomorrow.

shakuzen · 2024-11-14

I don't think we want to add an AtomicInteger field or similar; everyone would have to pay for that extra memory cost even though two samples having the same start time in nanoseconds and the same hash code is incredibly rare. I'll try to look into it more tomorrow to see what we can do.

fogninid · 2024-11-16

[..] everyone would have to pay [..]

Good point. What about making only the collisions pay for the extra memory?
d8dcd3b

The additional functions and indirect calls should also almost never be executed, so the only disadvantage I see is the slightly non-obvious code.
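A rough sketch of the "only collisions pay" idea, for readers who do not want to open the commit. It is not the code from d8dcd3b: the class names, the start(long) signature, and the retry loop are assumptions, and wrap-around of the counter is deliberately ignored here. Plain samples carry no extra field; a sample with a counter field is allocated only when the set reports a collision.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicInteger;

class CollisionFallbackSketch {

    /** Plain sample: no extra field, counter() is a constant 0. */
    static class Sample implements Comparable<Sample> {
        final long startTime;

        Sample(long startTime) {
            this.startTime = startTime;
        }

        int counter() {
            return 0;
        }

        @Override
        public int compareTo(Sample o) {
            int startCompare = Long.compare(this.startTime, o.startTime);
            return startCompare != 0 ? startCompare : Integer.compare(this.counter(), o.counter());
        }
    }

    /** Allocated only after a collision, so only colliding samples pay for the extra int. */
    static final class CountedSample extends Sample {
        private final int counter;

        CountedSample(long startTime, int counter) {
            super(startTime);
            this.counter = counter;
        }

        @Override
        int counter() {
            return counter;
        }
    }

    private final ConcurrentSkipListSet<Sample> active = new ConcurrentSkipListSet<>();
    private final AtomicInteger collisions = new AtomicInteger();

    /** Registers a new sample for the given timestamp (passed in to make collisions reproducible). */
    Sample start(long nowNanos) {
        Sample sample = new Sample(nowNanos);
        // Common path: add succeeds at once. On a collision, retry with a counted sample.
        while (!active.add(sample)) {
            sample = new CountedSample(nowNanos, collisions.incrementAndGet());
        }
        return sample;
    }

    boolean stop(Sample sample) {
        return active.remove(sample);
    }

    int activeCount() {
        return active.size();
    }
}
```

In this sketch three calls to start with the same timestamp yield counters 0, 1 and 2, and each sample can still be stopped individually.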

fogninid · 2025-01-07

@shakuzen have you thought about which approach you would like me to implement?

shakuzen · 2025-01-10

Your approach looks like the best we've come up with so far. I think we can go with that.

shakuzen · 2024-10-16T06:18:19Z

Sharing results from my MacBook Pro M1 with the benchmarks in the linked PR with 10,000 active samples and stopping a random sample. As expected, start is slower but the overall time to start and stop on average (with a random sample) is better.

Before

Benchmark                                   Mode  Cnt   Score   Error  Units
DefaultLongTaskTimerBenchmark.start           ss  200   0.495 ± 0.064  us/op
DefaultLongTaskTimerBenchmark.startAndStop    ss  200  15.351 ± 2.508  us/op
DefaultLongTaskTimerBenchmark.stopRandom      ss  200  14.784 ± 2.584  us/op

After

Benchmark                                   Mode  Cnt  Score   Error  Units
DefaultLongTaskTimerBenchmark.start           ss  200  1.002 ± 0.116  us/op
DefaultLongTaskTimerBenchmark.startAndStop    ss  200  6.338 ± 0.577  us/op
DefaultLongTaskTimerBenchmark.stopRandom      ss  200  5.154 ± 0.608  us/op

shakuzen · 2025-01-10T08:44:37Z

micrometer-core/src/main/java/io/micrometer/core/instrument/internal/DefaultLongTaskTimer.java

        return sample;
    }

+    private int nextNonZeroCounter() {
+        int nextCount;
+        while ((nextCount = counter.incrementAndGet()) == 0) {


I'm not sure I understand why this (whole method) is needed. Could you explain?

fogninid · 2025-01-10

We should avoid 0 here because the classes are constructed such that SampleImplCounted(startTime, 0) compares equal to SampleImpl(startTime), so the two would collide again.

This would happen on the 2^32-th collision on the same instance of DefaultLongTaskTimer, once counter has wrapped around the whole int range.

If you feel skipping 1 in 2^32 collisions is acceptable compared to having the strange code, I can just remove this method and the related test.
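The zero-skipping behaviour can be sketched in isolation. The loop condition below is taken from the diff hunk above; the loop body is not visible in the hunk, so the empty body, the wrapper class, and the constructor that seeds the counter are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class NonZeroCounterSketch {

    private final AtomicInteger counter;

    /** The initial value is injectable only so that wrap-around can be demonstrated cheaply. */
    NonZeroCounterSketch(int initialValue) {
        this.counter = new AtomicInteger(initialValue);
    }

    /** Returns the next counter value, never 0, wrapping from Integer.MAX_VALUE to Integer.MIN_VALUE. */
    int nextNonZeroCounter() {
        int nextCount;
        while ((nextCount = counter.incrementAndGet()) == 0) {
            // 0 is the implicit counter of an uncounted sample, so skip it and increment again
        }
        return nextCount;
    }

    public static void main(String[] args) {
        NonZeroCounterSketch nearZero = new NonZeroCounterSketch(-2);
        System.out.println(nearZero.nextNonZeroCounter()); // -1
        System.out.println(nearZero.nextNonZeroCounter()); // 1 (0 was skipped)

        NonZeroCounterSketch nearMax = new NonZeroCounterSketch(Integer.MAX_VALUE);
        System.out.println(nearMax.nextNonZeroCounter() == Integer.MIN_VALUE); // true (wrapped)
    }
}
```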

jonatan-ivanov

Thank you for the PR, I really like this.
Not necessarily in this PR, but I'm also wondering if we should add JCStress tests on top of #5595.

jonatan-ivanov · 2025-01-16T23:41:00Z

micrometer-core/src/main/java/io/micrometer/core/instrument/internal/DefaultLongTaskTimer.java

+
+    }
+
+    class SampleImplCounted extends SampleImpl {


Can we either:

Copy the toString from SampleImpl and modify it so that it says SampleImplCounted (probably also adding the counter)? For this, startTime might need to be protected.

Or modify the toString in SampleImpl to get the class name dynamically (this.getClass().getSimpleName()) and include the counter?
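The second option could look roughly like the sketch below. The classes are simplified, hypothetical stand-ins (static nested classes with only a start time and a counter), and the output format is an assumption rather than the format used by the real SampleImpl#toString.

```java
class ToStringSketch {

    static class Sample {
        final long startTime;

        Sample(long startTime) {
            this.startTime = startTime;
        }

        int counter() {
            return 0;
        }

        @Override
        public String toString() {
            // getSimpleName() resolves to the runtime class, so subclasses need no override
            return getClass().getSimpleName() + "{startTimeNanos=" + startTime + ", counter=" + counter() + '}';
        }
    }

    static final class CountedSample extends Sample {
        private final int counter;

        CountedSample(long startTime, int counter) {
            super(startTime);
            this.counter = counter;
        }

        @Override
        int counter() {
            return counter;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Sample(5L));           // Sample{startTimeNanos=5, counter=0}
        System.out.println(new CountedSample(5L, 3)); // CountedSample{startTimeNanos=5, counter=3}
    }
}
```

With this shape startTime can stay private to the base class, since the subclass never builds the string itself.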

jonatan-ivanov · 2025-01-16T23:42:24Z

micrometer-core/src/main/java/io/micrometer/core/instrument/internal/DefaultLongTaskTimer.java

@@ -249,6 +281,31 @@ public String toString() {
                    + ", " + "duration(nanos)=" + durationInNanoseconds + ", " + "startTimeNanos=" + startTime + '}';
        }

+        @Override
+        public int compareTo(DefaultLongTaskTimer.SampleImpl o) {


This will be called a lot for the same instance, can we add a shortcut?

if (this == o) { return 0; }

jonatan-ivanov · 2025-01-17T01:21:35Z

micrometer-core/src/main/java/io/micrometer/core/instrument/internal/DefaultLongTaskTimer.java

        return sample;
    }

+    private int nextNonZeroCounter() {
+        int nextCount;
+        while ((nextCount = counter.incrementAndGet()) == 0) {


This is a long way of saying that I would keep this method and the AtomicInteger. I was wondering whether we could eliminate the AtomicInteger and use System.identityHashCode(this) instead, but identityHashCode (like hashCode) does not guarantee uniqueness. A UUID might be an alternative, but performance might be worse.

jonatan-ivanov · 2025-01-17T01:28:46Z

micrometer-core/src/main/java/io/micrometer/core/instrument/internal/DefaultLongTaskTimer.java

+            int startCompare = Long.compare(startTime, o.startTime);
+            if (startCompare == 0) {
+                return Integer.compare(counter(), o.counter());
+            }


Just a formatting/readability (hopefully) improvement using this explicitly:

Suggested change

-            int startCompare = Long.compare(startTime, o.startTime);
-            if (startCompare == 0) {
-                return Integer.compare(counter(), o.counter());
-            }
+            int startCompare = Long.compare(this.startTime, o.startTime);
+            if (startCompare == 0) {
+                return Integer.compare(this.counter(), o.counter());
+            }

jonatan-ivanov · 2025-01-17T02:29:27Z

...eter-core/src/test/java/io/micrometer/core/instrument/internal/DefaultLongTaskTimerTest.java

+    }
+
+    @Test
+    void counterJumpsZeroAndWraps() {


Can we split this into two?

counterShouldSurviveOverflow

counterShouldJumpZero

jonatan-ivanov · 2025-01-17T02:31:37Z

...eter-core/src/test/java/io/micrometer/core/instrument/internal/DefaultLongTaskTimerTest.java

+            SampleImpl.class, Assertions::assertThat);
+
+    @Test
+    void sampleTimestampCollision() {


What do you think about naming the tests about the behavior we expect?

Suggested change

-    void sampleTimestampCollision() {
+    void timestampCollisionShouldBeFine() {

jonatan-ivanov · 2025-01-17T02:32:40Z

...eter-core/src/test/java/io/micrometer/core/instrument/internal/DefaultLongTaskTimerTest.java

+    void sampleTimestampCollision() {
+        final MockClock clock = new MockClock();
+        MeterRegistry registry = new SimpleMeterRegistry(SimpleConfig.DEFAULT, clock);
+        LongTaskTimer t = LongTaskTimer.builder("my.timer").register(registry);


Can we eliminate one-letter variable names, e.g. use ltt or timer instead, here and elsewhere in this test class?

fogninid force-pushed the long_task_timer_performance branch 2 times, most recently from aba71f6 to 295d518 on October 14, 2024 14:55

shakuzen added labels enhancement (A general enhancement), performance (Issues related to general performance), module: micrometer-core (An issue that is related to our core module) Oct 15, 2024

shakuzen added this to the 1.15.0-M1 milestone Oct 16, 2024

shakuzen mentioned this pull request Oct 16, 2024

Add benchmarks for DefaultLTT start/stop #5595

Merged

shakuzen reviewed Oct 16, 2024

fogninid added 2 commits October 16, 2024 11:55

improve average performance of long task timer for out-of-order start…

6b73a9f

…/stop of many tasks

improve average performance of long task timer for out-of-order start…

44f147a

…/stop of many tasks

fogninid force-pushed the long_task_timer_performance branch from 295d518 to 44f147a on October 16, 2024 12:05

fogninid added 2 commits November 16, 2024 12:45

improve average performance of long task timer for out-of-order start…

d8dcd3b

…/stop of many tasks

fix missing license header

9396e4d

shakuzen reviewed Jan 10, 2025

marcingrzejszczak modified the milestones: 1.15.0-M1, 1.15.0-M2 Jan 13, 2025

jonatan-ivanov reviewed Jan 17, 2025
