-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve average performance of long task timer for out-of-order stopping #5591
base: main
Are you sure you want to change the base?
improve average performance of long task timer for out-of-order stopping #5591
Conversation
@fogninid Please sign the Contributor License Agreement! Click here to manually synchronize the status of this Pull Request. See the FAQ for frequently asked questions. |
@fogninid Thank you for signing the Contributor License Agreement! |
aba71f6
to
295d518
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pull request. What you wrote makes sense. Still, I wanted to verify with some JMH benchmarks so we can put some numbers behind it and have them in place for checking any future changes that would affect performance around this. I made #5595 to add JMH benchmarks.
@@ -92,7 +96,9 @@ public DefaultLongTaskTimer(Id id, Clock clock, TimeUnit baseTimeUnit, | |||
@Override | |||
public Sample start() { | |||
SampleImpl sample = new SampleImpl(); | |||
activeTasks.add(sample); | |||
if (!activeTasks.add(sample)) { | |||
throw new IllegalStateException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't throw exceptions when recording metrics, generally. In what case would this happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a leftover from testing. See the implementation of SampleImpl#compareTo.
I was a bit surprised that the skip-list considers the comparator returning "0" as sufficient test for equality, and does not check Object#equals.
I am intentially leaving equality to be only reference identity (obviously two samples are never the same), but the naive comparator can not distinguish if two samples are created at exactly the same startTime. Putting hashCode in the comparison makes it "almost always" a total order, but it can theoretically still fail for hash collisions.
You can try removing the hashCode check from the comparator, and see some unit tests will fail.
A fail-proof solution would be to use a strictly increasing counter (for example an AtomicLong) instead of the time based one, but it does not look worth to increase the heap memory usage of SampleImpl.
I tend to say it is better to remove the exception that you mention here and accept that some samples can be lost (not accounted) if bothe the monotonic time does not increase and there are hash collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shakuzen how do you suggest me to proceed?
Should I change the DefaultLongTaskTimer.SampleImpl
class to use more heap memory and store an AtomicInteger
to be always correct even in those very rare corner cases?
Or do you have another suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response. Now that 1.14 GA is released, I'll get back to reviewing this so we can hopefully get it merged. I'll take a look tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to add an AtomicInteger field or similar; everyone would have to pay for that extra memory cost even though two samples having the same start time in nanoseconds and the same hash code is incredibly rare. I'll try to look into it more tomorrow to see what we can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[..] everyone would have to pay [..]
Good point, what about making only the collisions pay for storing more memory?
d8dcd3b
The additional functions and indirect calls should also almost never happen, so the only disadvantage I see would be in the slightly non-obvious code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shakuzen have you thought about which approach you would like for me to implement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your approach looks like the best we've come up with so far. I think we can go with that.
Sharing results from my MacBook Pro M1 with the benchmarks in the linked PR with 10,000 active samples and stopping a random sample. As expected, start is slower but the overall time to start and stop on average (with a random sample) is better. Before
After
|
…/stop of many tasks
…/stop of many tasks
295d518
to
44f147a
Compare
return sample; | ||
} | ||
|
||
private int nextNonZeroCounter() { | ||
int nextCount; | ||
while ((nextCount = counter.incrementAndGet()) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand why this (whole method) is needed. Could you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to avoid 0
here because the classes are constructed such that SampleImplCounted(startTime, 0)
compares equal to SampleImpl(startTime)
, so the two would collide again.
This will happen on the 2^32-th collision on the same instance of DefaultLongTaskTimer
, when counter
has wrapped around the whole int range.
If you feel skipping 1 in 2^32 collisions is acceptable compared to having the strange code, I can just remove this method and the relative test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a long way to say I would keep this and the AtomicInteger
but I was thinking if we can eliminate the AtomicInteger
and use System.identityHashCode(this)
instead but identityHashCode
(like hashCode
) does not guarantee uniqueness. UUID might be an alternative but performance might be worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR, I really like this.
Not necessarily in this PR but I' also wondering if we should add a JCStress tests on top of #5595.
|
||
} | ||
|
||
class SampleImplCounted extends SampleImpl { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we either:
- Copy the
toString
fromSampleImpl
and modify it so that it will saySampleImplCounted
(probably also adding thecounter
)? For thisstartTime
might need to be protected. - Or modify the
toString
inSampleImpl
and get the classname dynamically(this.getClass().getSimpleName()
) and include the counter.
@@ -249,6 +281,31 @@ public String toString() { | |||
+ ", " + "duration(nanos)=" + durationInNanoseconds + ", " + "startTimeNanos=" + startTime + '}'; | |||
} | |||
|
|||
@Override | |||
public int compareTo(DefaultLongTaskTimer.SampleImpl o) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be called a lot for the same instance, can we add a shortcut?
if (this == o) {
return 0;
}
return sample; | ||
} | ||
|
||
private int nextNonZeroCounter() { | ||
int nextCount; | ||
while ((nextCount = counter.incrementAndGet()) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a long way to say I would keep this and the AtomicInteger
but I was thinking if we can eliminate the AtomicInteger
and use System.identityHashCode(this)
instead but identityHashCode
(like hashCode
) does not guarantee uniqueness. UUID might be an alternative but performance might be worse.
int startCompare = Long.compare(startTime, o.startTime); | ||
if (startCompare == 0) { | ||
return Integer.compare(counter(), o.counter()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a formatting/readability (hopefully) improvement using this
explicitly:
int startCompare = Long.compare(startTime, o.startTime); | |
if (startCompare == 0) { | |
return Integer.compare(counter(), o.counter()); | |
} | |
int startCompare = Long.compare(this.startTime, o.startTime); | |
if (startCompare == 0) { | |
return Integer.compare(this.counter(), o.counter()); | |
} |
} | ||
|
||
@Test | ||
void counterJumpsZeroAndWraps() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split this into two?
counterShouldSurviveOverflow
counterShouldJumpZero
SampleImpl.class, Assertions::assertThat); | ||
|
||
@Test | ||
void sampleTimestampCollision() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about naming the tests about the behavior we expect?
void sampleTimestampCollision() { | |
void timestampCollisionShouldBeFine() { |
void sampleTimestampCollision() { | ||
final MockClock clock = new MockClock(); | ||
MeterRegistry registry = new SimpleMeterRegistry(SimpleConfig.DEFAULT, clock); | ||
LongTaskTimer t = LongTaskTimer.builder("my.timer").register(registry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we eliminate one-letter variables e.g.: use ltt
or timer
instead here and later in this test class too?
The current implementation of DefaultLongTaskTimer optimizes for O(1) task starting, but performs poorly when stopping tasks that are not at the beginning of its internal queue (the oldest ones).
At the worst case, when calling stop immediately after starting, the stop call is currently expected to require O(N) operations.
Depending on the distribution of task lifetimes, the average case would be O(1) only for applications that stop tasks in exactly the same order as they were started; applications completing out-of-order and with unbiased lifetime would experience O(N) average.
Task stopping should not have any intrinsic difference to starting: both action are expected to be performed on application threads, and for a well-functioning application (that is not leaking of piling-up tasks) every call to start is matched by exactly one call to stop.