Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Needed changes for census profiling #2074

Merged
merged 2 commits into from
Mar 9, 2024
Merged

Needed changes for census profiling #2074

merged 2 commits into from
Mar 9, 2024

Conversation

beroy
Copy link
Collaborator

@beroy beroy commented Jan 29, 2024

Including reporting script for detecting performance violations

Issue and/or context:
Currently census code does not go through performance checks

@beroy beroy requested review from atolopko-czi and ebezzi January 29, 2024 03:13
Copy link

codecov bot commented Jan 29, 2024

Codecov Report

Merging #2074 (5f2c0dd) into main (c5ce519) will decrease coverage by 9.97%.
The diff coverage is n/a.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2074      +/-   ##
==========================================
- Coverage   77.68%   67.72%   -9.97%     
==========================================
  Files         137       51      -86     
  Lines       10680     2528    -8152     
  Branches      214      215       +1     
==========================================
- Hits         8297     1712    -6585     
+ Misses       2297      718    -1579     
- Partials       86       98      +12     
Flag Coverage Δ
libtiledbsoma 67.72% <ø> (+3.73%) ⬆️
python ?
r ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api ∅ <ø> (∅)
libtiledbsoma 48.85% <ø> (+0.15%) ⬆️

profiler/data.py Outdated
key2 = data.timestamp

filename = f"{self.path}/{key}/{key2}.json"
if os.path.exists(filename):
os.remove(filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the following open() will overwrite the file, so is it necessary to remove it first?

Copy link
Collaborator Author

@beroy beroy Feb 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. Unfortunately the API that mount-s3 provide is not fully compatible with POSIX API and only allows writing new files! Unless additional flag is provided both at installation time and in the code. To avoid all that and keeping the code general, I decided to do it this way.

@@ -1,4 +1,4 @@
from .profiler import main
import profiler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did the relative imports cause trouble or is that you wanted to specify the module prefixes for the imported keywords?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the relative imports caused trouble actually. I could not figure the exact issue but managed to fix it this way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try running the profiler as a module: python -m profiler. If that works, I would reinstate the relative imports

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and all working great now with relative imports. Thanks.

exec_time[1] = last_two[1].user_time_sec

if threshold * float(exec_time[1]) < float(exec_time[0]) or float(exec_time[1]) > threshold * float(exec_time[0]):
raise SystemExit(f"Potential performance degradation detected {exec_time[0]} vs {exec_time[1]}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we allow for 10% degradation every time, we'll soon have unacceptable performance. Might we instead alert when a 10% degradation is detected compared to the first benchmark? If that makes sense, perhaps a larger degradation threshold is allowable.

Also, are the timings always for the same input data? If the data changes, we can't meaningfully compare the performance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point about the 10% and also believe 10% compared to the first performance point makes more sense.

That's very true. I am not really sure how to handle the changes of data. I think that needs some brainstorming. One option is to consider new data as a new benchmark or reset the benchmark data when data changes significantly. However, none of these can be automated. What do you think?


# Processes the set of previously written logs

threshold = 1.10 # Percent difference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
threshold = 1.10 # Percent difference
# The threshold (ratio) of allowable performance degradation between profiling runs
threshold = 1.10

exec_time[0] = last_two[0].user_time_sec
exec_time[1] = last_two[1].user_time_sec

if threshold * float(exec_time[1]) < float(exec_time[0]) or float(exec_time[1]) > threshold * float(exec_time[0]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would alert on these two conditions separately, with specific messages. The first condition is not a "degradation", of course.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("benchmark", type=str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would rename benchmark to command, if only to keep the terminology consistent with the rest of the Profiler code. Consider the signature: FileBaseProfileDB.find(command: str)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good


parser = argparse.ArgumentParser()
parser.add_argument("benchmark", type=str)
parser.add_argument("path", type=str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe db-path?

@@ -0,0 +1,31 @@
from profiler import data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining the purpose and usage of the script.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this into profiler dir?

Copy link
Member

@atolopko-czi atolopko-czi Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, there is already a report.py module. Since this script is actually about detecting performance regressions, perhaps name like perf_checker.py

Also, would this be worth moving to the tools/perf_checker dir in the cellxgene-census repo? The decision over what constitutes a performance problem is a policy that matters to the Census (in particular the 10% threshold). This repo doesn't care about that. Potentially, you could wrap this in to a Python method in profiler.py like compare_runs(previous_run_index: int = -2, current_run_index: int = -1). And then just call that from Census "profile checker" code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you suggested this script is moved to cellxgen-census repo

exec_time[1] = last_two[1].user_time_sec

if threshold * float(exec_time[1]) < float(exec_time[0]) or float(exec_time[1]) > threshold * float(exec_time[0]):
raise SystemExit(f"Potential performance degradation detected {exec_time[0]} vs {exec_time[1]}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a performance problem is detected, is there more information that could be output which is useful to the reader? Maybe just print out both of the ProfileData objects, since that's the probably the very next thing the reader would want to see.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion

@beroy beroy force-pushed the census_profiler branch 26 times, most recently from 248ca3a to 64c87aa Compare February 2, 2024 00:14
@beroy beroy force-pushed the census_profiler branch 10 times, most recently from 4634583 to 8364fde Compare February 5, 2024 18:42
@@ -183,12 +191,13 @@ def main():
print(
f"Third profiler {args.prof2} missing output flamegraph file location"
)

args.command.split(" ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line looks like a no-op.

p_stdout, p_stderr = p.communicate()
if p1 is not None:
p1.wait()
if p2 is not None:
p2.wait()
print("Done profiler run", file=stderr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print("Done profiler run", file=stderr)
print("Profiler run done", file=stderr)

@beroy beroy force-pushed the census_profiler branch 3 times, most recently from 1a1e320 to ba8db07 Compare March 5, 2024 00:50
beroy added 2 commits March 8, 2024 16:36
Including reporting script for detecting performance violations
@beroy beroy force-pushed the census_profiler branch from ba8db07 to 5f2c0dd Compare March 9, 2024 00:36
@beroy beroy merged commit 1fb6f39 into main Mar 9, 2024
23 checks passed
@beroy beroy deleted the census_profiler branch March 9, 2024 00:41
github-actions bot pushed a commit that referenced this pull request Mar 13, 2024
* Needed changes for census profiling

Including reporting script for detecting performance violations

* An S3 based Database implementation
johnkerl pushed a commit that referenced this pull request Mar 13, 2024
* Needed changes for census profiling

Including reporting script for detecting performance violations

* An S3 based Database implementation

Co-authored-by: beroy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants