Needed changes for census profiling #2074

beroy · 2024-01-29T03:13:35Z

Including reporting script for detecting performance violations

Issue and/or context:
Currently census code does not go through performance checks

codecov · 2024-01-29T03:17:30Z

Codecov Report

Merging #2074 (5f2c0dd) into main (c5ce519) will decrease coverage by 9.97%.
The diff coverage is n/a.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2074      +/-   ##
==========================================
- Coverage   77.68%   67.72%   -9.97%     
==========================================
  Files         137       51      -86     
  Lines       10680     2528    -8152     
  Branches      214      215       +1     
==========================================
- Hits         8297     1712    -6585     
+ Misses       2297      718    -1579     
- Partials       86       98      +12

Flag	Coverage Δ
libtiledbsoma	`67.72% <ø> (+3.73%)`	⬆️
python	`?`
r	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
python_api	`∅ <ø> (∅)`
libtiledbsoma	`48.85% <ø> (+0.15%)`	⬆️

atolopko-czi · 2024-01-29T13:51:59Z

profiler/data.py

        key2 = data.timestamp

        filename = f"{self.path}/{key}/{key2}.json"
+        if os.path.exists(filename):
+            os.remove(filename)


the following open() will overwrite the file, so is it necessary to remove it first?

I see your point. Unfortunately the API that mount-s3 provide is not fully compatible with POSIX API and only allows writing new files! Unless additional flag is provided both at installation time and in the code. To avoid all that and keeping the code general, I decided to do it this way.

atolopko-czi · 2024-01-29T13:52:42Z

profiler/__main__.py

@@ -1,4 +1,4 @@
-from .profiler import main
+import profiler


did the relative imports cause trouble or is that you wanted to specify the module prefixes for the imported keywords?

No, the relative imports caused trouble actually. I could not figure the exact issue but managed to fix it this way.

Try running the profiler as a module: python -m profiler. If that works, I would reinstate the relative imports

Done and all working great now with relative imports. Thanks.

atolopko-czi · 2024-01-29T14:02:23Z

profile_report.py

+    exec_time[1] = last_two[1].user_time_sec
+
+    if threshold * float(exec_time[1]) < float(exec_time[0]) or float(exec_time[1]) > threshold * float(exec_time[0]):
+        raise SystemExit(f"Potential performance degradation detected {exec_time[0]} vs {exec_time[1]}")


If we allow for 10% degradation every time, we'll soon have unacceptable performance. Might we instead alert when a 10% degradation is detected compared to the first benchmark? If that makes sense, perhaps a larger degradation threshold is allowable.

Also, are the timings always for the same input data? If the data changes, we can't meaningfully compare the performance.

I see your point about the 10% and also believe 10% compared to the first performance point makes more sense.

That's very true. I am not really sure how to handle the changes of data. I think that needs some brainstorming. One option is to consider new data as a new benchmark or reset the benchmark data when data changes significantly. However, none of these can be automated. What do you think?

atolopko-czi · 2024-01-29T14:05:56Z

profile_report.py

+
+# Processes the set of previously written logs
+
+threshold = 1.10  # Percent difference


Suggested change

threshold = 1.10 # Percent difference

# The threshold (ratio) of allowable performance degradation between profiling runs

threshold = 1.10

atolopko-czi · 2024-01-29T14:07:20Z

profile_report.py

+    exec_time[0] = last_two[0].user_time_sec
+    exec_time[1] = last_two[1].user_time_sec
+
+    if threshold * float(exec_time[1]) < float(exec_time[0]) or float(exec_time[1]) > threshold * float(exec_time[0]):


Would alert on these two conditions separately, with specific messages. The first condition is not a "degradation", of course.

atolopko-czi · 2024-01-29T14:15:36Z

profile_report.py

+import argparse
+
+parser = argparse.ArgumentParser()
+parser.add_argument("benchmark", type=str)


Would rename benchmark to command, if only to keep the terminology consistent with the rest of the Profiler code. Consider the signature: FileBaseProfileDB.find(command: str)

Sounds good

atolopko-czi · 2024-01-29T14:16:50Z

profile_report.py

+
+parser = argparse.ArgumentParser()
+parser.add_argument("benchmark", type=str)
+parser.add_argument("path", type=str)


maybe db-path?

atolopko-czi · 2024-01-29T14:17:19Z

profile_report.py

@@ -0,0 +1,31 @@
+from profiler import data


Add a comment explaining the purpose and usage of the script.

atolopko-czi · 2024-01-29T14:18:18Z

profile_report.py

move this into profiler dir?

Also, there is already a report.py module. Since this script is actually about detecting performance regressions, perhaps name like perf_checker.py

Also, would this be worth moving to the tools/perf_checker dir in the cellxgene-census repo? The decision over what constitutes a performance problem is a policy that matters to the Census (in particular the 10% threshold). This repo doesn't care about that. Potentially, you could wrap this in to a Python method in profiler.py like compare_runs(previous_run_index: int = -2, current_run_index: int = -1). And then just call that from Census "profile checker" code.

As you suggested this script is moved to cellxgen-census repo

atolopko-czi · 2024-01-29T14:19:40Z

profile_report.py

+    exec_time[1] = last_two[1].user_time_sec
+
+    if threshold * float(exec_time[1]) < float(exec_time[0]) or float(exec_time[1]) > threshold * float(exec_time[0]):
+        raise SystemExit(f"Potential performance degradation detected {exec_time[0]} vs {exec_time[1]}")


When a performance problem is detected, is there more information that could be output which is useful to the reader? Maybe just print out both of the ProfileData objects, since that's the probably the very next thing the reader would want to see.

Great suggestion

ebezzi · 2024-02-07T17:19:11Z

profiler/src/profiler/profiler.py

@@ -183,12 +191,13 @@ def main():
            print(
                f"Third profiler {args.prof2} missing output flamegraph file location"
            )
-
+    args.command.split(" ")


This line looks like a no-op.

ebezzi · 2024-02-07T17:19:31Z

profiler/src/profiler/profiler.py

    p_stdout, p_stderr = p.communicate()
    if p1 is not None:
        p1.wait()
    if p2 is not None:
        p2.wait()
+    print("Done profiler run", file=stderr)


Suggested change

print("Done profiler run", file=stderr)

print("Profiler run done", file=stderr)

Including reporting script for detecting performance violations

* Needed changes for census profiling Including reporting script for detecting performance violations * An S3 based Database implementation

* Needed changes for census profiling Including reporting script for detecting performance violations * An S3 based Database implementation Co-authored-by: beroy <[email protected]>

beroy requested review from atolopko-czi and ebezzi January 29, 2024 03:13

atolopko-czi requested changes Jan 29, 2024

View reviewed changes

beroy force-pushed the census_profiler branch 26 times, most recently from 248ca3a to 64c87aa Compare February 2, 2024 00:14

beroy force-pushed the census_profiler branch 10 times, most recently from 4634583 to 8364fde Compare February 5, 2024 18:42

ebezzi reviewed Feb 7, 2024

View reviewed changes

beroy force-pushed the census_profiler branch from 8364fde to 8f57157 Compare February 7, 2024 19:13

beroy force-pushed the census_profiler branch 3 times, most recently from 1a1e320 to ba8db07 Compare March 5, 2024 00:50

ebezzi approved these changes Mar 9, 2024

View reviewed changes

beroy added 2 commits March 8, 2024 16:36

Needed changes for census profiling

3e1c1da

Including reporting script for detecting performance violations

An S3 based Database implementation

5f2c0dd

beroy force-pushed the census_profiler branch from ba8db07 to 5f2c0dd Compare March 9, 2024 00:36

beroy merged commit 1fb6f39 into main Mar 9, 2024
23 checks passed

beroy deleted the census_profiler branch March 9, 2024 00:41

johnkerl added the backport release-1.8 label Mar 13, 2024

github-actions bot pushed a commit that referenced this pull request Mar 13, 2024

Needed changes for census profiling (#2074)

93654b8

* Needed changes for census profiling Including reporting script for detecting performance violations * An S3 based Database implementation

github-actions bot mentioned this pull request Mar 13, 2024

[Backport release-1.8] Needed changes for census profiling #2261

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Needed changes for census profiling #2074

Needed changes for census profiling #2074

beroy commented Jan 29, 2024

codecov bot commented Jan 29, 2024 •

edited

Loading

atolopko-czi Jan 29, 2024

beroy Feb 3, 2024 •

edited

Loading

atolopko-czi Jan 29, 2024

beroy Jan 29, 2024

atolopko-czi Jan 30, 2024

beroy Feb 3, 2024

atolopko-czi Jan 29, 2024

beroy Jan 29, 2024

atolopko-czi Jan 29, 2024

atolopko-czi Jan 29, 2024

atolopko-czi Jan 29, 2024

beroy Jan 29, 2024

atolopko-czi Jan 29, 2024

atolopko-czi Jan 29, 2024

atolopko-czi Jan 29, 2024

atolopko-czi Jan 29, 2024 •

edited

Loading

beroy Feb 3, 2024

atolopko-czi Jan 29, 2024

beroy Jan 29, 2024

ebezzi Feb 7, 2024

ebezzi Feb 7, 2024


		# Processes the set of previously written logs

		threshold = 1.10 # Percent difference

	threshold = 1.10 # Percent difference
	# The threshold (ratio) of allowable performance degradation between profiling runs
	threshold = 1.10

	print("Done profiler run", file=stderr)
	print("Profiler run done", file=stderr)

Needed changes for census profiling #2074

Needed changes for census profiling #2074

Conversation

beroy commented Jan 29, 2024

codecov bot commented Jan 29, 2024 • edited Loading

Codecov Report

Choose a reason for hiding this comment

beroy Feb 3, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

atolopko-czi Jan 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Jan 29, 2024 •

edited

Loading

beroy Feb 3, 2024 •

edited

Loading

atolopko-czi Jan 29, 2024 •

edited

Loading