
Fix for issue #395 #414

Merged: 10 commits, May 25, 2024

Conversation

okBrian (Contributor) commented May 14, 2024

Description

Fixed the clarity issue for the CI benchmark comparison message when comparing a PR and master.

Example:

  • Before
Comparing Benchmarks: master/bench-cpu.yaml is x times slower than pr/bench-cpu.yaml.
  • Expected After
Comparing Benchmarks:
    1.5x indicates pr/bench-cpu.yaml is 1.5-times as fast as master/bench-cpu.yaml (so pr/bench-cpu.yaml is faster than master/bench-cpu.yaml).
    0.5x indicates pr/bench-cpu.yaml is 0.5-times as fast as master/bench-cpu.yaml (so pr/bench-cpu.yaml is slower than master/bench-cpu.yaml).

Fixes #395
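For illustration, here is a minimal Python sketch (not the actual MFC toolchain code; the helper name is hypothetical) of how the clarified message above can be built from the two YAML paths:

```python
def comparison_header(lhs_path: str, rhs_path: str) -> str:
    # lhs is the baseline (e.g. master); rhs is the candidate (e.g. the PR).
    # Ratios reported elsewhere are lhs_time / rhs_time, so > 1 means rhs is faster.
    return (
        f"Comparing Benchmarks:\n"
        f"  1.5x indicates {rhs_path} is 1.5-times as fast as {lhs_path} "
        f"(so {rhs_path} is faster than {lhs_path}).\n"
        f"  0.5x indicates {rhs_path} is 0.5-times as fast as {lhs_path} "
        f"(so {rhs_path} is slower than {lhs_path})."
    )

print(comparison_header("master/bench-cpu.yaml", "pr/bench-cpu.yaml"))
```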

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Scope

  • This PR comprises a set of related changes with a common goal

How Has This Been Tested?

I'm not sure how to run the diff benchmark comparison without the GitHub runner.

Test Configuration:

Compiled on Ubuntu 22.04.4 (Docker) with an Intel i7-1165G7.

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new features performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran an Omniperf profile using ./mfc.sh run XXXX --gpu -t simulation --omniperf, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

sbryngelson (Member) commented May 14, 2024

@henryleberre can confirm but I think this might be backwards. lhs -> master and rhs -> PR.

If the variable speedup below is > 1 then master is slower than the PR.

If the variable speedup below is < 1 then PR is slower than master.

At least that's my read of the code...

Code:

MFC/toolchain/mfc/args.py

Lines 135 to 138 in f23fceb

# === BENCH_DIFF ===
add_common_arguments(bench_diff, "t")
bench_diff.add_argument("lhs", metavar="LHS", type=str, help="Path to a benchmark result YAML file.")
bench_diff.add_argument("rhs", metavar="RHS", type=str, help="Path to a benchmark result YAML file.")

./mfc.sh bench_diff master/bench-${{ matrix.device }}.yaml pr/bench-${{ matrix.device }}.yaml

MFC/toolchain/mfc/bench.py

Lines 108 to 122 in 64d9677

for slug in slugs:
    lhs_summary = lhs["cases"][slug]["output_summary"]
    rhs_summary = rhs["cases"][slug]["output_summary"]
    speedups = ['N/A', 'N/A', 'N/A']

    for i, target in enumerate(sorted(DEFAULT_TARGETS, key=lambda t: t.runOrder)):
        if target.name not in lhs_summary or target.name not in rhs_summary:
            continue

        speedups[i] = f"{lhs_summary[target.name] / rhs_summary[target.name]:.2f}x"

    table.add_row(f"[magenta]{slug}[/magenta]", *speedups)

and your code says:

cons.print(f"[bold]\t1.5x indicates [magenta]{os.path.relpath(ARG('rhs'))}[/magenta] is 1.5-times as fast as [magenta]{os.path.relpath(ARG('lhs'))}[/magenta] (so [magenta]{os.path.relpath(ARG('rhs'))}[/magenta] is faster than [magenta]{os.path.relpath(ARG('lhs'))}[/magenta]).[/bold]")

henryleberre (Member)

Let's consider an example:

$ ./mfc.sh bench_diff lhs rhs

where lhs and rhs are YAML files. By the way, "lhs" and "rhs" just stand for "left-hand side" and "right-hand side". We could maybe opt for better names.

Say that for a given example case, you have:

     simulation
lhs:    1.0s
rhs:    0.5s 

So, $\frac{\text{lhs}}{\text{rhs}}=\frac{1.0}{0.5}=2$. So $\frac{\text{lhs}}{\text{rhs}}$ is the speedup going from lhs to rhs. In other words, lhs is what you are comparing rhs to. This implies that the text I had written and the text @okBrian wrote are both correct, although mine was worded in a way that might be confusing. It was like "LHS is X times SLOWER than RHS" rather than "RHS is X times FASTER than LHS" (@okBrian's version).
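The arithmetic above can be checked with a tiny sketch (the timings are the hypothetical values from the example, not real benchmark output):

```python
# Hypothetical timings from the example above.
lhs_time = 1.0  # seconds for lhs (the baseline being compared against)
rhs_time = 0.5  # seconds for rhs (the candidate)

# The ratio lhs_time / rhs_time is the speedup going from lhs to rhs.
speedup = lhs_time / rhs_time
print(f"{speedup:.2f}x")  # 2.00x: rhs is twice as fast as lhs
```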

As a side note, there are many consecutive cons.write calls. These can be joined into a single call for readability, like:

cons.write(f"""\
Line 1
Line 2
""")

sbryngelson (Member)

There should just be a line of text above the results that says "numbers < 1 indicate the PR is faster than master, > 1 indicate PR is slower" (if that is actually the case)

sbryngelson (Member)

PR merges are on hold until benchmark CI is working (issue #419)

henryleberre (Member) commented May 20, 2024

@okBrian Why did you revert with 1e30c18 ?

okBrian (Contributor, Author) commented May 20, 2024

The past few tests have either failed or have the wrong output so I'm just making sure that's not the issue.

sbryngelson (Member)

@okBrian please sync your branch with master, which I just merged to (that's why your current CI tests are failing)

sbryngelson (Member)

CI benchmarking is failing and it seems related to this PR

sbryngelson (Member)

This might need another merge with master

@sbryngelson sbryngelson merged commit d292c29 into MFlowCode:master May 25, 2024
21 checks passed
Successfully merging this pull request may close these issues.

CI Benchmarking reports opposite results