
Fix for issue #395 #414

Merged: 10 commits, May 25, 2024

Conversation

okBrian (Contributor) commented May 14, 2024

Description

Fixed the clarity issue for the CI benchmark comparison message when comparing a PR and master.

Example:

  • Before
Comparing Benchmarks: master/bench-cpu.yaml is x times slower than pr/bench-cpu.yaml.
  • Expected After
Comparing Benchmarks:
    1.5x indicates pr/bench-cpu.yaml is 1.5-times as fast as master/bench-cpu.yaml (so pr/bench-cpu.yaml is faster than master/bench-cpu.yaml).
    0.5x indicates pr/bench-cpu.yaml is 0.5-times as fast as master/bench-cpu.yaml (so pr/bench-cpu.yaml is slower than master/bench-cpu.yaml).

Fixes #395
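For illustration, here is a minimal Python sketch (not the actual MFC toolchain code; the helper name is hypothetical) of how the clarified message above can be built from the two YAML paths:

```python
def comparison_header(lhs_path: str, rhs_path: str) -> str:
    # lhs is the baseline (e.g. master); rhs is the candidate (e.g. the PR).
    # Ratios reported elsewhere are lhs_time / rhs_time, so > 1 means rhs is faster.
    return (
        f"Comparing Benchmarks:\n"
        f"  1.5x indicates {rhs_path} is 1.5-times as fast as {lhs_path} "
        f"(so {rhs_path} is faster than {lhs_path}).\n"
        f"  0.5x indicates {rhs_path} is 0.5-times as fast as {lhs_path} "
        f"(so {rhs_path} is slower than {lhs_path})."
    )

print(comparison_header("master/bench-cpu.yaml", "pr/bench-cpu.yaml"))
```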

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Scope

  • This PR comprises a set of related changes with a common goal

How Has This Been Tested?

I'm not sure how to run the diff benchmark comparison without the GitHub runner.

Test Configuration:

Compiled on Ubuntu 22.04.4 (Docker) with an Intel i7-1165G7.

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new features performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran an Omniperf profile using ./mfc.sh run XXXX --gpu -t simulation --omniperf, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

sbryngelson (Member) commented May 14, 2024

@henryleberre can confirm but I think this might be backwards. lhs -> master and rhs -> PR.

If the variable speedup below is > 1 then master is slower than the PR.

If the variable speedup below is < 1 then PR is slower than master.

At least that's my read of the code...

Code:

MFC/toolchain/mfc/args.py

Lines 135 to 138 in f23fceb

# === BENCH_DIFF ===
add_common_arguments(bench_diff, "t")
bench_diff.add_argument("lhs", metavar="LHS", type=str, help="Path to a benchmark result YAML file.")
bench_diff.add_argument("rhs", metavar="RHS", type=str, help="Path to a benchmark result YAML file.")

./mfc.sh bench_diff master/bench-${{ matrix.device }}.yaml pr/bench-${{ matrix.device }}.yaml

MFC/toolchain/mfc/bench.py

Lines 108 to 122 in 64d9677

for slug in slugs:
    lhs_summary = lhs["cases"][slug]["output_summary"]
    rhs_summary = rhs["cases"][slug]["output_summary"]
    speedups = ['N/A', 'N/A', 'N/A']

    for i, target in enumerate(sorted(DEFAULT_TARGETS, key=lambda t: t.runOrder)):
        if target.name not in lhs_summary or target.name not in rhs_summary:
            continue

        speedups[i] = f"{lhs_summary[target.name] / rhs_summary[target.name]:.2f}x"

    table.add_row(f"[magenta]{slug}[/magenta]", *speedups)

and your code says:

cons.print(f"[bold]\t1.5x indicates [magenta]{os.path.relpath(ARG('rhs'))}[/magenta] is 1.5-times as fast as [magenta]{os.path.relpath(ARG('lhs'))}[/magenta] (so [magenta]{os.path.relpath(ARG('rhs'))}[/magenta] is faster than [magenta]{os.path.relpath(ARG('lhs'))}[/magenta]).[/bold]")

henryleberre (Member)

Let's consider an example:

$ ./mfc.sh bench_diff lhs rhs

where lhs and rhs are YAML files. By the way, "lhs" and "rhs" just stand for "left-hand side" and "right-hand side". We could maybe opt for better names.

Say that for a given example case, you have:

     simulation
lhs:    1.0s
rhs:    0.5s 

So, $\frac{\text{lhs}}{\text{rhs}}=\frac{1.0}{0.5}=2$. So $\frac{\text{lhs}}{\text{rhs}}$ is the speedup going from lhs to rhs. In other words, lhs is what you are comparing rhs to. This implies that the text I had written and the text @okBrian wrote are both correct, although mine was worded in a way that might be confusing. It was like "LHS is X times SLOWER than RHS" rather than "RHS is X times FASTER than LHS" (@okBrian's version).
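The arithmetic above can be checked with a tiny sketch (the timings are the hypothetical values from the example, not real benchmark output):

```python
# Hypothetical timings from the example above.
lhs_time = 1.0  # seconds for lhs (the baseline being compared against)
rhs_time = 0.5  # seconds for rhs (the candidate)

# The ratio lhs_time / rhs_time is the speedup going from lhs to rhs.
speedup = lhs_time / rhs_time
print(f"{speedup:.2f}x")  # 2.00x: rhs is twice as fast as lhs
```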

As a side note, there are many consecutive cons.write calls. These can be joined into a single call for readability, like:

cons.write(f"""\
Line 1
Line 2
""")

sbryngelson (Member)

There should just be a line of text above the results that says "numbers < 1 indicate the PR is faster than master, > 1 indicate PR is slower" (if that is actually the case)

sbryngelson (Member)

PR merges are on hold until benchmark CI is working (issue #419)

henryleberre (Member) commented May 20, 2024

@okBrian Why did you revert with 1e30c18 ?

okBrian (Contributor, Author) commented May 20, 2024

The past few tests have either failed or have the wrong output so I'm just making sure that's not the issue.

sbryngelson (Member)

@okBrian please sync your branch with master, which I just merged to (that's why your current CI tests are failing)

sbryngelson (Member)

CI benchmarking is failing and it seems related to this PR

sbryngelson (Member)

This might need another merge with master

@sbryngelson sbryngelson merged commit d292c29 into MFlowCode:master May 25, 2024
21 checks passed
Successfully merging this pull request may close these issues.

CI Benchmarking reports opposite results