Here I collect random thoughts and ideas about further PGO investigation.
For now, the list is clear, and I am happy with that. But the article has nearly 100 TODOs...
- What about using PGO during fuzzing? It could improve the fuzzing efficiency, but fuzzing explores all paths...
- https://llvm.org/docs/CommandGuide/llvm-profdata.html#cmdoption-llvm-profdata-merge-text - test this functionality and check the generated profile. Add to the article
- Sometimes people aren't interested enough in PGO without benchmarks - oxc-project/oxc#812 (comment)
- Compiler statistics of PGO: oxc-project/oxc#812 (comment)
- Add a chapter about starting the LTO/PGO/PLO journey
- Propose PGO to https://git.deuxfleurs.fr/Deuxfleurs/garage - write why proposing to such places is a bit more difficult
- Perform PGO benchmarks on https://github.com/pydantic/jiter - pydantic/jiter#123 + pydantic/jiter#123 (comment)
- Check Rust no_std programs - how do they work with PGO? https://github.com/Amanieu/minicov - here can be some useful info about the topic
- Suggest PGO to https://github.com/phiresky/sqlite-zstd
- Write about the next step in PGO advertisement - the Go ecosystem and why exactly this PL
- Estimating PGO efficiency without benchmarks - no such tools but some ideas like metrics, ML, questionnaires like "Are you Fabrice Bellard?"
- https://app.radicle.xyz/nodes/seed.radicle.xyz/rad:z3gqcJUoA1n9HaHKufZs5FCSGazv5 - I don't know how to create an issue for the project
- Check PGO state in https://www.illumos.org/ . Maybe create one more "PGO all the packages" issue at their forum/bug tracker
- ferrumc-rs/ferrumc#15 (comment) - another comment that doesn't understand PGO's value for application performance in general
- Add PGO request to https://sourceforge.net/projects/sdcc/ . Build the compiler with PGO and implement PGO for the compiler
- Add PGO request to https://github.com/FalkorDB/FalkorDB/ + https://github.com/FalkorDB/FalkorDB-core-rs
- Golang Conf 2024, a talk about PGO in Go - https://www.youtube.com/watch?v=FyZJlPMBFm8 , talk to the speaker about their PGO way in Go
- Rework wording in Kobzol/cargo-pgo#60 + Kobzol/cargo-pgo#61 - add a note to the article about the influence of such "wrong" docs
- CachyOS optimizes more packages with PGO: https://www.phoronix.com/news/CachyOS-September-2024 - mention Phoronix too in the article
- Add a note about MLGO for regalloc: https://groups.google.com/g/llvm-dev/c/7jxuvu3WPl0 + https://github.com/google/ml-compiler-opt/blob/main/docs/regalloc-demo/demo.md + https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1494741-cachyos-optimizing-more-packages-with-pgo-for-up-to-~10-better-performance?p=1494782#post1494782 . The article already has some information about MLGO - extend it
- Add to the article - https://www.reddit.com/r/cpp/comments/1fsj7a1/comment/lplvd07/
- Add https://pythonspeed.com/ to the article as a meta-idea
- Write about an idea of performance-oriented community and meetup initiatives
- Write about different benchmark types: micro-bench, end-to-end bench, etc. and how suitable they are for PGO
- Update rinja docs: https://www.reddit.com/r/rust/comments/1ftx5iv/comment/lpy5v01/ - rinja-rs/rinja#188
- Added documentation about PGO to a library: FlixCoder/serde-brief#6
- AutoFDO + Propeller for the Linux kernel: https://www.phoronix.com/news/Linux-AutoFDO-Prop-v2
- AutoFDO regressions between LLVM releases: google/autofdo#181 (comment)
- Proprietary tooling and PGO: https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Boost-Performance-with-Hardware-Counter-Assisted-Profile-Guided/m-p/1635106/highlight/true#M4128
- https://github.com/spiceai/spiceai - suggest PGO
- The PGO issue was closed even though they ARE interested in it (but not now): FuelLabs/fuel-core#1369 (comment)
- Write to the author https://www.manning.com/books/latency about LTO, PGO and Pythonspeed resources
- Add LTO and PGO requests for https://github.com/greenbone/openvas-scanner/tree/main/rust (for PGO perform benches)
- Rinja example: PGO optimized e2e bench (mitsuhiko/minijinja#588 (comment)) but failed in microbenchmarks: https://gist.github.com/zamazan4ik/c61ccdb86372bdc1ee7ab305381e2e28
- Another comment about "fixing the code" instead of PGO: optiprism-io/optiprism#292 (comment)
- Write about RocksDB as a pretty common multilanguage example of using PGO (e.g. RocksDB in C++, the layer above in anything)
- After several rounds of optimizations, the PGO benefit can become smaller: WGUNDERWOOD/tex-fmt#22 (comment)
- Performance matters talk - https://www.youtube.com/watch?v=r-TLSBdHe1A (also I need to check the StrangeLoop conference)
- Perform PGO benches for https://github.com/tonbo-io/tonbo/tree/main/benches - blocked by tonbo-io/tonbo#185
- Student's work in the PGO-like domains: http://mcst.ru/sites/default/files/u11/Master_2023_Levchenko.pdf (warn: in Russian)
- Sometimes people don't want to keep issues open... espruino/Espruino#2392
- Write that PLO optimizers are not a replacement for PGO - it's an addition (also insert a citation from the BOLT paper from "abstract"). Perform some "PGO vs BOLT" + "PGO + BOLT vs PGO" + "BOLT vs PGO + BOLT" on SQLite - add these benches to the article
- A request example of enabling PGO for a tool with PGO support in the upstream
- 3rd party repos for PGO-optimized stuff for more boring projects like Fedora: https://discussion.fedoraproject.org/t/fedora-llvm-team-llvm-pgo-optimized/84361
- OS maintainers have more desire to enable PGO if an upstream project supports it: https://discord.com/channels/862292009423470592/1060577525929103510/1295008340652326934
- Mention Rust performance book in the article
- Add to the article: llvm/llvm-project#57501
- Think about a talk about "LTO, PGO and PLO for Python native-based libs" for Python conferences. Also, the idea can be applicable for other langs like Ruby, etc.
- Suggest PGO to https://github.com/CrunchyData/pg_parquet
- Suggest PGO to https://github.com/jank-lang/jank/blob/main/compiler%2Bruntime/src/cpp/jank/runtime/core/munge.cpp
- After some time PGO issues can be closed: apache/horaedb#1051
- Write to Ferrocene support - ferrocene/ferrocene#22 . Add to the article as an example that proprietary companies don't care much about public feedback mechanisms. If I don't get an answer before November 10 - write to [email protected] directly - update the article according to the answer from Ferrocene
- Ask Tinygo (https://github.com/tinygo-org/tinygo) about PGO support in the compiler and for the compiler itself
- Add PGO request to https://docs.yscope.com/clp/main/dev-guide/components-core/
- TikTok videos about PGO, huh? :D
- Update article according to the comment: #8
- PGO profile reproducibility question: https://discourse.llvm.org/t/pgo-profile-reproducibility/82861
- Add https://discussion.fedoraproject.org/t/expand-usage-of-profile-guided-optimization-pgo-and-llvm-bolt-across-fedora-packages/133724/6 and https://kwk.github.io/pgo-experiment/ links to the article
- Add to the article that Google drops GCC support: https://www.mail-archive.com/[email protected]/msg102804.html
- LLVM BOLT and kernel: https://md.archlinux.org/s/maQL1mmOT
- Add https://cachyos.org/blog/2411-kernel-autofdo/ to the article
- CachyOS optimizations: https://discord.com/channels/862292009423470592/873309651364610118/1305227583696404510
- Lightweight PGO: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 . Was mentioned in https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/InstrProfileFormat.html
- Add my PGO slides to the awesome PGO repo
- Add PGO request to https://github.com/heroseh/hcc
- LLVM-based projects don't enable PGO even for already PGO-enabled things: https://github.com/pizlonator/llvm-project-deluge
- https://documentation.ubuntu.com/server/explanation/performance/perf-pgo/
- Run PGO benches here: https://github.com/nurmohammed840/nio/tree/main/benchmarks
- data-driven schedulers for preventing problems like this: sched-ext/scx#376
- Manual inlining stuff here and there: FyroxEngine/Fyrox#192 (comment)
- Perform PGO benchmarks for Mistral: https://github.com/EricLBuehler/mistral.rs/tree/master/mistralrs-bench
- https://lwn.net/Articles/993828/ - Linux kernel with BOLT
- AutoFDO removes GCC support: https://www.mail-archive.com/[email protected]/msg102858.html
- https://github.com/orgs/tinygo-org/discussions/4643 - tinygo and PGO
- Link all PGO slides to awesome-pgo (people are asking for that at conferences)
- Sharing PGO profiles between distributions: https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1509316-clang-autofdo-propeller-optimization-support-merged-for-linux-6-13?p=1509355#post1509355
- Add PGO request to https://github.com/orioledb/orioledb
- An example of optimizing software manually (rust-lang/rust-analyzer#17491) instead of doing PGO: rust-lang/rust-analyzer#9412
- Add PGO request: fortress-build/nervemq#35
- Tell people more what PGO does in practice: tursodatabase/limbo#78 (comment)
- Add PGO request: https://github.com/carthage-software/mago
- Good optimize/overhead decision: https://github.com/DioxusLabs/dioxus/blob/main/Cargo.toml#L252
- Improvements from LTO were not huge: ricott1/rebels-in-the-sky#24 (comment)
- Nikita Popov's talk about "Rust loves LLVM": https://www.youtube.com/watch?v=Kqz-umsAnk8
- https://github.com/floooh/fips/blob/3d05e74bc2f07a0b31138eed795e0a7d0368f753/CHANGELOG.md?plain=1#L194 - LTO enabled by default for VS projects in some build systems too
- People forget to enable a dedicated Release profile: Asurar0/mikomikagi#2 (comment)
- RustWeek talk idea: "PGO future in Rust - possible ways"
- An additional source of uncertainty possibly due to PGO: rust-lang/rustc-perf#1592 that limits local reproduction (that's why PGO profiles should be public as well - write an idea about this)
- PGO for data structures could prevent this in theory but we are not there yet: https://github.com/rust-lang/rustc_codegen_cranelift/commit/5d516f9e118d6527947ca5deb3d76bbc4fa0f8a1
- https://developers.redhat.com/articles/2023/11/07/how-i-experimented-pgo-enabled-llvm-fedora
- Propeller integration in CachyOS for Linux kernel: CachyOS/linux-cachyos#359
- People are asking for PGO/PLO optimized binaries: https://www.reddit.com/r/cpp/comments/1hjhcwr/are_there_any_prebuilt_pgoboltoptimized_version/
- The author has never used PGO (it's written in the article) - I am here to change it: https://deterministic.space/high-performance-rust.html
- Add another Ruby compiler: https://blog.llvm.org/posts/2024-12-03-minimalistic-ruby-compiler/ - AoT way is still important nowadays!
- A nice article about compiler myths: https://sbaziotis.com/compilers/common-misconceptions-about-compilers.html
- Create a discussion about PGO in https://github.com/DragonRuby/lightstorm/discussions
- RAM tricks for BOLT: llvm/llvm-project#61711 (comment)
- Add PGO request to https://github.com/mila411/pilgrimage
- PGO write profile and timeout: https://github.com/chimera-linux/cports/commit/81dd2a368e39993c2cb12d5be8ec06f1b7663908
- PGO network access can be a problem: https://github.com/chimera-linux/cports/blob/d8e3510901aca1c3843edaf7b3f33d62ed598a55/main/thunderbird/template.py#L171
- An example of how PGO could be enabled in OS package scripts: https://github.com/chimera-linux/cports/commit/3fdfa1a46ee61ca6fcb6d7a9573e49302309e247
- An example of a less popular OS: https://aros.sourceforge.io/ where PGO won't be enabled soon (since PGO has a pretty high barrier to entry right now)
- If documentation contains instructions about PGO usage it doesn't mean that PGO improves the project: vozlt/nginx-module-vts#288 (comment)
- Learned (data-driven!) performance models for compilers: https://proceedings.mlsys.org/paper_files/paper/2021/file/6bcfac823d40046dca25ef6d6d59cc3f-Paper.pdf
- Add https://reviews.llvm.org/D54175 to the article to CS PGO part
- Add https://research.facebook.com/publications/vespa-static-profiling-for-binary-optimization/ to the article
- Add https://storage.googleapis.com/pub-tools-public-publication-data/pdf/578a590c3d797cd5d3fcd98f39657819997d9932.pdf to the article
- Add rustc: supports, but marked unstable: commit, unstable book to the article
- Write about -fdebug-info-for-profiling and -funique-internal-linkage-names: https://reviews.llvm.org/D25435 + https://reviews.llvm.org/D73307
- "Note the use of the -b flag. This tells Perf to use the Last Branch Record (LBR) to record call chains. While this is not strictly required, it provides better call information, which improves the accuracy of the profile data." - write that it's not clear how important the -b flag is in practice
- Add https://postgrespro.com/list/thread-id/2634776 to the repo
- Add https://llvm.org/devmtg/2024-04/slides/TechnicalTalks/Xiao-EnablingHW-BasedPGO.pdf
- LLVM Profi: https://reviews.llvm.org/D109860?id=
- https://arxiv.org/abs/2202.06728 - add to the article
- https://israelo.io/blog/pgo/ - add to the article
- https://engineering.grab.com/profile-guided-optimisation - add to the README
- https://theyahya.com/posts/go-pgo/ - add to the article
- https://blazinglyfast.net/ - performance challenges are a good thing to have!
- Temporal PGO materials: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068/8?u=zamazan4ik
- gcov-tool bugs about PGO profiles: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117090 + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110678 + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118607
- Consumes too much RAM? Just add more swap :))) google/autofdo#162 (comment)
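
For the "PGO vs BOLT" + "PGO + BOLT vs PGO" + "BOLT vs PGO + BOLT" comparisons from the list above, a small helper could turn raw timings into pairwise speedups. This is only a minimal sketch: the function name `pairwise_speedups` and the build labels are hypothetical, and it assumes that every build was timed several times on the same workload and that the median is an acceptable summary.

```python
from itertools import permutations
from statistics import median


def pairwise_speedups(timings):
    """Return {(candidate, baseline): speedup} for every ordered pair of builds.

    `timings` maps a build label (hypothetical labels such as "pgo",
    "bolt", "pgo+bolt") to a list of wall-clock measurements in seconds.
    A speedup above 1.0 means the candidate is faster than the baseline.
    """
    # Summarize each build by its median run time to reduce noise from outliers.
    medians = {name: median(runs) for name, runs in timings.items()}
    return {
        (candidate, baseline): medians[baseline] / medians[candidate]
        for candidate, baseline in permutations(medians, 2)
    }
```

Passing the measured timings for each SQLite build gives all three comparisons at once, e.g. the key `("pgo+bolt", "pgo")` answers "PGO + BOLT vs PGO".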