Skip to content

LLVM (Low Level Virtual Machine) Guide. Learn all about the compiler infrastructure, which is designed for compile-time, link-time, run-time, and "idle-time" optimization of programs. Originally implemented for C/C++ , though, has a variety of front-ends, including Java, Python, etc.

Notifications You must be signed in to change notification settings

mikeroyal/LLVM-Guide

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 

Repository files navigation


LLVM Guide

followers

Maintenance Last-Commit

A guide covering LLVM including the applications, libraries and tools that will make you a better and more efficient LLVM development.

Note: You can easily convert this markdown file to a PDF in VSCode using this handy extension Markdown PDF.


Table of Contents

  1. Getting Started with LLVM

  2. LLVM Tools, Libraries and Frameworks

  3. C/C++ Development

  4. Assembly Development

Getting Started with LLVM

Back to the Top

LLVM is a library that has collection of modular/reusable compiler and toolchain components (assemblers, compilers, and debuggers). With these components LLVM can be used as a compiler framework, providing a front-end(parser and lexer) and a back-end (code that converts LLVM's representation to actual machine code).

  • LLVM-based compiler: This is a compiler built partially or completely with the LLVM infrastructure. For example, a compiler might use LLVM for the frontend and backend but use GCC and GNU system libraries to perform the final link.

  • LLVM libraries: This is the reusable code portion of the LLVM infrastructure.

  • LLVM core: The optimizations that happen at the intermediate language level and the backend algorithms form the LLVM core where the project started.

  • Clang is a language front-end and tooling infrastructure for languages in the C language family (C, C++, Objective C/C++, OpenCL, CUDA, and RenderScript) for the LLVM project.

Developer Resources

Books

Back to the Top

YouTube Tutorials

Back to the Top

2023 LLVM Dev Mtg - A Tour of ADT - the LLVM Developer's Toolbox 2023 LLVM Dev Mtg - MLIR Is Not an ML Compiler, and Other Common Misconceptions 2023 LLVM Dev Mtg - Generalized Mem2Reg for MLIR and how to use it LLVM in 100 Seconds Programming Language with LLVM [1/20] Introduction to LLVM IR and tools Bjarne Stroustrup: C++ Implementations - Clang, GCC, Microsoft, and EDG Chris Lattner: Compilers, LLVM, Swift, TPU, and ML Accelerators | Lex Fridman Podcast #21 How to build a compiler with LLVM and MLIR Introduction to the Low-Level Virtual Machine (LLVM) 2023 EuroLLVM - Tutorial: A whirlwind tour of the LLVM optimizer C++ vs Rust: which is faster? Making LLVM RISC-V Compiler More Performant for Everyone - Ivan Baev

LLVM Tools, Libraries and Frameworks

Back to the Top

Visual Studio Code is a code editor redefined and optimized for building and debugging modern web and cloud applications.

Code Server is a tool that allows you to run VS Code on any machine anywhere and access it in the browser.

Clang-Format is a tool to format C/C++/Java/JavaScript/Objective-C/Objective-C++/Protobuf code.

Clang-Tidy is a clang-based C++ "linter" tool. Its purpose is to provide an extensible framework for diagnosing and fixing typical programming errors, like style violations, interface misuse, or bugs that can be deduced via static analysis. clang-tidy is modular and provides a convenient interface for writing new checks.

Clangd is a Visual Studio Code extension that provides C/C++ language IDE features for VS Code using clangd.

LLD is a linker from the LLVM project that is a drop-in replacement for system linkers and runs much faster than them. It also provides features that are useful for toolchain developers. The linker supports ELF (Unix), PE/COFF (Windows), Mach-O (macOS) and WebAssembly in descending order.

TinyGo is a Go compiler(based on LLVM) intended for use in small places such as microcontrollers, WebAssembly (Wasm), and command-line tools.

Codon is a high-performance Python compiler that compiles Python code to native machine code without any runtime overhead. Typical speedups over Python are on the order of 10-100x or more, on a single thread.

oneAPI DPC++ compiler is a LLVM-based compiler project that implements compiler and runtime support for the SYCL* language.

Cling is an interactive C++ interpreter, built on top of Clang and LLVM compiler infrastructure. Cling realizes the read-eval-print loop (REPL) concept, in order to leverage rapid application development.

Emu is a GPGPU library for Rust with a focus on portability, modularity, and performance. It's a CUDA-esque compute-specific abstraction over WebGPU providing specific functionality to make WebGPU feel more like CUDA.

Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than many other profilers while delivering far more detailed information.

McSema is an executable lifter. It translates ("lifts") x86, amd64, aarch64, sparc32, and sparc64 executable binaries from native machine code to LLVM bitcode.

QuarkslaB Dynamic binary Instrumentation (QBDI) is a modular, cross-platform and cross-architecture DBI framework. It aims to support Linux, macOS, Android, iOS and Windows operating systems running on x86, x86-64, ARM and AArch64 architectures.

FileCheck is a flexible pattern matching file verifier.

tblgen is a description to C++ Code.

clang-tblgen is a description to C++ Code for Clang.

lldb-tblgen is a description to C++ Code for LLDB.

llvm-tblgen is a target description to C++ Code for LLVM.

mlir-tblgen is a description to C++ Code for MLIR.

lit is a LLVM Integrated Tester.

llvm-exegesis is a LLVM Machine Instruction Benchmark.

llvm-locstats is a calculate statistics on DWARF debug location.

llvm-pdbutil is a PDB File forensics and diagnostics.

llvm-profgen is a LLVM SPGO profile generation tool

bugpoint is a automatic test case reduction tool.

llvm-extract is a extract a function from an LLVM module.

llvm-bcanalyzer is a LLVM bitcode analyzer.

llvm-addr2line is a drop-in replacement for addr2line.

llvm-ar is a LLVM archiver.

llvm-cxxfilt is a LLVM symbol name demangler.

llvm-install-name-tool is a LLVM tool for manipulating install-names and rpaths.

llvm-nm is a list LLVM bitcode and object file’s symbol table.

llvm-objcopy is a object copying and editing tool.

llvm-objdump is a LLVM’s object file dumper.

llvm-ranlib is a generates an archive index.

llvm-readelf is a GNU-style LLVM Object Reader.

llvm-size is a print size information.

llvm-strings is a print strings.

llvm-strip is a object stripping tool.

C/C++ Development

Back to the Top


C/C++ Learning Resources

C++ is a cross-platform language that can be used to build high-performance applications developed by Bjarne Stroustrup, as an extension to the C language.

C is a general-purpose, high-level language that was originally developed by Dennis M. Ritchie to develop the UNIX operating system at Bell Labs. It supports structured programming, lexical variable scope, and recursion, with a static type system. C also provides constructs that map efficiently to typical machine instructions, which makes it one was of the most widely used programming languages today.

Embedded C is a set of language extensions for the C programming language by the C Standards Committee to address issues that exist between C extensions for different embedded systems. The extensions hep enhance microprocessor features such as fixed-point arithmetic, multiple distinct memory banks, and basic I/O operations. This makes Embedded C the most popular embedded software language in the world.

C & C++ Developer Tools from JetBrains

Open source C++ libraries on cppreference.com

C++ Graphics libraries

C++ Libraries in MATLAB

C++ Tools and Libraries Articles

Google C++ Style Guide

Introduction C++ Education course on Google Developers

C++ style guide for Fuchsia

C and C++ Coding Style Guide by OpenTitan

Chromium C++ Style Guide

C++ Core Guidelines

C++ Style Guide for ROS

Learn C++

Learn C : An Interactive C Tutorial

C++ Institute

C++ Online Training Courses on LinkedIn Learning

C++ Tutorials on W3Schools

Learn C Programming Online Courses on edX

Learn C++ with Online Courses on edX

Learn C++ on Codecademy

Coding for Everyone: C and C++ course on Coursera

C++ For C Programmers on Coursera

Top C Courses on Coursera

C++ Online Courses on Udemy

Top C Courses on Udemy

Basics of Embedded C Programming for Beginners on Udemy

C++ For Programmers Course on Udacity

C++ Fundamentals Course on Pluralsight

Introduction to C++ on MIT Free Online Course Materials

Introduction to C++ for Programmers | Harvard

Online C Courses | Harvard University

C/C++ Tools and Frameworks

AWS SDK for C++

Azure SDK for C++

Azure SDK for C

C++ Client Libraries for Google Cloud Services

Visual Studio is an integrated development environment (IDE) from Microsoft; which is a feature-rich application that can be used for many aspects of software development. Visual Studio makes it easy to edit, debug, build, and publish your app. By using Microsoft software development platforms such as Windows API, Windows Forms, Windows Presentation Foundation, and Windows Store.

Visual Studio Code is a code editor redefined and optimized for building and debugging modern web and cloud applications.

Vcpkg is a C++ Library Manager for Windows, Linux, and MacOS.

ReSharper C++ is a Visual Studio Extension for C++ developers developed by JetBrains.

AppCode is constantly monitoring the quality of your code. It warns you of errors and smells and suggests quick-fixes to resolve them automatically. AppCode provides lots of code inspections for Objective-C, Swift, C/C++, and a number of code inspections for other supported languages. All code inspections are run on the fly.

CLion is a cross-platform IDE for C and C++ developers developed by JetBrains.

Code::Blocks is a free C/C++ and Fortran IDE built to meet the most demanding needs of its users. It is designed to be very extensible and fully configurable. Built around a plugin framework, Code::Blocks can be extended with plugins.

CppSharp is a tool and set of libraries which facilitates the usage of native C/C++ code with the .NET ecosystem. It consumes C/C++ header and library files and generates the necessary glue code to surface the native API as a managed API. Such an API can be used to consume an existing native library in your managed code or add managed scripting support to a native codebase.

Conan is an Open Source Package Manager for C++ development and dependency management into the 21st century and on par with the other development ecosystems.

High Performance Computing (HPC) SDK is a comprehensive toolbox for GPU accelerating HPC modeling and simulation applications. It includes the C, C++, and Fortran compilers, libraries, and analysis tools necessary for developing HPC applications on the NVIDIA platform.

Thrust is a C++ parallel programming library which resembles the C++ Standard Library. Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs. Interoperability with established technologies such as CUDA, TBB, and OpenMP integrates with existing software.

Boost is an educational opportunity focused on cutting-edge C++. Boost has been a participant in the annual Google Summer of Code since 2007, in which students develop their skills by working on Boost Library development.

Automake is a tool for automatically generating Makefile.in files compliant with the GNU Coding Standards. Automake requires the use of GNU Autoconf.

Cmake is an open-source, cross-platform family of tools designed to build, test and package software. CMake is used to control the software compilation process using simple platform and compiler independent configuration files, and generate native makefiles and workspaces that can be used in the compiler environment of your choice.

GDB is a debugger, that allows you to see what is going on `inside' another program while it executes or what another program was doing at the moment it crashed.

GCC is a compiler Collection that includes front ends for C, C++, Objective-C, Fortran, Ada, Go, and D, as well as libraries for these languages.

GSL is a numerical library for C and C++ programmers. It is free software under the GNU General Public License. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total with an extensive test suite.

OpenGL Extension Wrangler Library (GLEW) is a cross-platform open-source C/C++ extension loading library. GLEW provides efficient run-time mechanisms for determining which OpenGL extensions are supported on the target platform.

Libtool is a generic library support script that hides the complexity of using shared libraries behind a consistent, portable interface. To use Libtool, add the new generic library building commands to your Makefile, Makefile.in, or Makefile.am.

Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.

TAU (Tuning And Analysis Utilities) is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements as well as event-based sampling. All C++ language features are supported including templates and namespaces.

Clang is a production quality C, Objective-C, C++ and Objective-C++ compiler when targeting X86-32, X86-64, and ARM (other targets may have caveats, but are usually easy to fix). Clang is used in production to build performance-critical software like Google Chrome or Firefox.

OpenCV is a highly optimized library with focus on real-time applications. Cross-Platform C++, Python and Java interfaces support Linux, MacOS, Windows, iOS, and Android.

Libcu++ is the NVIDIA C++ Standard Library for your entire system. It provides a heterogeneous implementation of the C++ Standard Library that can be used in and between CPU and GPU code.

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build parse trees and also generates a listener interface that makes it easy to respond to the recognition of phrases of interest.

Oat++ is a light and powerful C++ web framework for highly scalable and resource-efficient web application. It's zero-dependency and easy-portable.

JavaCPP is a program that provides efficient access to native C++ inside Java, not unlike the way some C/C++ compilers interact with assembly language.

Cython is a language that makes writing C extensions for Python as easy as Python itself. Cython is based on Pyrex, but supports more cutting edge functionality and optimizations such as calling C functions and declaring C types on variables and class attributes.

Spdlog is a very fast, header-only/compiled, C++ logging library.

Infer is a static analysis tool for Java, C++, Objective-C, and C. Infer is written in OCaml.

Assembly Development

Back to the Top


Assembly Learning Resources

Assembly is a low-level programming language. It uses mnemonic codes and labels to represent machine-level code with each instruction corresponding to just one machine operation.

RISC-V Foundation is a non-profit corporation controlled by its 500 members(NVIDIA, Google, Samsung, Raspberry Pi, SiFive, Canonical, and Western Digital) to drive forward the adoption and implementation of the free and open RISC-V instruction set architecture (ISA).

Intel® 64 and IA-32 Architectures Software Developer’s Manual

Introduction to x64 Assembly from Intel

x86 Assembly Language Reference Manual for Open Solaris

AMD64 Architecture Programmer’s Manual Volume 1-5

AMD GPU ISA documentation

AMD Developer Guides, Manuals, and ISA Documents

Assembler language from IBM

The Assembler language on z/OS from IBM

MIPS Architecture & Technology from Wave Computing

Assemblies in .NET

Microsoft Macro Assembler reference

Compiler Intrinsics and Assembly Language from Microsoft

x86 and amd64 instruction Reference

Intro to x86 Assembly Language Programming

Learn Assembly Programming courses on Udemy

Assembly Languages and Assemblers courses on Coursera

Intro to Assembly Language from MIT

Assembly Tools & Architectures

Arm Instruction Emulator (ArmIE) is a tool that emulates Scalable Vector Extension (SVE) and SVE2 instructions on AArch64/ARM64 platforms.

FASM (flat assembler) is an assembler for x86 processors that supports Intel-based assembly language on the IA-32 and x86-64 computer architectures.

Microsoft Assembler (MASM) for x64 is Microsoft's assembler that accepts x64 assembler language.

MASM/TASM is a VSCode extension that offers a way to run and debug DOS(80x86) assembly TASM/MASM through DOSBox and msdos-player.

NASM is an asssembler/disassembler for the x86 CPU architecture portable to nearly every modern platform, and with code generation for many platforms old and new.

GAS is the assembler used by the GNU Project for the default back-end of GCC. It is used to assemble the GNU operating system and the Linux kernel.

MIPS is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by MIPS Technologies, Inc.. In June 2018 MIPS was Acquired by AI Startup Wave Computing.

LLVM is a library that has collection of modular/reusable compiler and toolchain components (assemblers, compilers, debuggers, etc.). With these components LLVM can be used as a compiler framework, providing a front-end(parser and lexer) and a back-end (code that converts LLVM's representation to actual machine code).

TinyGo is a Go compiler(based on LLVM) intended for use in small places such as microcontrollers, WebAssembly (Wasm), and command-line tools.

Tock is an embedded operating system designed for running multiple concurrent, mutually distrustful applications on Cortex-M and RISC-V based embedded platforms. Tock's design centers around protection, both from potentially malicious applications and from device drivers.

PlatformIO is a professional collaborative platform for embedded development with no vendor lock-in. It provides support for multiplatforms and frameworks such as IoT, Arduino, CMSIS, ESP-IDF, FreeRTOS, libOpenCM3, mbed OS, Pulp OS, SPL, STM32Cube, Zephyr RTOS, ARM, AVR, Espressif (ESP8266/ESP32), FPGA, MCS-51 (8051), MSP430, Nordic (nRF51/nRF52), NXP i.MX RT, PIC32, RISC-V.

PlatformIO for VSCode is a plugin that provides support for the PlatformIO IDE on VSCode.

Keystone is a lightweight multi-platform, multi-architecture(Arm, Arm64, Hexagon, Mips, PowerPC, Sparc, SystemZ & X86) assembler framework.

Unicorn is a lightweight, multi-platform, multi-architecture CPU emulator framework(ARM, AArch64, M68K, Mips, Sparc, X86) based on QEMU.

Contribute

  • If would you like to contribute to this guide simply make a Pull Request.

License

Back to the Top

Distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) Public License.

About

LLVM (Low Level Virtual Machine) Guide. Learn all about the compiler infrastructure, which is designed for compile-time, link-time, run-time, and "idle-time" optimization of programs. Originally implemented for C/C++ , though, has a variety of front-ends, including Java, Python, etc.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages