Protocyte is a Python protoc plugin that generates C++20 protobuf code for
freestanding, embedded, or kernel-style environments. The generated C++ avoids
the STL, exceptions, RTTI, iostreams, and implicit global allocation.
This repository contains a mix of human-written and AI-assisted work. Some source code, documentation, and generated artifacts were drafted or produced with the help of AI tools and then reviewed, edited, and accepted by human maintainers.
Because this project generates code intended for downstream use, users should treat all generated output as needing normal engineering review, testing, and validation before production use.
Responsibility for the contents of this repository and its releases remains with the human maintainers and contributors.
Protocyte currently targets proto3 schemas and advertises
FEATURE_PROTO3_OPTIONAL.
Generated code supports:
- Messages and enums, including nested declarations.
- Scalar fields: double, float, int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64, bool, and enum-valued fields.
- string, bytes, message fields, oneof, optional, repeated fields, packed repeated scalars, maps, and recursive message fields.
- Fallible deep-copy helpers via copy_from() and clone().
- Runtime emission under protocyte/runtime/....
- Optional debug reflection metadata behind PROTOCYTE_ENABLE_REFLECTION.
The generated merge_from() and serialize() paths delegate scalar wire
parsing and writing to runtime helpers, so per-field generated code stays
smaller while preserving protobuf wire behavior.
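As a reminder of the wire behavior those helpers have to preserve, here is a minimal Python sketch of protobuf varint and ZigZag encoding. It illustrates the standard wire format only; the function names are hypothetical and are not Protocyte's runtime API.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def zigzag32(value: int) -> int:
    """Map a signed 32-bit integer to the unsigned value used by sint32."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


# sint32 fields are ZigZag-mapped first, then written as a varint.
encoded_minus_two = encode_varint(zigzag32(-2))
```

ZigZag keeps small negative numbers short on the wire, which is why sint32 and sint64 are handled separately from int32 and int64.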
Current limitations:
- proto2 files are rejected.
- Protobuf Editions are rejected in v1.
- Proto3 extension declarations are not supported.
- Groups are not supported.
- protocyte.array cannot be applied to map fields.
Protocyte's Python package requires Python 3.14 or newer. That applies to
local uv sync development, published wheel and sdist installs, and any CMake
workflow that runs the plugin through Python3_EXECUTABLE.
Install the project and make the virtual environment's script directory
discoverable to protoc:
uv sync
$env:PATH = "$PWD\.venv\Scripts;$env:PATH"

On other shells, either activate .venv first or prepend the matching
.venv/bin directory to PATH.
For a ground-zero walkthrough that covers getting protoc, building and
installing the protocyte package, running protoc with the plugin, wiring the
generated files into a CMake target, and setting up automatic regeneration, see
smoke/README.md.
Generate code:
protoc --proto_path=. --protocyte_out=runtime=emit:generated tests/example.proto

The plugin emits:
- foo.protocyte.hpp
- foo.protocyte.cpp
- protocyte/runtime/runtime.hpp when runtime emission is enabled
Protocyte supports two CMake consumption modes:
- Source consumption with FetchContent
- Installed-package consumption with find_package(protocyte CONFIG REQUIRED)
Published GitHub releases contain three different asset types:
- protocyte-X.Y.Z-py3-none-any.whl: the Python wheel for protoc-gen-protocyte. Install it into a Python 3.14+ environment when you want the plugin executable.
- protocyte-X.Y.Z.tar.gz: the Python source distribution for the same plugin package. It is also a Python 3.14+ artifact, not a CMake install tree.
- protocyte-X.Y.Z-cmake-prefix.tar.gz: a preinstalled CMake prefix for find_package(protocyte CONFIG REQUIRED). Unpack it and add the extracted directory to CMAKE_PREFIX_PATH.
The CMake prefix archive includes the CMake files, C++ runtime headers, and
the protocyte Python generator sources, but it does not bundle Python itself.
Any downstream build that calls protocyte_generate(...) or
protocyte_add_proto_library(...) still needs a local Python 3.14+ interpreter
available to CMake through Python3_EXECUTABLE or the normal find_package(Python3)
search path.
For prerelease tags vX.Y.Z-rcN, the Python packaging artifacts use the
normalized version spelling X.Y.ZrcN in the wheel and sdist filenames,
while the CMake prefix archive keeps the Git tag spelling
protocyte-X.Y.Z-rcN-cmake-prefix.tar.gz.
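The naming rule above can be sketched in Python as follows. This is only an illustration: the helper name and the tag regular expression are assumptions based on the vX.Y.Z and vX.Y.Z-rcN spellings described here, and the version used in the usage line is a made-up example.

```python
import re

# Assumed tag shapes: vX.Y.Z or vX.Y.Z-rcN.
_TAG = re.compile(r"^v(\d+\.\d+\.\d+)(?:-rc(\d+))?$")


def release_asset_names(tag: str) -> dict[str, str]:
    """Return the expected asset filenames for a release tag."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    base, rc = match.groups()
    # Python artifacts use the normalized spelling (1.2.3rc1).
    python_version = base if rc is None else f"{base}rc{rc}"
    # The CMake prefix archive keeps the Git tag spelling (1.2.3-rc1).
    cmake_version = base if rc is None else f"{base}-rc{rc}"
    return {
        "wheel": f"protocyte-{python_version}-py3-none-any.whl",
        "sdist": f"protocyte-{python_version}.tar.gz",
        "cmake_prefix": f"protocyte-{cmake_version}-cmake-prefix.tar.gz",
    }


names = release_asset_names("v1.2.3-rc1")  # hypothetical tag
```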
Minimal source-consumption setup:
include(FetchContent)

FetchContent_Declare(
  protocyte
  GIT_REPOSITORY https://github.com/anthonyprintup/protocyte.git
  GIT_TAG vX.Y.Z
)
FetchContent_MakeAvailable(protocyte)

protocyte_add_proto_library(
  TARGET demo_proto
  ALIAS demo::proto
  PROTO_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/proto"
  OUT_DIR "${CMAKE_CURRENT_BINARY_DIR}/generated"
  DISCOVER
  HOSTED_ALLOCATOR
)

add_executable(demo main.cpp)
target_link_libraries(demo PRIVATE demo::proto)

Generator options can be forwarded through OPTIONS:
protocyte_add_proto_library(
  TARGET demo_proto
  ALIAS demo::proto
  PROTO_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/proto"
  OUT_DIR "${CMAKE_CURRENT_BINARY_DIR}/generated"
  DISCOVER
  OPTIONS
    "clang_format=C:/Program Files/LLVM/bin/clang-format.exe"
    "clang_format_config=${CMAKE_SOURCE_DIR}/.clang-format"
)

Absolute Windows and POSIX paths are safe to use in OPTIONS.
By default, the protocyte CMake project fetches protobuf when protobuf CMake targets are not already available, then exposes:
- protocyte_add_proto_library(...) for the common target-oriented workflow
- protocyte_generate(...) as the lower-level codegen primitive
- protocyte::runtime and protocyte::runtime_hosted for reusable runtime linkage
TARGET must be a real CMake target name without ::. ALIAS can use any
valid alias target name; namespaced aliases like demo::proto are recommended
for downstream linkage.
Pin a published release tag for downstream builds instead of tracking main.
You can also install protocyte into a prefix and consume it later with
find_package.
For published releases, use the protocyte-X.Y.Z-cmake-prefix.tar.gz asset
described above, unpack it, and point CMAKE_PREFIX_PATH at the extracted
prefix directory. Do not use the plain protocyte-X.Y.Z.tar.gz sdist here;
that archive is only the Python plugin package source.
Install protocyte:
cmake -S . -B build/protocyte
cmake --install build/protocyte --prefix C:\path\to\protocyte-prefix

Minimal consumer setup:
find_package(protocyte CONFIG REQUIRED)

protocyte_add_proto_library(
  TARGET demo_proto
  ALIAS demo::proto
  PROTO_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/proto"
  OUT_DIR "${CMAKE_CURRENT_BINARY_DIR}/generated"
  DISCOVER
  HOSTED_ALLOCATOR
)

add_executable(demo main.cpp)
target_link_libraries(demo PRIVATE demo::proto)

Configure the consumer with -DCMAKE_PREFIX_PATH=<prefix> so CMake can find
protocyteConfig.cmake.
The installed CMake package provides:
- the protocyte_add_proto_library(...) and protocyte_generate(...) CMake integration
- the exported protocyte::codegen, protocyte::runtime, and protocyte::runtime_hosted targets
- the protocyte Python sources used by the plugin wrapper
- the reusable C++ runtime headers and targets
- protocyte/options.proto
The installed package does not embed Python or protobuf. Consumers that run code generation still need a working Python 3.14+ interpreter, and they either need protobuf/protoc available already or they can opt into the fetch fallback:
set(PROTOCYTE_FETCH_PROTOBUF ON CACHE BOOL "" FORCE)
find_package(protocyte CONFIG REQUIRED)

Public CMake variables exposed by the package:
- PROTOCYTE_PROTO_DIR: the installed directory that contains protocyte/options.proto
- PROTOCYTE_OPTIONS_PROTO: the full path to protocyte/options.proto
- PROTOCYTE_PROTOBUF_GIT_TAG: the protobuf revision used when PROTOCYTE_FETCH_PROTOBUF=ON
protocyte_add_proto_library(...) links generated code against
protocyte::runtime by default, or protocyte::runtime_hosted when
HOSTED_ALLOCATOR is enabled. Use EMIT_RUNTIME only when you explicitly want
the runtime header emitted into the generated output tree instead of reusing
the installed/runtime target.
The full end-to-end examples, including building a static library from generated translation units, are in smoke/README.md, tests/fetchcontent/CMakeLists.txt, and tests/find_package/CMakeLists.txt.
Supported --protocyte_out= parameters:
- runtime=emit: emit runtime.hpp under protocyte/runtime.
- runtime=emit:<prefix>: emit runtime.hpp under a custom prefix.
- runtime=omit: do not emit runtime files.
- runtime_prefix=<path>: override the runtime include/output prefix when runtime emission is enabled.
- namespace_prefix=<a::b>: prepend additional C++ namespaces around the file package namespace.
- namespace=<a::b>: accepted as an alias for namespace_prefix; specify only one of the two names.
- include_prefix=<path>: prefix includes for imported generated headers.
- clang_format=<command-or-path>: run an explicit clang-format executable after generation. When specified, launch and formatting failures are reported as plugin errors.
- clang_format_config=<path>: use an explicit clang-format config file when formatting runs.
Formatting is best-effort by default. If clang-format is on PATH, protocyte
uses it for generated C++ output. If it is not available and no explicit
clang_format=... override is supplied, protocyte still emits generated files
without failing.
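The formatter policy described above can be pictured with a small Python sketch. It models only the decision (which command to use and whether failures are fatal); the function name, its return shape, and the injectable lookup are assumptions for illustration, not the plugin's actual code.

```python
import shutil
from typing import Callable, Optional


def pick_formatter(
    explicit: Optional[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> tuple[Optional[str], bool]:
    """Return (command, strict); strict means failures are reported as errors."""
    if explicit is not None:
        # clang_format=... was supplied: launch and formatting failures are errors.
        return explicit, True
    # No override: use clang-format from PATH if present, otherwise skip formatting.
    return which("clang-format"), False
```

Passing the lookup function as a parameter keeps the sketch testable without a real clang-format on PATH.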
CMake users can forward these through the existing OPTIONS argument on
protocyte_generate(...) or protocyte_add_proto_library(...); no dedicated
CMake option is required. Absolute Windows and POSIX paths are safe in
OPTIONS.
Example:
protoc `
  --proto_path=. `
  --protocyte_out=runtime=emit:vendor/protocyte,namespace_prefix=mycorp::wire,include_prefix=generated:out `
  tests/example.proto

Protocyte ships custom protobuf options in protocyte/options.proto.
Available extensions:
- option (protocyte.package_constant) = { ... }; on files.
- option (protocyte.constant) = { ... }; on messages.
- (protocyte.array) = { max: ... }, (protocyte.array) = { expr: ... }, or (protocyte.array) = { ..., fixed: true } on fields.
Custom option extensions must use the parenthesized protobuf extension syntax. This is valid:
bytes sha256 = 1 [(protocyte.array) = { max: 32, fixed: true }];

This is not valid protobuf extension syntax:
bytes sha256 = 1 [protocyte.array = { max: 32 }];

Package constants are declared as repeated file options and are emitted as
namespace-scope inline constexpr declarations in the generated C++:
option (protocyte.package_constant) = { name: "CAP", u32: 32 };
option (protocyte.package_constant) = { name: "LABEL", str: "pkt" };

Package constants can reference other package constants from the same package.
Message constants are declared as repeated message options:
message Packet {
  option (protocyte.constant) = { name: "DOUBLE_CAP", u32_expr: "CAP * 2" };
  option (protocyte.constant) = { name: "FULL_LABEL", str_expr: "LABEL + \"-frame\"" };
}

Constants must set exactly one typed value field. Supported fields are:
- boolean, boolean_expr
- i32, i32_expr
- u32, u32_expr
- i64, i64_expr
- u64, u64_expr
- f32, f32_expr
- f64, f64_expr
- str, str_expr
Constants can be referenced from array.expr. Resolution works:
- Within the current message.
- Through enclosing messages.
- Through package constants from the current package.
- Across messages with qualified root-relative names such as Outer.Inner.CAPACITY.
- Across messages in other packages with fully qualified names such as my.pkg.Outer.Inner.CAPACITY.
- Through package-qualified constants such as my.pkg.CAPACITY.
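One way to picture these lookup rules is an innermost-first walk over dotted scopes. The Python sketch below is a model under stated assumptions: the flat dictionary keyed by fully qualified name, the function name, and the exact search order (current scope, then each enclosing scope, then the name as written) are illustrative guesses and are not taken from Protocyte's implementation.

```python
def resolve_constant(name: str, scope: str, constants: dict[str, int]) -> int:
    """Look up `name` starting in `scope` (a dotted path) and walking outward."""
    parts = scope.split(".") if scope else []
    # Try the innermost scope first, then each enclosing scope, then the bare name.
    for depth in range(len(parts), -1, -1):
        candidate = ".".join(parts[:depth] + [name])
        if candidate in constants:
            return constants[candidate]
    raise KeyError(f"unresolved constant: {name}")


# Hypothetical constant table keyed by fully qualified name.
table = {
    "my.pkg.CAPACITY": 16,
    "my.pkg.Outer.Inner.CAPACITY": 4,
    "other.pkg.LIMIT": 8,
}
inner = resolve_constant("CAPACITY", "my.pkg.Outer.Inner", table)
outer = resolve_constant("CAPACITY", "my.pkg.Outer", table)
```

In this model a constant declared on an inner message shadows a package constant of the same name, which is why `inner` and `outer` resolve to different entries.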
Supported expression features:
- Numeric operators: +, -, *, /, %
- Comparisons: <, <=, >, >=
- Equality: ==, !=
- Boolean operators: !, &&, ||
- String concatenation: +
- String helpers: len(...), substr(...), starts_with(...)
protocyte.array changes storage generation for bounded fields:
- On bytes, it generates inline bounded byte storage with a mutable size.
- On repeated scalar fields, it generates bounded inline array storage.

protocyte.array.fixed tightens that storage:
- On bytes, it generates fixed-size storage with presence semantics.
- On repeated arrays, parse/serialize/size validation allows either zero elements or the exact element count, rather than allowing any count up to the bound.
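The difference between bounded and fixed element counts on repeated arrays can be stated as a one-function Python sketch. The function name is hypothetical; the rule itself follows the description above.

```python
def element_count_is_valid(count: int, bound: int, fixed: bool) -> bool:
    """Check a repeated field's element count against a protocyte.array bound."""
    if fixed:
        # fixed: true -> either zero elements or exactly `bound` elements.
        return count == 0 or count == bound
    # Bounded only -> any count from zero up to the bound.
    return 0 <= count <= bound
```

For a field bounded at 4, a count of 3 passes when the field is only bounded and fails when it is also fixed.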
Examples:
message Digest {
  bytes sha256 = 1 [(protocyte.array) = { max: 32, fixed: true }];
}

option (protocyte.package_constant) = { name: "CAP", u32: 16 };
message Samples {
  option (protocyte.constant) = { name: "DOUBLE_CAP", u32_expr: "CAP * 2" };
  repeated int32 values = 1 [(protocyte.array) = { expr: "CAP" }];
  repeated uint32 lanes = 2 [(protocyte.array) = { expr: "4", fixed: true }];
}

Every generated message is templated on a runtime config:
template <class Config = ::protocyte::DefaultConfig>
struct Message;

The default config uses a caller-supplied allocator context. Construction is
non-allocating. Operations that may allocate return ::protocyte::Status or
::protocyte::Result<T>.

protocyte::DefaultConfig::Context ctx{/* allocator */, /* limits */};
auto msg = demo::Sample<>::create(ctx);

If you provide a non-default Config, protocyte now treats that as a stricter
public contract:
- Config::Context must expose allocator, limits, and recursion_depth.
This is a source-breaking change for custom configs. Default-config users are unaffected.
Generated messages are move-only. Ordinary C++ copying is deleted because it cannot report allocation failure.
Common generated operations include:
- create(ctx)
- parse(ctx, reader)
- merge_from(reader)
- serialize(writer)
- encoded_size()
- copy_from(other)
- clone()
- field accessors, has_*(), set_*(), mutable_*(), and ensure_*() where applicable
Generated string field accessors return ::protocyte::Span<const char> by
default. Protocyte does not return std::string_view by default because the
runtime is designed for freestanding and kernel-style builds that avoid
standard-library exception surfaces. std::string_view includes checked APIs
such as at() and some substr() overloads whose standard contract can throw
std::out_of_range; ::protocyte::Span<const char> keeps the default string
view API in Protocyte's no-exceptions runtime surface.
Hosted users who want standard-library interoperability can opt in:
target_compile_definitions(my_target PRIVATE PROTOCYTE_ENABLE_STD_STRING_VIEW=1)

When PROTOCYTE_ENABLE_STD_STRING_VIEW is set to a nonzero value, the runtime
includes <string_view> and both ::protocyte::Span<char> / Span<const char>
and ::protocyte::String are implicitly convertible to std::string_view.
Generated immutable string field accessors also return std::string_view under
this opt-in, so hosted code can pass string fields directly to
standard-library APIs such as std::format. Code that does not enable the
option keeps the smaller no-exception Span<const char> accessor surface.
In a Windows kernel driver, one technically possible MSVC/STL-specific escape
hatch is to provide the STL's internal out-of-range throw helper yourself so
std::string_view::at() can link even though exceptions are unavailable. This
should be treated as a last-resort compatibility shim, not as a recommended
Protocyte configuration: any accidental checked access would bugcheck the
system.
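#include <ntddk.h>

namespace std {
// Last-resort shim for MSVC's internal out-of-range helper: bugcheck instead of throwing.
[[noreturn]] void __cdecl _Xout_of_range(char const*) {
    KeBugCheckEx(MANUALLY_INITIATED_CRASH, 'svat', 0, 0, 0);
    __assume(0); // tell MSVC this point is unreachable
}
} // namespace std

Prefer the default ::protocyte::Span<const char> API in kernel and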
freestanding builds. It avoids depending on implementation-private STL symbols
and keeps checked string access out of the generated-code runtime surface.
merge_from(reader) commits parsed data per wire field occurrence. If a field
occurrence is malformed, truncated, exceeds a configured limit, or otherwise
fails while it is being read, that field occurrence does not change the visible
message state. Fields that were parsed successfully before the failing
occurrence remain committed, so merge_from() is not whole-message
transactional.
For singular message fields, a later valid occurrence still follows protobuf merge semantics: it merges into the current field value and then replaces the visible field only after the nested occurrence has parsed successfully. Oneof fields switch cases only after the incoming occurrence is fully parsed. Repeated fields and map fields append or insert only fully parsed elements or entries; malformed packed repeated payloads do not append decoded prefix values.
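The packed-repeated rule (no decoded prefix is appended when the payload is malformed) can be illustrated with a staged-commit sketch in Python. This is a model of the behavior, not the generated C++: the function names are hypothetical, and sign conversion for int32 values is omitted for brevity.

```python
def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one varint at `pos`; return (value, next_pos) or raise ValueError."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        if shift >= 70:
            raise ValueError("varint longer than 10 bytes")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def merge_packed(samples: list[int], payload: bytes) -> bool:
    """Append every value in `payload` to `samples`, or none of them on error."""
    staged: list[int] = []
    pos = 0
    try:
        while pos < len(payload):
            value, pos = decode_varint(payload, pos)
            staged.append(value)
    except ValueError:
        return False  # staged prefix is dropped; `samples` is untouched
    samples.extend(staged)  # commit only after the whole payload parsed
    return True
```

Decoding into a separate staging list is what lets a failure in the middle of a payload leave the visible field exactly as it was.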
For bounded and fixed bytes storage, generated parsing may use
resize_for_overwrite() on staged scratch storage before the field is
committed. The reader's can_read() preflight only checks whether the
length-delimited payload should be available; if the following read() still
fails, the staged storage is discarded and the visible field remains unchanged.
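The gap between a successful can_read() preflight and a failing read() can be modeled with a toy reader in Python. Everything here is an assumption made for illustration: the class, the method signatures, and the split between a declared size and the bytes actually present are not Protocyte's reader API.

```python
from typing import Optional


class ToyReader:
    """Toy reader whose preflight can succeed while the real read fails."""

    def __init__(self, data: bytes, declared: int) -> None:
        self.data = data          # bytes that can actually be read
        self.declared = declared  # size the source claims to hold
        self.pos = 0

    def can_read(self, n: int) -> bool:
        return self.pos + n <= self.declared  # preflight only

    def read(self, out: bytearray) -> bool:
        end = self.pos + len(out)
        if end > len(self.data):
            return False  # fewer bytes arrived than the preflight promised
        out[:] = self.data[self.pos:end]
        self.pos = end
        return True


def merge_bytes(
    current: Optional[bytes], reader: ToyReader, length: int
) -> tuple[Optional[bytes], bool]:
    """Return (visible_value, ok); the visible value changes only on success."""
    if not reader.can_read(length):
        return current, False
    staged = bytearray(length)  # stand-in for resize_for_overwrite() scratch storage
    if not reader.read(staged):
        return current, False  # staged storage discarded
    return bytes(staged), True
```

Because the read targets scratch storage, a short read never exposes partially overwritten bytes through the visible field.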
For example, given this shape:
message Inner {
  string name = 1;
  repeated int32 values = 2 [packed = true];
}

message Packet {
  bytes digest = 1 [(protocyte.array) = { max: 32, fixed: true }];
  oneof choice {
    int32 code = 2;
    string label = 3;
    Inner nested_choice = 4;
  }
  Inner nested = 5;
  repeated int32 samples = 6 [packed = true];
  map<string, int32> counters = 7;
}

The contract is:
- If digest already contains 32 bytes and the wire stream later contains field 1 with a declared length of 32 but only 4 payload bytes available, merge_from() returns an error and the old 32-byte digest remains present and unchanged.
- If choice currently holds label = "old" and the wire stream contains a malformed code field or a truncated nested_choice, the active oneof case remains label with value "old".
- If nested already contains name = "old" and values = [1], a later valid nested occurrence containing values = [2] commits as protobuf merge semantics require: the visible field becomes name = "old" and values = [1, 2]. If that later nested occurrence is truncated, the visible field remains name = "old" and values = [1].
- If samples is [7] and a later packed payload decodes the first value before failing on a truncated varint, no prefix values from that malformed payload are appended; samples remains [7].
- If counters contains {"ok": 1} and a later map entry is malformed before the key and value are fully parsed, no partial entry is inserted and existing entries are left alone.
- If a stream contains a valid digest occurrence followed by a malformed samples occurrence, the valid digest stays committed after merge_from() returns the error from samples.
The default runtime does not call malloc or new globally. Hosted allocation
helpers are compiled only when PROTOCYTE_ENABLE_HOSTED_ALLOCATOR is defined,
which is intended for tests and examples rather than kernel builds.
The runtime provides:
- Status and Result<T>
- allocator-aware vectors, strings/bytes, optionals, boxes, and maps
- bounded byte and array storage helpers
- slice readers and writers
- protobuf tag, varint, fixed-width, skip, scalar parse, and scalar serialize helpers
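For readers less familiar with the wire format those helpers cover, here is a short Python sketch of tag packing and fixed-width encoding. It shows standard protobuf encoding rules only; the names are hypothetical and do not correspond to functions in the Protocyte runtime.

```python
import struct

# Protobuf wire types used by proto3 fields.
WIRE_VARINT, WIRE_I64, WIRE_LEN, WIRE_I32 = 0, 1, 2, 5


def make_tag(field_number: int, wire_type: int) -> int:
    """Pack a field number and wire type into the tag value (itself sent as a varint)."""
    return (field_number << 3) | wire_type


def split_tag(tag: int) -> tuple[int, int]:
    """Return (field_number, wire_type) for a decoded tag value."""
    return tag >> 3, tag & 0x7


def encode_fixed32(value: int) -> bytes:
    """fixed32 values are written as 4 little-endian bytes."""
    return struct.pack("<I", value)


digest_tag = make_tag(1, WIRE_LEN)  # field 1, length-delimited
```

A skip helper needs only the wire type from `split_tag` to know how many bytes to pass over: 8 for I64, 4 for I32, a varint for VARINT, and a length prefix plus payload for LEN.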
Reflection tables are emitted only when PROTOCYTE_ENABLE_REFLECTION is
defined. Release builds do not get descriptor pools or dynamic reflection.