All notable changes to the 5-Spot project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Author: Unknown
scripts/install-cloud-init.sh: Linux-only script to convert VMDK→raw, mount LVM with conflict-safe handling, chroot to install `cloud-init` and `open-vm-tools`, optionally rebuild the initramfs, convert raw→streamOptimized VMDK, and import the result as a vSphere template via `govc`.
Enable automated preparation and deployment of a cloud-init-enabled RHEL image on a VMware VM. Credentials and vSphere target configuration are provided via environment variables to avoid storing secrets in code.
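Since the script drives `govc`, the vSphere configuration is most naturally supplied through govc's standard environment variables. A minimal sketch, assuming the script reads these (the exact variable set is defined by the script itself; all values below are illustrative):

```shell
# Standard govc environment variables (values illustrative); connection
# settings and credentials live here rather than in the script.
export GOVC_URL="https://vcenter.example.com/sdk"
export GOVC_USERNAME="svc-image-builder@vsphere.local"
export GOVC_PASSWORD="change-me"   # inject from a secret store in practice
export GOVC_DATACENTER="dc1"
export GOVC_INSECURE="1"           # only for labs with self-signed certs
```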
- Breaking change
- Requires cluster rollout
- Config change only
- Documentation only
Author: Unknown
scripts/install-cloud-init.sh: Replaced fragile `govc vm.info`-based existence check with robust `govc find -type m -name <name>` logic; iterates over matched inventory paths, converts templates to VMs when needed, and destroys them before import.
`govc vm.info` can return exit code 0 with no output, leading to false positives. Using `govc find` and inspecting inventory paths provides reliable detection of existing VMs/templates with the target name and avoids confusing "not found" errors.
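A minimal sketch of the path-based check described above (the function name and VM name are illustrative, not the script's actual code):

```shell
# Destroys any VM or template whose name matches $1, iterating over the
# concrete inventory paths returned by `govc find`. We match paths rather
# than rely on `govc vm.info`, which can exit 0 with empty output.
check_and_remove_existing() {
  local vm_name="$1"
  govc find -type m -name "$vm_name" | while IFS= read -r path; do
    [ -n "$path" ] || continue
    # A template must be converted back to a VM before it can be destroyed;
    # for a plain VM this call fails harmlessly.
    govc vm.markasvm "$path" >/dev/null 2>&1 || true
    govc vm.destroy "$path"
  done
}
```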
- Breaking change
- Requires cluster rollout
- Config change only
- Documentation only
Author: Unknown
scripts/install-cloud-init.sh: Use `LVM_SYSTEM_DIR` to isolate the loop device's LVM metadata in a separate directory (`/tmp/lvm-loop-$$`); use a temporary VG name (`vg00_loop`) when the host has a VG of the same name, to avoid device-mapper conflicts in `/dev/mapper/`.
Device-mapper device names in `/dev/mapper/` are global at the kernel level, even with isolated LVM metadata via `LVM_SYSTEM_DIR`. If both the host and the loop device have a `vg00` with LVs named `root`, `var`, etc., device-mapper refuses to create duplicate devices ("Device or resource busy"). By using `vgimportclone -n vg00_loop` when a conflict exists, we give the loop device's VG a unique name for device-mapper while keeping metadata isolated. No rename is needed after deactivation, since the isolated metadata directory is simply deleted.
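The conflict handling above can be sketched roughly as follows (function and device names illustrative; this is not the script's literal code):

```shell
# Activate a loop device's VG with isolated metadata, renaming it only
# when its name collides with a host VG (device-mapper names are global).
activate_loop_vg() {
  local loop_dev="$1"
  export LVM_SYSTEM_DIR="/tmp/lvm-loop-$$"   # isolate LVM metadata
  mkdir -p "$LVM_SYSTEM_DIR"
  if vgs --noheadings -o vg_name 2>/dev/null | grep -qw vg00; then
    # Host also has vg00: give the loop copy a unique VG name so
    # device-mapper can create its LVs without "resource busy" errors.
    vgimportclone -n vg00_loop "$loop_dev"
  fi
  vgchange -ay
}
```

After deactivation, cleanup is just `rm -rf "$LVM_SYSTEM_DIR"`; no rename back is required.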
- Breaking change
- Requires cluster rollout
- Config change only
- Documentation only
Author: Erick Bourgeois
- src/crd.rs: BREAKING - Removed `version` field from `MachineSpec`
  - Kubernetes version is a cluster-level concern, not machine-level
  - Version is defined by bootstrap/infrastructure refs (KubeadmConfigTemplate, etc.)
  - Aligns with CAPI conventions where Machines inherit version from templates
- src/reconcilers/helpers.rs: Removed version from Machine creation logic
  - No longer passes version to the CAPI Machine resource
  - Version is determined by the bootstrap configuration
- examples/*.yaml: Removed version field from all examples
- Test files: Updated all MachineSpec initializations
The Kubernetes version is not a property of individual machines in CAPI. It's defined at:
- Cluster level (Cluster resource)
- ControlPlane level (KubeadmControlPlane)
- Bootstrap template level (KubeadmConfigTemplate)
Having version in MachineSpec created a conceptual mismatch with CAPI architecture and could lead to version conflicts. Machines should inherit version information from their bootstrap and infrastructure references.
- Breaking change: CRD schema updated (version field removed)
- Users should specify K8s version in their bootstrap templates, not in ScheduledMachine
- Aligns with standard CAPI patterns
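To illustrate the inheritance described above, a hedged sketch of where the version is declared instead of on `ScheduledMachine` (resource names hypothetical, manifest abridged):

```shell
# After this change the Kubernetes version lives at the CAPI control-plane
# or bootstrap-template level; Machines inherit it (names illustrative).
manifest=$(cat <<'EOF'
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: demo-control-plane
spec:
  replicas: 3
  version: v1.28.0   # Machines inherit this; ScheduledMachine no longer sets it
EOF
)
echo "$manifest"
```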
Author: Erick Bourgeois
- src/crd.rs: Added configurable Kubernetes version field to MachineSpec
  - Added `version` field with default "v1.28.0"
  - Allows users to specify K8s version per machine instead of hardcoding
- src/reconcilers/helpers.rs: Code quality and maintainability improvements
  - Removed `#[allow(dead_code)]` from `resolve_file_contents()` and `ResolvedFile` (now actively used)
  - Extracted hardcoded CAPI strings to global constants for consistency
  - Uses `CAPI_GROUP`, `CAPI_MACHINE_API_VERSION_FULL`, `CAPI_CLUSTER_NAME_LABEL`, etc.
- src/constants.rs: Added CAPI-specific constants
  - `CAPI_GROUP`: "cluster.x-k8s.io"
  - `CAPI_MACHINE_API_VERSION`: "v1beta1"
  - `CAPI_MACHINE_API_VERSION_FULL`: "cluster.x-k8s.io/v1beta1"
  - `CAPI_CLUSTER_NAME_LABEL`: "cluster.x-k8s.io/cluster-name"
  - `CAPI_RESOURCE_MACHINES`: "machines"
  - `API_VERSION_FULL`: "5spot.finos.org/v1alpha1"
- src/main.rs: Implemented actual Prometheus metrics
  - Replaced stub metrics endpoint with proper `prometheus::gather()` integration
  - Returns all registered Prometheus metrics in standard text format
  - Added error handling for metric encoding failures
- examples/*.yaml: Updated examples with version field
  - scheduledmachine-basic.yaml: version: v1.28.0
  - scheduledmachine-weekend.yaml: version: v1.29.0
- All test files: Updated MachineSpec initializations to include version field
Addresses critical TODOs from project guidelines:
- Global Constants: Eliminates magic strings/numbers per "no magic numbers" rule
- Configuration: Makes K8s version user-configurable instead of hardcoded
- Metrics: Provides proper observability via Prometheus
- Code Cleanliness: Removes dead code annotations from actively used functions
- Breaking change: CRD schema updated (version field added with default)
- Users can now specify Kubernetes version per machine
- Better code maintainability with single source of truth for strings
- Prometheus metrics now available at /metrics endpoint
Author: Erick Bourgeois
- src/reconcilers/helpers.rs: Implemented full CAPI Machine lifecycle integration
  - Added `validate_references()`: Validates bootstrap and infrastructure refs exist before Machine creation
  - Implemented `add_machine_to_cluster()`: Creates cluster.x-k8s.io/v1beta1 Machine resources with:
    - File content resolution from ConfigMaps/Secrets
    - Bash script generation for userData field
    - Owner references linking Machine to ScheduledMachine
    - Cluster labels for CAPI compliance
  - Implemented `remove_machine_from_cluster()`: Deletes CAPI Machine resources with 404 handling
  - Uses `kube::core::DynamicObject` with `kube::discovery::ApiResource` for dynamic CAPI resource access
- src/reconcilers/scheduled_machine.rs: Updated phase handlers to use CAPI functions
  - Modified `handle_pending_phase()`: Calls validate_references before add_machine_to_cluster
  - Modified `handle_shutting_down_phase()`: Calls remove_machine_from_cluster after grace period
  - Added error handling and status condition updates for CAPI operations
Completes the CAPI integration after schema changes in previous commit. The operator can now:
- Validate that bootstrap and infrastructure references exist before creating Machines
- Create real cluster.x-k8s.io/v1beta1 Machine resources in Kubernetes
- Provision files using userData bash scripts generated from ConfigMap/Secret content
- Delete Machines when scheduled window ends or resource is removed
- Maintain proper ownership and labeling for CAPI compliance
- No breaking changes to CRD schema
- Runtime behavior change: operator now creates/deletes actual CAPI Machine resources
- Requires CAPI (cluster-api) to be installed in the cluster
- Bootstrap and infrastructure providers must be available
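For orientation, the Machine resources the operator creates resemble the following (all names, kinds, and the UID are hypothetical; the operator builds this programmatically via `DynamicObject`):

```shell
# Approximate shape of a Machine created by add_machine_to_cluster()
# (names, provider kinds, and UID illustrative).
machine=$(cat <<'EOF'
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: demo-scheduled-machine
  labels:
    cluster.x-k8s.io/cluster-name: demo-cluster   # CAPI compliance label
  ownerReferences:
    - apiVersion: 5spot.finos.org/v1alpha1        # links back to ScheduledMachine
      kind: ScheduledMachine
      name: demo-schedule
      uid: 00000000-0000-0000-0000-000000000000
spec:
  clusterName: demo-cluster
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
      kind: KubeadmConfig
      name: demo-bootstrap
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereMachine
    name: demo-infra
EOF
)
echo "$machine"
```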
Author: Erick Bourgeois
- src/crd.rs: BREAKING - Replaced `cluster_deployment_ref` with `cluster_name` (String)
  - Removes vendor-specific `ClusterDeploymentRef` type (Mirantis/k0smotron/k0rdent specific)
  - Makes CRD agnostic to CAPI cluster management approach
  - `cluster_name` is now required by bootstrap and infrastructure refs
- src/constants.rs: Removed `KIND_CLUSTER_DEPLOYMENT` constant (no longer needed)
- examples/*.yaml: Updated all examples to use `clusterName` instead of `clusterDeploymentRef`
- docs/reference/api.md: Updated documentation to reflect the new field
The ClusterDeploymentRef was specific to Mirantis/k0smotron/k0rdent and not a standard CAPI concept.
Using a simple clusterName string makes the CRD vendor-agnostic and aligns with standard CAPI practices
where the cluster name is used by bootstrap and infrastructure providers.
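A short sketch of what the new field looks like in a manifest (resource names illustrative):

```shell
# ScheduledMachine spec after this change: a plain cluster name string
# replaces the old vendor-specific clusterDeploymentRef object.
new_spec=$(cat <<'EOF'
apiVersion: 5spot.finos.org/v1alpha1
kind: ScheduledMachine
metadata:
  name: demo-schedule
spec:
  clusterName: demo-cluster   # replaces the old clusterDeploymentRef object
EOF
)
echo "$new_spec"
```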
- Breaking change - Requires updating all existing ScheduledMachine manifests
- Config change only
Author: Erick Bourgeois
- src/crd.rs: BREAKING - Converted from k0smotron to CAPI (Cluster API) based architecture
  - Changed API group from `5spot.eribourg.dev` to `5spot.finos.org`
  - Added `bootstrap_ref` and `infrastructure_ref` for CAPI Machine creation
  - Made `files` field non-optional in `MachineSpec`
  - Renamed `FileContentFrom` types to `ContentSource` and `KeySelector` for consistency
  - Added `ObjectReference` type for generic Kubernetes object references
  - Changed status structure to use string phases instead of enum
  - Added `next_activation` and `next_cleanup` fields to status
  - Changed `machine_ref` to use `ObjectReference` instead of custom `MachineRef` type
- src/constants.rs: Updated all constants for CAPI architecture
  - Changed API group constants to `5spot.finos.org`
  - Updated finalizer name to use new API group
  - Added CAPI Machine phase constants (Pending, Active, ShuttingDown, Inactive, Disabled, Terminated, Error)
  - Added CAPI API version constants (`cluster.x-k8s.io/v1beta1`)
  - Added new condition types: `ReferencesValid`
  - Added new condition reasons: `ReferencesInvalid`, `FileResolutionFailed`, `ScheduleDisabled`
  - Renamed `K0SMOTRON_*` constants to `CAPI_*`
  - Added ConfigMap and Secret kind constants
- src/reconcilers/helpers.rs: Added file content resolution functionality
  - New `resolve_file_contents()` function to fetch content from ConfigMap/Secret references
  - New `ResolvedFile` struct to represent files with resolved content
  - Validates file paths (must be absolute) and permissions format (4-digit octal)
  - Handles base64 decoding for Secret data
- src/reconcilers/scheduled_machine.rs: Updated error types for CAPI
  - Renamed `K0smotronError` to `CapiError`
  - Added `FileResolutionError` for file content resolution failures
  - Added `ReferenceValidationError` for bootstrap/infrastructure ref validation
Complete architectural shift from k0smotron-specific machine management to standard Cluster API (CAPI) machine scheduling. This enables:
- Standard CAPI Machine lifecycle management
- Integration with any CAPI infrastructure provider
- File provisioning via ConfigMap/Secret content resolution
- ClusterDeployment modification for machine references
- Priority-based machine scheduling
- Time-based scheduling with graceful shutdown
- Breaking change - Incompatible with existing CRDs and resources
- Requires cluster rollout - New CRD must be deployed
- Config change only - No backward compatibility
- Documentation only
WARNING: This is a breaking change that requires full migration:
- Backup all existing ScheduledMachine resources
- Delete old CRD: `kubectl delete crd scheduledmachines.5spot.eribourg.dev`
- Deploy new CRD: `kubectl apply -f deploy/crds/scheduledmachine.yaml`
- Update all ScheduledMachine manifests to new schema:
  - Change `apiVersion` from `5spot.eribourg.dev/v1alpha1` to `5spot.finos.org/v1alpha1`
  - Add `bootstrapRef` and `infrastructureRef` fields
  - Update `files` structure (now required, not optional)
  - Ensure all file paths are absolute and start with `/`
- Recreate all resources with new manifests
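The migration steps above could be scripted roughly as follows (an untested sketch, destructive by design; verify against your environment and back up before running):

```shell
# Rough migration sketch for the steps above. Deleting the CRD removes
# every ScheduledMachine with it, so the backup comes first.
migrate_scheduledmachine_crd() {
  # 1. Backup all existing ScheduledMachine resources
  kubectl get scheduledmachines.5spot.eribourg.dev -A -o yaml \
    > scheduledmachines-backup.yaml || return 1
  # 2. Delete the old CRD (cascades to all ScheduledMachine resources)
  kubectl delete crd scheduledmachines.5spot.eribourg.dev
  # 3. Deploy the new CRD
  kubectl apply -f deploy/crds/scheduledmachine.yaml
  # 4. Manifests must then be updated by hand (apiVersion, bootstrapRef,
  #    infrastructureRef, files) before re-applying them.
}
```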
✅ COMPILATION COMPLETE - All code compiles and tests pass. CAPI integration pending:
- ✅ CRD schema updated and compiles
- ✅ Constants updated for CAPI
- ✅ File content resolution implemented (ConfigMap/Secret)
- ✅ Error types updated
- ✅ Reconciler rewrite complete (phase-based state machine)
- ✅ Main.rs updated and compiles
- ✅ All Rust code compiles without warnings (`cargo clippy` passes with strict flags)
- ✅ All unit tests pass (38 tests)
- ✅ Tests updated with new schema (bootstrap_ref, infrastructure_ref)
- ✅ CRD YAML regenerated: `deploy/crds/scheduledmachine.yaml`
- ✅ API documentation regenerated: `docs/reference/api.md`
- ✅ Examples updated with new schema and validated
- ⏳ Reference validation logic (PLACEHOLDER - needs CAPI implementation)
- ⏳ CAPI Machine creation logic (PLACEHOLDER - needs CAPI API calls)
- ⏳ ClusterDeployment modification logic (PLACEHOLDER - needs implementation)
- ⏳ Cleanup and shutdown logic (PLACEHOLDER - needs CAPI deletion)
Placeholder functions are marked with `TODO:` and `#[allow(dead_code)]` comments. Machine creation and deletion will not function until CAPI API calls are implemented.
- All clippy warnings fixed (doc comments, must_use attributes, wildcard imports, casting)
- Early return/guard clause pattern applied throughout
- Magic numbers eliminated (all numeric literals defined as constants)
- Unit tests updated and passing for all modules
- Documentation updated with proper formatting
- Implement reference validation (bootstrapRef, infrastructureRef, clusterDeploymentRef exist in cluster)
- Implement CAPI Machine creation with resolved file contents in userData
- Implement ClusterDeployment patching to add machine references
- Implement cleanup logic with graceful shutdown and CAPI Machine deletion
- Add timeout and retry logic for Kubernetes API calls
- Create migration guide documentation
- Update quickstart guide with CAPI-specific instructions
- Update examples in `/examples/` directory
- Regenerate API documentation: `cargo run --bin crddoc > docs/reference/api.md`
- Run full test suite: `cargo test`
- Validate all changes with `cargo fmt`, `cargo clippy`, and `cargo audit`
Author: Erick Bourgeois
.github/copilot-instructions.md: Added comprehensive "Early Return / Guard Clause Pattern" section to Rust Style Guidelines
To establish and document the preferred coding style for handling control flow in the 5spot codebase. The early return pattern reduces nesting, improves readability, and makes code easier to test and maintain.
- Breaking change
- Requires cluster rollout
- Config change only
- Documentation only
The new section includes:
- Key principles of early return/guard clause pattern
- Benefits (reduced nesting, clearer code flow, easier testing)
- Comprehensive Rust code examples (good vs. bad)
- Guidance on when to use and when NOT to use early returns
- Integration with Rust's Result and ? operator patterns