[Draft] CSPL-3354: Add Lifecycle Hooks and Configurable Termination Grace Period to Splunk Operator #1424
Overview
This Pull Request enhances the Splunk Operator by integrating Lifecycle Hooks and allowing customers to configure the Termination Grace Period via the Custom Resource (Common Spec). These changes aim to ensure graceful shutdowns of Splunk pods, thereby maintaining data integrity and improving the reliability of Splunk deployments on Kubernetes.
Problem Statement
Customers running Splunk on Kubernetes have reported issues related to abrupt pod terminations, especially during node recycling or maintenance operations. Without proper shutdown procedures, Splunk instances may not decommission gracefully, leading to potential data loss and increased operational churn. Additionally, the lack of configurable grace periods limits customers' ability to tailor shutdown behaviors to their specific environments and requirements.
Proposed Solution
Integrate Lifecycle Hooks:
preStop Hook: Executes the splunk offline and splunk stop commands before the pod is terminated. This ensures that Splunk instances decommission gracefully, preventing data corruption and loss.
Configurable Termination Grace Period:
Extends the Common Spec of the Splunk Operator's Custom Resource to allow customers to specify terminationGracePeriodSeconds.
Changes Made
Custom Resource Definition:
Adds terminationGracePeriodSeconds under the commonSpec section to allow customization.
StatefulSet Template Update:
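A minimal sketch of what the updated StatefulSet pod template might contain, to make the two changes concrete. This is not the operator's actual rendered output: the container name, the `/opt/splunk/bin/splunk` path, the shell wrapper, and the value `300` are all assumptions for illustration. Only the `lifecycle.preStop` hook, the two Splunk commands, and `terminationGracePeriodSeconds` come from this description.

```yaml
# Illustrative fragment only; names, paths, and values are assumptions.
apiVersion: apps/v1
kind: StatefulSet
spec:
  template:
    spec:
      # Value taken from the Custom Resource; 300 is an example, not a default.
      terminationGracePeriodSeconds: 300
      containers:
        - name: splunk            # assumed container name
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  # Assumed binary path; runs the two commands named in this PR.
                  - /opt/splunk/bin/splunk offline && /opt/splunk/bin/splunk stop
```

In Kubernetes, the grace period countdown starts when termination begins and includes the time spent in the preStop hook, so the configured value needs to cover the full duration of both commands.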
Adds a lifecycle section with the preStop hook.
Applies the terminationGracePeriodSeconds value from the Common Spec.
Benefits
Related Issues
Testing Performed
Unit Tests:
Verified that terminationGracePeriodSeconds from the Custom Resource is correctly applied to the StatefulSet.
Verified that the preStop lifecycle hook executes the appropriate Splunk commands.
Integration Tests:
Confirmed that the splunk offline and splunk stop commands were executed before termination.
Tested different terminationGracePeriodSeconds values to ensure flexibility and correctness.
Manual Testing:
Documentation Updates
Operator README:
Documents the terminationGracePeriodSeconds field in the Custom Resource.
Configuration Guides:
Guidance on setting terminationGracePeriodSeconds based on different deployment scenarios.
How to Test
Update Custom Resource:
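For this step, a Custom Resource carrying the new field might look roughly like the sketch below. The `apiVersion`, the `Standalone` kind, the resource name, and the value `300` are assumptions chosen for illustration, as is the placement of the field directly under `spec` alongside the other common spec fields; check the CRD shipped with this change for the exact location.

```yaml
# Hypothetical Custom Resource; adjust kind, name, and value to your deployment.
apiVersion: enterprise.splunk.com/v4
kind: Standalone
metadata:
  name: example                        # hypothetical name
spec:
  # New field from this PR; assumed to sit with the other common spec fields.
  terminationGracePeriodSeconds: 300   # illustrative value, in seconds
```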
Set terminationGracePeriodSeconds in your Splunk Operator Custom Resource.
Deploy or Update Splunk Cluster:
Verify StatefulSet Configuration:
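Confirm that the StatefulSet includes the preStop lifecycle hook and the correct terminationGracePeriodSeconds.
Simulate Pod Termination:
Delete a pod and confirm that it shuts down through the preStop hook.
Future Considerations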
Evaluate splunk decommission if it provides more comprehensive shutdown procedures compared to splunk offline and splunk stop.
Allow changes to terminationGracePeriodSeconds without requiring full cluster redeployments.
Reviewer Notes
Deployments that do not set the terminationGracePeriodSeconds field continue to operate with the default grace period.
Pull Request Checklist: