-
Notifications
You must be signed in to change notification settings - Fork 173
fix(arbitrator): resolve startup failure with missing config backup directory (#1309) #1313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…irectory (#1309) The arbitrator service was failing to start with error "Monitoring config backup directory not defined" due to multiple issues: 1. Duplicate flag parsing: Go's flag package conflicted with Cobra, causing --user flag value to be lost when using --user=value syntax 2. Empty ConfDirBackup: User lookup failed with empty username, resulting in empty backup directory path 3. Inappropriate fatal error: Arbitrator never saves config, doesn't need backup directory Changes: - Remove duplicate flag.Parse() that conflicted with Cobra flag handling - Add WithArbitration compile-time flag to identify arbitrator binary - Skip ConfDirBackup requirement when WithArbitration="ON" - Fix systemd service files to use correct Cobra flag syntax (--user repman) The fix maintains all safety checks for server deployments while allowing the arbitrator to start without requiring a config backup directory.
Fixes identified during code review: CRITICAL: - Remove unused "flag" import that broke build HIGH: - Guard os.MkdirAll(ConfDirBackup) to skip when arbitrator mode - Add WithArbitration check to PushConfigToBackupDir() early return MEDIUM: - Add Restart=on-failure to arbitrator systemd service for consistency - Also removed trailing space from Description line These changes prevent the arbitrator from attempting filesystem operations on empty ConfDirBackup paths while maintaining full functionality for server builds.
…setting The --user flag was not registered in arbitrator's command flags after removing duplicate flag.Parse(). This caused "unknown flag: --user" error. Changes: - Add --user flag as PersistentFlag on rootCmd - Add server package import to access WithArbitration variable - Use local withArbitration constant and assign to server.WithArbitration This restores the --user flag functionality for the arbitrator command while maintaining the cleaner flag handling approach.
Code ReviewI've reviewed this PR and overall it's a solid fix for issue #1309. The changes correctly address the arbitrator startup failure. Here's my detailed feedback: ✅ Strengths
🔍 Code Quality Observationsserver/server.go (Lines 277-281)Potential Issue: The removal of the // BEFORE (removed code):
var usr string
flag.StringVar(&usr, "user", "", "help message")
flag.Parse()
if usr == "" && repman != nil {
usr = repman.OsUser.Username
}
// AFTER:
var usr string
if repman != nil && repman.OsUser != nil {
usr = repman.OsUser.Username
}Concern: The flags.StringVar(&conf.MonitoringSystemUser, "user", "", "OS User for running repman")Recommendation: Consider removing the unused func (repman *ReplicationManager) AddFlags(flags *pflag.FlagSet, conf *config.Config, isClient bool) {
flags.IntVar(&conf.TokenTimeout, "api-token-timeout", 48, "Timespan of API Token before expired in hour")
if WithDeprecate == "ON" {
// initDeprecated() // not needed used alias in main
}
flags.StringVar(&conf.MonitoringSystemUser, "user", "", "OS User for running repman")
// ... rest of function
}arbitrator/arbitrator_cmd.go (Lines 23, 58)Minor Observation: The local // Current:
var withArbitration = "ON"
// ...
server.WithArbitration = withArbitration
// Alternative (more direct):
server.WithArbitration = "ON"However, keeping the local variable maintains consistency with how other packages handle feature flags, so this is acceptable either way. 🛡️ Security ConsiderationsNo security concerns identified. The changes:
🐛 Potential Bugsserver/server.go (Line 1185-1187)Defensive Programming Issue: The guard condition checks both if WithArbitration != "ON" && conf.ConfDirBackup != "" {
os.MkdirAll(conf.ConfDirBackup, os.ModePerm)
}Scenario: If Recommendation: This is actually fine as-is for defense in depth. The fatal check above ensures ⚡ Performance ConsiderationsNo performance impact. The changes add simple string comparisons that occur only during initialization, not in hot paths. 🧪 Test CoverageObservation: The PR description mentions manual testing:
Recommendation: Consider adding a regression test in the // regtest/test_arbitrator_init.go
func testArbitratorNoBackupDir(cluster *cluster.Cluster, conf string, test *cluster.Test) bool {
// Verify arbitrator can start without ConfDirBackup set
// ...
}However, this may be out of scope for this bug fix. 📝 DocumentationMinor Suggestion: The PR body is excellent and thorough. Consider documenting the 🎯 Final VerdictApprove with minor suggestions. The fix is:
Minor Improvements (non-blocking):
Great work fixing a critical blocker for arbitrator deployments! 🎉 |
Summary
Fixes #1309 - Arbitrator service failing to start with error "Monitoring config backup directory not defined"
Problem Analysis
The arbitrator was failing during initialization due to multiple interconnected issues:
flagpackage was conflicting with Cobra, causing--userflag value to be lost when using--user=valuesyntaxChanges Made
Core Fixes
server/server.go:
flag.Parse()that conflicted with Cobra flag handlingflagimportWithArbitration != "ON"condition to skip ConfDirBackup requirement for arbitratoros.MkdirAll(conf.ConfDirBackup)to prevent operations on empty pathsPushConfigToBackupDir()when in arbitrator modeserver/server_cmd.go:
WithArbitrationvariable already existed (defaults to "OFF")arbitrator/arbitrator_cmd.go:
serverpackage importserver.WithArbitration = "ON"in init() to signal arbitrator mode--userflag as PersistentFlag on rootCmdService Files
service/replication-manager-arb.service:
--user=repman→--user repmanRestart=on-failureandRestartSec=5for consistencyservice/replication-manager-osc.service:
--user=repman→--user repmanservice/replication-manager-pro.service:
--user=repman→--user repmanTest Results
All tests pass ✅
--userflag recognized and functionalVerification
Both scenarios now work correctly. The arbitrator proceeds to cluster discovery and database connection without requiring
ConfDirBackupto be set.Backward Compatibility
No breaking changes:
WithArbitrationflag follows existing pattern used byWithTarball,WithProvisioning, etc.WithArbitration="OFF"Migration Notes
Users with custom systemd service files should update to space-separated flag syntax:
--user=repman--user repmanNo other migration needed.