-
Notifications
You must be signed in to change notification settings - Fork 744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GOBBLIN-2062] Ensure the Orchestrator
always has DagManager
remove FlowSpec
- even when orchestration has failed
#3944
[GOBBLIN-2062] Ensure the Orchestrator
always has DagManager
remove FlowSpec
- even when orchestration has failed
#3944
Conversation
@@ -45,6 +45,7 @@ dependencies { | |||
} | |||
|
|||
compile externalDependency.gson | |||
compile externalDependency.lombok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unclear how this was missed... but not being there resulted in MANY compilation warnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious, how did you find that this was missing
@@ -57,6 +58,8 @@ public static class NoLongerLeasingStatus extends LeaseAttemptStatus {} | |||
current LeaseObtainedStatus via the completeLease method from a caller without access to the {@link MultiActiveLeaseArbiter}. | |||
*/ | |||
@Data | |||
// avoid - warning: Generating equals/hashCode implementation but without a call to superclass, even though this class does not extend java.lang.Object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another preventable source of compilation warnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it is to prevent using any future fields added to LeaseAttemptStatus
, then why not add callSuper=false
to NoLongerLeasingStatus?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warnings at least don't indicate that it's necessary over there and I believe that's because warning arose since LeaseObtainedStatus
and LeasedToAnotherStatus
are @Data
, whereas NoLongerLeasingStatus
is not.
since not @Data
, lombok is not synthesizing .equals
for NoLongerLeasingStatus
} finally { | ||
// remove from the flow catalog, regardless of whether the flow was successfully validated and permitted to exec (concurrently) | ||
this.dagManager.removeFlowSpecIfAdhoc(flowSpec); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too bad the diff above doesn't clearly indicate it was solely an indentation change to add the try ... finally
here. the purpose of which is to ensure FlowCatalog cleanup, come whatever may
// WARNING: this `spiedDagManager` WILL NOT BE the one used by the `Orchestrator`: its DM has apparently already been | ||
// provided to the `Orchestrator` ctor, prior to this replacement here of `GobblinServiceManager.dagManager` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documenting for posterity, since unfortunately, this was quite troublesome to debug
private static final String TEST_USER = "testUser"; | ||
private static final String TEST_PASSWORD = "testPassword"; | ||
private static final String TEST_TABLE = "quotas"; | ||
|
||
@BeforeClass | ||
@BeforeMethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately lots of "testing smells" here.
first off, class-level test init doesn't play well w/ verifying mock interactions, since we want a fresh count for the exec of each test method. relatedly it also complicates - if not outright foreclosing on - parallel test method execution.
secondly, the @Test(dependsOnMethod = ...)
structuring is a testing anti-pattern that here obscured that setup
-style init had been formulated instead as test methods; see - createTopologySpec()
. a major clue of something wrong might have been that that particular "@test" method does not even exercise Orchestrator
, our class-under-test!
for now, I've fixed this suite to use per-method setup/teardown best practices, but left it as a TODO to figure out where createTopologySpec
actually belongs. createFlowSpec
and deleteFlowSpec
also deserve attention in this same regard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's anything wrong with @BeforeClass
explicitly, it has its uses when doing expensive setup e.g. DB or Kafka setup and this is done in some other classes. I would argue that you can have both BeforeClass
and BeforeMethod
depending on your needs, though for this class specifically BeforeMethod
makes more sense for mock setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree and now realize, after you pointed it out, that I didn't contextualize overall test init best practice, so much as denounce what's broken about using solely @BeforeClass
init.
to encapsulate a fuller picture now, I'd advise
Prefer
@{Before,After}Method
test init to@{Before,After}Class
. when mocks are in use, the latter w/o the former MOST LIKELY indicates the wrong choice, whereas having only method, but not class-level init is quite fine. In fact the most common acceptable rationale for class-level init is to amortize payment of initialization tax just once per class, rather than incurring once per test case.
truly, at this moment, I can't think of any non-performance-motivated argument for using @BeforeClass
over just adding everything into @BeforeMethod
. can you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah its better to avoid state between tests when possible so beforeMethod works much better here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting pt. of comparison: after the PR check test failures gave:
java.io.IOException: Failure creation table TestDagStateStore
due to:
java.sql.SQLNonTransientConnectionException: Too many connections
I realized the ITestMetastoreDatabase
needed to be closed.
When I first put .close()
within @AfterMethod
, the 7 OrchestratorTest
cases took 1m49s.
I then defined an @AfterClass
for .close()
and those same 7 tests (all passing with either definition) took only 33s. so definitely sticking w/ @BeforeClass
+ @AfterClass
for that one!
this.topologyCatalog.addListener(orchestrator); | ||
this.flowCatalog.addListener(orchestrator); | ||
FlowCompilationValidationHelper flowCompilationValidationHelper = new FlowCompilationValidationHelper(config, sharedFlowMetricsSingleton, mock(UserQuotaManager.class), mockFlowStatusGenerator); | ||
this.dagMgrNotFlowLaunchHandlerBasedOrchestrator = new Orchestrator(ConfigUtils.propertiesToConfig(orchestratorProperties), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to avoid confusion, as I now pass Optional.absent()
for the FlowLaunchHandler
. nothing about the previously-existing tests below strictly validated that code path, so I made this entire OrchestratorTest
class specific to the legacy DagManager
version of Orchestrator
.
when I say:
nothing about the previously-existing tests below...
I found only one test, doNotRegisterMetricsAdhocFlows()
, that actually belongs in this class. while it certainly is helpful, it's also quite tangential to the core operation of Orchestrator::orchestrate
.
the three new tests I added mostly cover the DagManager
-based orchestrate
. as for the lack of validation for the other code path, that suggests a missing test: e.g. to verify that orchestrate
invokes FlowLaunchHandler::handleFlowLaunchTriggerEvent
. at that time we should STILL ALSO verify the equivalent of doNotRegisterMetricsAdhocFlows()
cc: @umustafi and @arjun4084346
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make a note to add the other code path for FlowLaunchHandler baed orchestrate
. If those reside in this test class, then following the methodology discussed about we should have a separate setupClass
method used for that test.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #3944 +/- ##
=============================================
- Coverage 50.18% 38.72% -11.47%
+ Complexity 5908 1595 -4313
=============================================
Files 1075 388 -687
Lines 41226 15952 -25274
Branches 4625 1577 -3048
=============================================
- Hits 20691 6177 -14514
+ Misses 18732 9282 -9450
+ Partials 1803 493 -1310 ☔ View full report in Codecov by Sentry. |
b28350f
to
2c5cbb9
Compare
decf423
to
3b38d07
Compare
public synchronized void addDagAndRemoveAdhocFlowSpec(FlowSpec flowSpec, Dag<JobExecutionPlan> dag, boolean persist, boolean setStatus) | ||
throws IOException { | ||
addDag(dag, persist, setStatus); | ||
public synchronized void removeFlowSpecIfAdhoc(FlowSpec flowSpec) throws IOException { | ||
// Only the active dagManager should delete the flowSpec | ||
if (isActive) { | ||
deleteSpecFromCatalogIfAdhoc(flowSpec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the delete looks good here.
but should it be at https://github.com/apache/gobblin/blob/master/gobblin-service/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java#L829 also @phet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be in GSJS because all hosts in multi-active scheduler state needed access to the flowSpec in catalog see this PR desc: #3846. We were running into no spec found
in the DagActionStoreChangeMonitor
error if an inactive host deleted the flowSpec
before active host processed CDC stream message. For MA execution we want the spec in catalog until all hosts have received the CDC stream message and retrieved spec -> need to think about how to achieve this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, we are not removing it from flow catalog there, just removing it from scheduledFlowSpecs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch of this bug and thanks for fixing these tests! Looks like a pain. These changes work for the single active DM for now i think we should merge it in and then discuss if it makes sense for multi-active dagProcEngine
3b38d07
to
5593a86
Compare
…ve `FlowSpec` - even when orchestration has failed (apache#3944) * Ensure the `Orchestrator` always has `DagManager` remove `FlowSpec` - even when orchestration has failed * Address several compilation warnings related to lombok * Ensure `OrchestratorTest` cleans up its `ITestMetastoreDatabase` * Avoid NPE in `GobblinServiceManagerTest::cleanUp` * Remove out-dated and confusing comment in `GobblinServiceJobScheduler`
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
the Orchestrator + DagManager MUST remove adhoc flows from the FlowCatalog that fail compilation or violate concurrent execs. otherwise gaas will continue to return '409 Conflict' to each subsequent attempt to subsequently create an adhoc flow with the same flowGroup+flowName. this is despite the fact that the flow (still remaining in the FlowCatalog, when it shouldn't be) already has the status of FAILED, which is a "final status".
Tests
added test cases (herein)
Commits