-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17269: Do not publish synthetic solr core (of Coordinator node) to ZK #2438
SOLR-17269: Do not publish synthetic solr core (of Coordinator node) to ZK #2438
Conversation
* Experimental QA core as proxy in memory * Experimental QA core as proxy in memory * Experimental QA core as proxy in memory * ./gradlew tidy * Comment out portion that checks for synthetic collection/core existence * Fixes watches and test cases * Refactoring to avoid modifying existing method modifiers * Cleanup * Some refactoring to minimize changes to SolrCore * javadoc * ./gradlew tidy * code cleanup * code cleanup * Fix resource loader issue if configSetName is null * ./gradlew tidy * Fixed issue with incorrect collection name in context Ensure only one synthetic core created per config set * ./gradlew tidy * Correctly use <config set>_core as synthetic core name for clarity Should open the registered core on first synthetic core creation since it is a getCore op * ./gradlew tidy * Fixed issue with incorrect collection name in context
./gradlew tidy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to see a reduction in code in some places
* @param configSetName an optional config set name | ||
* @return a SolrResourceLoader | ||
*/ | ||
protected SolrResourceLoader createCoreResourceLoader(CoreDescriptor cd, String configSetName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the configSetName is also on the CoreDescriptor, isn't this API needless? Same elsewhere (loadConfigSet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the choice I took was problematic! 😅
The ultimate goal is to minimize code change and yet avoid calling this part of the code for the synthetic core, so we wouldn't be registering against ZK.
Since I saw the quoted code above "reads the config name from cluster state and set it against the CoreDescriptor", so I thought maybe a new "explicit" configset can indicate that we can "skip" those logic for ZkConfigSetService
.
But it didn't turn out well if I confused other devs 😞 ...
Adding any flags/override method in ConfigSetService
is likely not great since other non ZK implementation shouldn't really care. So im proposing to add a new class SythethticCoreDescriptor
which extends CoreDescriptor
, and that ZkConfigSetService#createCoreResourceLoader
could do a special case (instanceof SythethticCoreDescriptor
, not great...but at least it's all contained in Zk related class) to bypass the zk logic.
Going to commit and push the change, please do lemme know if there are better alternatives!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ultimate goal is to minimize code change and yet avoid calling this part of the code for the synthetic core, so we wouldn't be registering against ZK.
Why? It appears it'd have no ultimate effect -- it ought to resolve the same config for the same collection. Oh, is it the same collection for this feature?
I don't love subclassing CoreDescriptor to be honest; you could just add a generic property in the core descriptor if you needed to flag one like "synthetic". Nonetheless overall I'm unsure why loading the ConfigSet needs to be different here.
Also if for some reason we felt it useful, we could change ZkConfigSetServer.loadConfigSet such that an existing configSet name in the CD will be used, and thus avoid resolving that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsmiley thank you for the review again!
Why? It appears it'd have no ultimate effect -- it ought to resolve the same config for the same collection. Oh, is it the same collection for this feature?
We want to avoid registering the synthetic collection with ZK, since it creates various issues for our various tooling which watches the ZK for collection changes. We have to do various checks (and it's very easy to forget for new code) to skip the synthetic collection, which is not ideal.
I don't love subclassing CoreDescriptor to be honest; you could just add a generic property in the core descriptor if you needed to flag one like "synthetic". Nonetheless overall I'm unsure why loading the ConfigSet needs to be different here.
Also if for some reason we felt it useful, we could change ZkConfigSetServer.loadConfigSet such that an existing configSet name in the CD will be used, and thus avoid resolving that.
Got it! If it's more preferable to add property to CoreDescriptor
instead of subclassing, we could totally do that! I thought we want to minimize changes to existing class hence didn't touch CoreDescriptor
😓
Currently ZkConfigSetService#createCoreResourceLoader
does 3 things as far as I can tell:
- Auto create collection in ZK if it's not already here
- Set the configset read from ZK (off the collection from cluster state) to CD. Ignore the pre-existing on in CD
- Create a
ZkSolrResourceLoader
with access to the resources under the configset in ZK
Honestly, I couldn't quite tell the purposes of 1 and 2 (probably not the use cases I'm aware of, I am sure you are more knowledgeable!) . For synthetic core, we want to avoid both since the for 1., we do not want to register the synthetic collection to ZK and for 2., since the collection is not in ZK it would fail loading such collection from cluster state.
We do want 3 tho as the synthetic core is there to serve resources from the ZK configset.
Please let me know if it all makes sense? If so, I will proceed to add property to CoreDescriptor
😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove "1. Auto create collection in ZK if it's not already here"; this refers to legacy behavior. Or gate this with "bootstrap_conf" system property being specified or something like that. Heck, I wonder if this chunk of code actually works.... like if you did a little experiment locally, could you get the collection to appear merely by having a core descriptor for a non-existent collection? I wonder if it's even tested (easy to check by trial and error).
Hopefully there will be no need to designate this CoreDescriptor as special.
There's legacy stuff you are seeing in Solr. We don't need to support it forever and ever! Certainly not if it's getting in our way and causing us to contemplate adding otherwise needless complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove "1. Auto create collection in ZK if it's not already here"; this refers to legacy behavior. Or gate this with "bootstrap_conf" system property being specified or something like that. Heck, I wonder if this chunk of code actually works.... like if you did a little experiment locally, could you get the collection to appear merely by having a core descriptor for a non-existent collection? I wonder if it's even tested (easy to check by trial and error).
That's a good point! I tried locally with this:
- Copy an existing single shard/replica directory from data folder and rename it with a different collection name, patched the
core.properties
as well - Start and debug the code flow
- The
CoreDescriptor
did get loaded properly after scanning the disk - But on
createFromDescriptor
->preRegister
->checkStateInZk
->zkStateReader.waitForState
, it failed cause such collection is not in ZK - It does not really reach the
coreConfigService.loadConfigSet
and it's afterpreRegister
So i'm pretty sure we can remove that auto creation collection 😊
As for the 2. Set the configset read from ZK (off the collection from cluster state) to CD. Ignore the pre-existing on in CD, Based on the commit history, it is part of the effort to "restore legacy Collection auto-creation", which can be removed altogether since it does not work anymore anyway? 😅
Thank you for all the pointers again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So i tried removing
String configSetName =
zkController.getClusterState().getCollection(cd.getCollectionName()).getConfigName();
cd.setConfigSet(configSetName);
Then the core loading starts failing. With some investigation:
CoreDescriptor
instantiation does not set theconfigSet
in thecoreProperties
via both admin create path and startup core discovery path. Those paths would setcollection.configName
instead.- Therefore without setting it as a part of
ZkConfigSetService#createCoreResourceLoader
,cd.getConfigSet
would give null, and could trigger error when callingnew ZkSolrResourceLoader( cd.getInstanceDir(), cd.getConfigSet(), parentLoader.getClassLoader(), zkController)
(or anywhere else relies onCoreDescriptor#getgetConfigSet
I'm not 100% sure if calling cd.setConfigSet(configSetName)
is a good thing to do within createCoreResourceLoader
since it appears to be an unexpected side effect of mutating the input CoreDescriptor
. However, to focus on synthetic core alone in this PR. I pushed a commit to do a null check with a side note
// Currently, cd.getConfigSet() is always null. Except that it's explicitly set by
// {@link org.apache.solr.core.SyntheticSolrCore}.
// Should we consider setting it for all cores as a part of CoreDescriptor creation/loading
// process?
if (cd.getConfigSet() == null) {
String configSetName =
zkController.getClusterState().getCollection(cd.getCollectionName()).getConfigName();
cd.setConfigSet(configSetName);
}
The SyntheticCoreDescriptor
class is removed!
Any thoughts? @dsmiley 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could understand that it's questionable for the ConfigSetService to be responsible for resolving what the proper named config even ought to be. But it works and there's not an obvious other location, and it empowers this ConfigSetService abstraction to do interesting things if needed, so I suggest let it be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for doing that manual check and being able to remove old legacy code :-)
solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java
Outdated
Show resolved
Hide resolved
I noticed you tagged nobody for review; it's good to think of one two pertinent people. |
collection.forEachReplica((s, replica) -> expectedNodes.remove(replica.getNodeName())); | ||
assertTrue(expectedNodes.isEmpty()); | ||
// this should be empty as synthetic collection does not register with ZK | ||
assertNull(collection); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should invoke the core API command to confirm a synthetic core is created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
Do not modify ConfigSetService, instead introduce SyntheticCoreDescriptor which ZkConfigSetService provides special handling Improved test case in TestCoordinatorRole#testSimple
@dsmiley @noblepaul Many thanks for the helpful code review! I have pushed some changes based on the suggestions, would you mind to review again? 🙏🏼 |
# Conflicts: # solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java
…eate collectiom state part, which no longer works anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good now; I like your change.
Can a SyntheticSolrCore be prevented from registering itself with CoreContainer? I would imagine so since this functionality is inserted at the HttpSolrCall level thus the SolrCore could come from wherever. It might be less hacky to prevent weird/unusual SolrCores from presenting themselves in the CoreContainer to other functionality (who knows) that is expecting normal cores. I don't have a strong opinion on this; just sharing.
throw new ZooKeeperException( | ||
SolrException.ErrorCode.SERVER_ERROR, "Failure auto-creating collection", e); | ||
} | ||
|
||
// The configSet is read from ZK and populated. Ignore CD's pre-existing configSet; only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore CD's pre-existing configSet
Wrong, now.
solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java
Outdated
Show resolved
Hide resolved
I think that's a very good question! AFAIK, we probably really only need I will investigate a bit and see if there are any dependencies that prevent us from doing that 😊 Update: Did a bit of refactoring, and it's not as clean as I hope it will be - since we still have to rely on |
@dsmiley I have committed changes based on your feedback last week! And have replied to several of your comments. Would you mind to go over them again and let me know your thoughts? Many many thanks! 🙏🏼 |
Looks great. Might you suggest a succinct CHANGES.txt entry for 9.7? CHANGES.txt is written for users to read/understand. Also can you propose a commit message here, which often adds detail. Like removal of that legacy logic. |
Many thanks for all the reviews! I will add an entry to CHANGES.txt under 9.7!
Do you mean the commit message for the CHANGE.txt change? Or is it something else? :) |
Yes to both. It's helpful if you provide this info as it affords peer review and it makes it easier for me :-) |
…ator node to zookeeper, this simplifies the code flow and avoid confusion to external tools that mistakenly recognize the synthetic collection as an actual collection. This is achieved by the introduction of SyntheticSolrCore which is created by CoordinatorHttpSolrCall, such SyntheticSolrCore provides a shortcut creation route that bypasses ZK registration.
…c-solr-core' into patsonluk/SOLR-17269/synthetic-solr-core # Conflicts: # solr/CHANGES.txt
CHANGES.txt updated Commit message:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to merge this soon... perhaps this weekend. Albeit I'm on vacation so we'll see.
…to ZK (#2438) Removed logic that registers the synthetic collection/core on a Coordinator node to Zookeeper. This simplifies the code flow and avoids confusion to external tools that mistakenly recognize the synthetic collection as an actual collection. This is achieved by the introduction of SyntheticSolrCore which is created by CoordinatorHttpSolrCall. SyntheticSolrCore provides a shortcut creation route that bypasses ZK registration. (cherry picked from commit 844cde9)
https://issues.apache.org/jira/browse/SOLR-17269
Description
The synthetic collection/replica introduced in Solr 9 registers such collection to zookeeper. This might cause issues with other tools which monitor solr collections via zk and they might be mistaken the synthetic collection as a real collection.
Solution
We are introducing a new class
SyntheticSolrCore
which extendsSolrCore
,SyntheticSolrCore
is only used in Coordinator node to support a subset ofSolrCore
functionalities required by Coordinator operations such as aggregating and writing out response and providing configset info. There will be one instance ofSyntheticSolrCore
per config set.Such synthetic core is only registered to the
CoreContainer
but is no longer registering itself to zookeeper.Take note that to minimize code changes, there will still be snapshot/index data directory written to the Coordinator node. We could have overridden
initSnapshotMetaDataManager
andinitIndex
to be no-op to avoid creating them, however, there is currently other code (and possibly future code too) that might rely on the existence of those fields. It is safer to just keep them as is.The data folder for the synthetic core would NOT be loaded on Coordinator node start-up as there's no
core.properties
in such directory. This is because instead of going throughCoreContainer#create
for such synthetic core, it's usingSyntheticSolrCore#createAndRegisterCore
, which bypasscore.properties
creation and zk registrationTests
The existing
TestCoordinatorRole
has already provided good coverage of various scenarios. Take note that we have removed the test code that verifies the existence of cores from the synthetic collection since zk no longer keeps track of synthetic collections. Instead we added a check inTestCoordinatorRole#testSimple
to ensure that the synthetic collection no longer exists in cluster state.Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.