Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17269: Do not publish synthetic solr core (of Coordinator node) to ZK #2438

Merged
merged 15 commits into from
Jun 17, 2024

Conversation

patsonluk
Copy link
Contributor

@patsonluk patsonluk commented May 2, 2024

https://issues.apache.org/jira/browse/SOLR-17269

Description

The synthetic collection/replica introduced in Solr 9 registers such collection to zookeeper. This might cause issues with other tools which monitor solr collections via zk and they might be mistaken the synthetic collection as a real collection.

Solution

We are introducing a new class SyntheticSolrCore which extends SolrCore, SyntheticSolrCore is only used in Coordinator node to support a subset of SolrCore functionalities required by Coordinator operations such as aggregating and writing out response and providing configset info. There will be one instance of SyntheticSolrCore per config set.

Such synthetic core is only registered to the CoreContainer but is no longer registering itself to zookeeper.

Take note that to minimize code changes, there will still be snapshot/index data directory written to the Coordinator node. We could have overridden initSnapshotMetaDataManager and initIndex to be no-op to avoid creating them, however, there is currently other code (and possibly future code too) that might rely on the existence of those fields. It is safer to just keep them as is.

The data folder for the synthetic core would NOT be loaded on Coordinator node start-up as there's no core.properties in such directory. This is because instead of going through CoreContainer#create for such synthetic core, it's using SyntheticSolrCore#createAndRegisterCore, which bypass core.properties creation and zk registration

Tests

The existing TestCoordinatorRole has already provided good coverage of various scenarios. Take note that we have removed the test code that verifies the existence of cores from the synthetic collection since zk no longer keeps track of synthetic collections. Instead we added a check in TestCoordinatorRole#testSimple to ensure that the synthetic collection no longer exists in cluster state.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

patsonluk added 2 commits May 2, 2024 16:05
* Experimental QA core as proxy in memory

* Experimental QA core as proxy in memory

* Experimental QA core as proxy in memory

* ./gradlew tidy

* Comment out portion that checks for synthetic collection/core existence

* Fixes watches and test cases

* Refactoring to avoid modifying existing method modifiers

* Cleanup

* Some refactoring to minimize changes to SolrCore

* javadoc

* ./gradlew tidy

* code cleanup

* code cleanup

* Fix resource loader issue if configSetName is null

* ./gradlew tidy

* Fixed issue with incorrect collection name in context
Ensure only one synthetic core created per config set

* ./gradlew tidy

* Correctly use <config set>_core as synthetic core name for clarity
Should open the registered core on first synthetic core creation since it is a getCore op

* ./gradlew tidy

* Fixed issue with incorrect collection name in context
@patsonluk patsonluk marked this pull request as ready for review May 3, 2024 18:22
Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see a reduction in code in some places

* @param configSetName an optional config set name
* @return a SolrResourceLoader
*/
protected SolrResourceLoader createCoreResourceLoader(CoreDescriptor cd, String configSetName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the configSetName is also on the CoreDescriptor, isn't this API needless? Same elsewhere (loadConfigSet)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the choice I took was problematic! 😅

The ultimate goal is to minimize code change and yet avoid calling this part of the code for the synthetic core, so we wouldn't be registering against ZK.

Since I saw the quoted code above "reads the config name from cluster state and set it against the CoreDescriptor", so I thought maybe a new "explicit" configset can indicate that we can "skip" those logic for ZkConfigSetService.

But it didn't turn out well if I confused other devs 😞 ...

Adding any flags/override method in ConfigSetService is likely not great since other non ZK implementation shouldn't really care. So im proposing to add a new class SythethticCoreDescriptor which extends CoreDescriptor, and that ZkConfigSetService#createCoreResourceLoader could do a special case (instanceof SythethticCoreDescriptor, not great...but at least it's all contained in Zk related class) to bypass the zk logic.

Going to commit and push the change, please do lemme know if there are better alternatives!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ultimate goal is to minimize code change and yet avoid calling this part of the code for the synthetic core, so we wouldn't be registering against ZK.

Why? It appears it'd have no ultimate effect -- it ought to resolve the same config for the same collection. Oh, is it the same collection for this feature?

I don't love subclassing CoreDescriptor to be honest; you could just add a generic property in the core descriptor if you needed to flag one like "synthetic". Nonetheless overall I'm unsure why loading the ConfigSet needs to be different here.

Also if for some reason we felt it useful, we could change ZkConfigSetServer.loadConfigSet such that an existing configSet name in the CD will be used, and thus avoid resolving that.

Copy link
Contributor Author

@patsonluk patsonluk Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsmiley thank you for the review again!

Why? It appears it'd have no ultimate effect -- it ought to resolve the same config for the same collection. Oh, is it the same collection for this feature?

We want to avoid registering the synthetic collection with ZK, since it creates various issues for our various tooling which watches the ZK for collection changes. We have to do various checks (and it's very easy to forget for new code) to skip the synthetic collection, which is not ideal.

I don't love subclassing CoreDescriptor to be honest; you could just add a generic property in the core descriptor if you needed to flag one like "synthetic". Nonetheless overall I'm unsure why loading the ConfigSet needs to be different here.

Also if for some reason we felt it useful, we could change ZkConfigSetServer.loadConfigSet such that an existing configSet name in the CD will be used, and thus avoid resolving that.

Got it! If it's more preferable to add property to CoreDescriptor instead of subclassing, we could totally do that! I thought we want to minimize changes to existing class hence didn't touch CoreDescriptor 😓

Currently ZkConfigSetService#createCoreResourceLoader does 3 things as far as I can tell:

  1. Auto create collection in ZK if it's not already here
  2. Set the configset read from ZK (off the collection from cluster state) to CD. Ignore the pre-existing on in CD
  3. Create a ZkSolrResourceLoader with access to the resources under the configset in ZK

Honestly, I couldn't quite tell the purposes of 1 and 2 (probably not the use cases I'm aware of, I am sure you are more knowledgeable!) . For synthetic core, we want to avoid both since the for 1., we do not want to register the synthetic collection to ZK and for 2., since the collection is not in ZK it would fail loading such collection from cluster state.

We do want 3 tho as the synthetic core is there to serve resources from the ZK configset.

Please let me know if it all makes sense? If so, I will proceed to add property to CoreDescriptor 😊

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove "1. Auto create collection in ZK if it's not already here"; this refers to legacy behavior. Or gate this with "bootstrap_conf" system property being specified or something like that. Heck, I wonder if this chunk of code actually works.... like if you did a little experiment locally, could you get the collection to appear merely by having a core descriptor for a non-existent collection? I wonder if it's even tested (easy to check by trial and error).

Hopefully there will be no need to designate this CoreDescriptor as special.

There's legacy stuff you are seeing in Solr. We don't need to support it forever and ever! Certainly not if it's getting in our way and causing us to contemplate adding otherwise needless complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove "1. Auto create collection in ZK if it's not already here"; this refers to legacy behavior. Or gate this with "bootstrap_conf" system property being specified or something like that. Heck, I wonder if this chunk of code actually works.... like if you did a little experiment locally, could you get the collection to appear merely by having a core descriptor for a non-existent collection? I wonder if it's even tested (easy to check by trial and error).

That's a good point! I tried locally with this:

  1. Copy an existing single shard/replica directory from data folder and rename it with a different collection name, patched the core.properties as well
  2. Start and debug the code flow
  3. The CoreDescriptor did get loaded properly after scanning the disk
  4. But on createFromDescriptor -> preRegister -> checkStateInZk -> zkStateReader.waitForState, it failed cause such collection is not in ZK
  5. It does not really reach the coreConfigService.loadConfigSet and it's after preRegister

So i'm pretty sure we can remove that auto creation collection 😊

As for the 2. Set the configset read from ZK (off the collection from cluster state) to CD. Ignore the pre-existing on in CD, Based on the commit history, it is part of the effort to "restore legacy Collection auto-creation", which can be removed altogether since it does not work anymore anyway? 😅

Thank you for all the pointers again!

Copy link
Contributor Author

@patsonluk patsonluk Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So i tried removing

String configSetName =
          zkController.getClusterState().getCollection(cd.getCollectionName()).getConfigName();
      cd.setConfigSet(configSetName);

Then the core loading starts failing. With some investigation:

  1. CoreDescriptor instantiation does not set the configSet in the coreProperties via both admin create path and startup core discovery path. Those paths would set collection.configName instead.
  2. Therefore without setting it as a part of ZkConfigSetService#createCoreResourceLoader, cd.getConfigSet would give null, and could trigger error when calling new ZkSolrResourceLoader( cd.getInstanceDir(), cd.getConfigSet(), parentLoader.getClassLoader(), zkController) (or anywhere else relies on CoreDescriptor#getgetConfigSet

I'm not 100% sure if calling cd.setConfigSet(configSetName) is a good thing to do within createCoreResourceLoader since it appears to be an unexpected side effect of mutating the input CoreDescriptor. However, to focus on synthetic core alone in this PR. I pushed a commit to do a null check with a side note

    // Currently, cd.getConfigSet() is always null. Except that it's explicitly set by
    // {@link org.apache.solr.core.SyntheticSolrCore}.
    // Should we consider setting it for all cores as a part of CoreDescriptor creation/loading
    // process?
    if (cd.getConfigSet() == null) {
      String configSetName =
          zkController.getClusterState().getCollection(cd.getCollectionName()).getConfigName();
      cd.setConfigSet(configSetName);
    }

The SyntheticCoreDescriptor class is removed!

Any thoughts? @dsmiley 😊

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could understand that it's questionable for the ConfigSetService to be responsible for resolving what the proper named config even ought to be. But it works and there's not an obvious other location, and it empowers this ConfigSetService abstraction to do interesting things if needed, so I suggest let it be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for doing that manual check and being able to remove old legacy code :-)

@dsmiley
Copy link
Contributor

dsmiley commented Jun 1, 2024

I noticed you tagged nobody for review; it's good to think of one two pertinent people.

@noblepaul noblepaul self-requested a review June 2, 2024 22:43
collection.forEachReplica((s, replica) -> expectedNodes.remove(replica.getNodeName()));
assertTrue(expectedNodes.isEmpty());
// this should be empty as synthetic collection does not register with ZK
assertNull(collection);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should invoke the core API command to confirm a synthetic core is created

Copy link
Contributor Author

@patsonluk patsonluk Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!

patsonluk added 2 commits June 3, 2024 16:44
Do not modify ConfigSetService, instead introduce SyntheticCoreDescriptor which ZkConfigSetService provides special handling

Improved test case in TestCoordinatorRole#testSimple
@patsonluk
Copy link
Contributor Author

@dsmiley @noblepaul Many thanks for the helpful code review!

I have pushed some changes based on the suggestions, would you mind to review again? 🙏🏼

patsonluk added 4 commits June 5, 2024 11:09
# Conflicts:
#	solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java
…eate collectiom state part, which no longer works anyway
Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good now; I like your change.

Can a SyntheticSolrCore be prevented from registering itself with CoreContainer? I would imagine so since this functionality is inserted at the HttpSolrCall level thus the SolrCore could come from wherever. It might be less hacky to prevent weird/unusual SolrCores from presenting themselves in the CoreContainer to other functionality (who knows) that is expecting normal cores. I don't have a strong opinion on this; just sharing.

throw new ZooKeeperException(
SolrException.ErrorCode.SERVER_ERROR, "Failure auto-creating collection", e);
}

// The configSet is read from ZK and populated. Ignore CD's pre-existing configSet; only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore CD's pre-existing configSet

Wrong, now.

@patsonluk
Copy link
Contributor Author

patsonluk commented Jun 10, 2024

Can a SyntheticSolrCore be prevented from registering itself with CoreContainer? I would imagine so since this functionality is inserted at the HttpSolrCall level thus the SolrCore could come from wherever. It might be less hacky to prevent weird/unusual SolrCores from presenting themselves in the CoreContainer to other functionality (who knows) that is expecting normal cores. I don't have a strong opinion on this; just sharing.

I think that's a very good question! AFAIK, we probably really only need SyntheticSolrCore be present when CoordinatorHttpSolrCall is doing its special logic, which means in theory, it does not have to live within theCoreContainer/SolrCores. And it will be much cleaner w/o that!

I will investigate a bit and see if there are any dependencies that prevent us from doing that 😊

Update: Did a bit of refactoring, and it's not as clean as I hope it will be - since we still have to rely on CoreContainer for loading the core creation (loading configset etc). So the change be - instead of keeping track of the synthetic core in the map in SolrCores, do it in some Coordinator specific classes (for example the CoordinatorHttpSolrCall$Factory), this is all good but it lacks lifecycle management (basically closing the cores) like SolrCores/CoreContainer. Therefore, I will keep it as is for now in this PR 😓

@patsonluk
Copy link
Contributor Author

@dsmiley I have committed changes based on your feedback last week! And have replied to several of your comments.

Would you mind to go over them again and let me know your thoughts? Many many thanks! 🙏🏼

@dsmiley
Copy link
Contributor

dsmiley commented Jun 13, 2024

Looks great. Might you suggest a succinct CHANGES.txt entry for 9.7? CHANGES.txt is written for users to read/understand.

Also can you propose a commit message here, which often adds detail. Like removal of that legacy logic.

@patsonluk
Copy link
Contributor Author

patsonluk commented Jun 13, 2024

Looks great. Might you suggest a succinct CHANGES.txt entry for 9.7? CHANGES.txt is written for users to read/understand.

Many thanks for all the reviews! I will add an entry to CHANGES.txt under 9.7!

Also can you propose a commit message here, which often adds detail. Like removal of that legacy logic.

Do you mean the commit message for the CHANGE.txt change? Or is it something else? :)

@dsmiley
Copy link
Contributor

dsmiley commented Jun 13, 2024

Yes to both. It's helpful if you provide this info as it affords peer review and it makes it easier for me :-)

…ator node to zookeeper, this simplifies the code flow and avoid confusion to external tools that mistakenly recognize the synthetic collection as an actual collection.

This is achieved by the introduction of SyntheticSolrCore which is created by CoordinatorHttpSolrCall, such SyntheticSolrCore provides a shortcut creation route that bypasses ZK registration.
…c-solr-core' into patsonluk/SOLR-17269/synthetic-solr-core

# Conflicts:
#	solr/CHANGES.txt
@patsonluk
Copy link
Contributor Author

patsonluk commented Jun 14, 2024

@dsmiley

CHANGES.txt updated

Commit message:

Removed logic that registers the synthetic collection/core of Coordinator node to zookeeper, this simplifies the code flow and avoid confusion to external tools that mistakenly recognize the synthetic collection as an actual collection.

This is achieved by the introduction of SyntheticSolrCore which is created by CoordinatorHttpSolrCall, such SyntheticSolrCore provides a shortcut creation route that bypasses ZK registration.

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to merge this soon... perhaps this weekend. Albeit I'm on vacation so we'll see.

solr/CHANGES.txt Show resolved Hide resolved
@dsmiley dsmiley merged commit 844cde9 into apache:main Jun 17, 2024
3 checks passed
dsmiley pushed a commit that referenced this pull request Jun 25, 2024
…to ZK (#2438)

Removed logic that registers the synthetic collection/core on a Coordinator node to Zookeeper.
This simplifies the code flow and avoids confusion to external tools that mistakenly recognize the synthetic collection as an actual collection.

This is achieved by the introduction of SyntheticSolrCore which is created by CoordinatorHttpSolrCall.
SyntheticSolrCore provides a shortcut creation route that bypasses ZK registration.

(cherry picked from commit 844cde9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants