Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support setting LocalAddr in peer communication - with e2e tests #17661

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

flawedmatrix
Copy link
Contributor

This PR builds upon #17085 in addressing issue #17068.

The e2e test added includes a positive test that --set-member-localaddr is needed for an etcd node to come up, and a test that without it, etcd will fail to connect to peers due to the certificate being issued for a different IP.

I did have to add in some cert generation to create certificates for the host IP where the test is running.

There is an implementation detail here that --set-member-localaddr will set the LocalAddr only if there is a specified non-loopback IP address in the --initial-advertise-peer-urls list. But I don't know if this is too restrictive.

@k8s-ci-robot
Copy link

Hi @flawedmatrix. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

server/embed/config.go Outdated Show resolved Hide resolved
@flawedmatrix flawedmatrix force-pushed the fix/17068 branch 2 times, most recently from 1ddb114 to 8c46359 Compare March 27, 2024 23:58
@ahrtr
Copy link
Member

ahrtr commented Mar 28, 2024

/ok-to-test

server/embed/config.go Outdated Show resolved Hide resolved
server/embed/config.go Outdated Show resolved Hide resolved
server/embed/etcd.go Outdated Show resolved Hide resolved
server/etcdmain/help.go Outdated Show resolved Hide resolved
server/etcdmain/help.go Outdated Show resolved Hide resolved
tests/e2e/utils.go Outdated Show resolved Hide resolved
tests/e2e/etcd_config_test.go Outdated Show resolved Hide resolved
tests/e2e/etcd_config_test.go Outdated Show resolved Hide resolved
server/embed/config_test.go Outdated Show resolved Hide resolved
@flawedmatrix
Copy link
Contributor Author

Thank you @ahrtr for the very thorough review. I've made the suggested changes to my PR.

server/embed/config_test.go Outdated Show resolved Hide resolved
@ahrtr
Copy link
Member

ahrtr commented Apr 3, 2024

cc @fuweid @jmhbnz @serathius PTAL

@flawedmatrix
Copy link
Contributor Author

/retest

@ahrtr
Copy link
Member

ahrtr commented Apr 11, 2024

@vivekpatani @ivanvc @fuweid @jmhbnz @serathius Please take a look at this PR. It resolve a real production issue #17068

Copy link
Member

@jmhbnz jmhbnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @flawedmatrix and @ahrtr. Some questions below.

server/embed/config_test.go Outdated Show resolved Hide resolved
@@ -107,6 +107,8 @@ Member:
Clustering:
--initial-advertise-peer-urls 'http://localhost:2380'
List of this member's peer URLs to advertise to the rest of the cluster.
--set-member-localaddr 'false'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note so we don't forget to do follow-up pr for website for this new clustering flag in https://etcd.io/docs/v3.6/op-guide/configuration.

@@ -107,6 +107,8 @@ Member:
Clustering:
--initial-advertise-peer-urls 'http://localhost:2380'
List of this member's peer URLs to advertise to the rest of the cluster.
--set-member-localaddr 'false'
Enable to have etcd use the first specified and non-loopback host from initial-advertise-peer-urls as the local address when communicating with a peer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to say local address rather than host? Basing this on the test case above where local IPv4 is prioritised over hostname.

Suggested change
Enable to have etcd use the first specified and non-loopback host from initial-advertise-peer-urls as the local address when communicating with a peer.
Enable to have etcd use the first specified and non-loopback local address from initial-advertise-peer-urls as the local address when communicating with a peer.

Alternatively we might just want to state what the priority order is so users aren't left to figure this out on their own.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's just me, but I feel like mentioning etcd in this case is redundant (without addressing the priority order) something like:

Enable using the first specified and non-loopback local address from initial-advertise-peer-urls as the local address when communicating with a peer.

Sounds more straightforward.

Copy link
Member

@ivanvc ivanvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for invoking me. I left a couple of comments. I hope they're relevant ✌️

server/embed/config.go Outdated Show resolved Hide resolved
server/embed/config_test.go Outdated Show resolved Hide resolved
@@ -107,6 +107,8 @@ Member:
Clustering:
--initial-advertise-peer-urls 'http://localhost:2380'
List of this member's peer URLs to advertise to the rest of the cluster.
--set-member-localaddr 'false'
Enable to have etcd use the first specified and non-loopback host from initial-advertise-peer-urls as the local address when communicating with a peer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's just me, but I feel like mentioning etcd in this case is redundant (without addressing the priority order) something like:

Enable using the first specified and non-loopback local address from initial-advertise-peer-urls as the local address when communicating with a peer.

Sounds more straightforward.

server/embed/config_test.go Show resolved Hide resolved
@flawedmatrix flawedmatrix force-pushed the fix/17068 branch 2 times, most recently from 5a19043 to 4ea35aa Compare April 12, 2024 23:38
@flawedmatrix
Copy link
Contributor Author

Thanks all for the suggestions. I've made the changes, added more unit test cases and added some logging to InferLocalAddr.

@flawedmatrix
Copy link
Contributor Author

/retest

@flawedmatrix flawedmatrix force-pushed the fix/17068 branch 2 times, most recently from d5f26d0 to 39453ca Compare May 1, 2024 20:48
@flawedmatrix
Copy link
Contributor Author

Hi @ahrtr, I just rebased again to fix the merge conflict. Could you please take a look?

@ivanvc
Copy link
Member

ivanvc commented May 3, 2024

Hi @flawedmatrix, I suggest squashing the commits in preparation for merging this PR.

@flawedmatrix
Copy link
Contributor Author

Hi @ivanvc, I've squashed the commits that were my contributions. Thanks.

@@ -619,6 +626,8 @@ func (cfg *Config) AddFlags(fs *flag.FlagSet) {
"initial-advertise-peer-urls",
"List of this member's peer URLs to advertise to the rest of the cluster.",
)
fs.BoolVar(&cfg.SetMemberLocalAddr, "set-member-localaddr", false, "Enable to have etcd use the first specified and non-loopback host from initial-advertise-peer-urls as the local address when communicating with a peer.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, please follow https://github.com/etcd-io/etcd/blob/main/Documentation/contributor-guide/features.md#adding-a-new-feature

Once when kubernetes/enhancements#4610 is done , we can migrate to feature gate

Suggested change
fs.BoolVar(&cfg.SetMemberLocalAddr, "set-member-localaddr", false, "Enable to have etcd use the first specified and non-loopback host from initial-advertise-peer-urls as the local address when communicating with a peer.")
fs.BoolVar(&cfg.SetMemberLocalAddr, "experimental-set-member-localaddr", false, "Enable to have etcd use the first specified and non-loopback host from initial-advertise-peer-urls as the local address when communicating with a peer.")

highpon and others added 2 commits May 23, 2024 18:29
Which sets the LocalAddr to an IP address from --initial-advertise-peer-urls.

Also adds e2e test that requires this flag to succeed.

Signed-off-by: Edwin Xie <[email protected]>
Copy link
Member

@ivanvc ivanvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. If we are considering backporting this PR, I think it will be easier if you squash the commits into one. You can still co-author the commit with the original contributor (refer to https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors). Thanks, @flawedmatrix.

Copy link
Member

@jmhbnz jmhbnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Thanks for your work on this @flawedmatrix

Comment on lines +316 to +321
// node 0 (127.0.0.1) does not set localaddr, while nodes 1 and nodes 2 (both use host's IP) do.
// The other two nodes will reject connections from node 0 warning that node 0's certificate is valid only for
// 127.0.0.1, not the host IP, since node 0 will try to connect to the other peers with the host IP
// as the client address.
// Node 0 will not reject connections from the other nodes since they will
// use the host's IP to connect (due to --experimental-set-member-localaddr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// node 0 (127.0.0.1) does not set localaddr, while nodes 1 and nodes 2 (both use host's IP) do.
// The other two nodes will reject connections from node 0 warning that node 0's certificate is valid only for
// 127.0.0.1, not the host IP, since node 0 will try to connect to the other peers with the host IP
// as the client address.
// Node 0 will not reject connections from the other nodes since they will
// use the host's IP to connect (due to --experimental-set-member-localaddr)
// node 0 (127.0.0.1) does not set `--experimental-set-member-localaddr`,
// while nodes 1 and nodes 2 do.
//
// node 0's peer certificate is signed on 127.0.0.1, but it uses host IP
// (by default) to communicate with peers, so they don't match, accordingly
// other peers will reject connections from node 0.
//
// Both node 1 and node 2's peer certificates are signed on host IP, and
// they also communicate with peers using host IP (explicitly set using
// flag `--experimental-set-member-localaddr`), so there is no issue.
//
// Refer to https://github.com/etcd-io/etcd/issues/17068.

@ahrtr
Copy link
Member

ahrtr commented May 24, 2024

Overall looks good to me with a minor comment, to ensure the e2e test is easier to understand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

8 participants