Skip to content

Conversation

Yicheng-Lu-llll
Copy link
Contributor

@Yicheng-Lu-llll Yicheng-Lu-llll commented Sep 2, 2025

Why are these changes needed?

This PR updates all socket-related code to ensure compatibility with both IPv4 and IPv6. The socket logic detects whether the node IP is IPv4 or IPv6 and configures the socket accordingly.

The node IP address is obtained from user configuration when explicitly set (e.g., ray start --node-ip-address=192.168.1.100). If no IP is specified by the user, the system automatically detects the appropriate node IP address using node_ip_address_from_perspective().

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@Yicheng-Lu-llll Yicheng-Lu-llll force-pushed the support-ipv6-for-socket branch 2 times, most recently from a54b8e2 to 5cb3e54 Compare September 3, 2025 19:36
@@ -31,7 +32,7 @@ NativeRayRuntime::NativeRayRuntime() {

auto bootstrap_address = ConfigInternal::Instance().bootstrap_ip;
if (bootstrap_address.empty()) {
bootstrap_address = GetNodeIpAddress();
bootstrap_address = ray::GetNodeIpAddressFromPerspective();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the function to GetNodeIpAddressFromPerspective to keep it consistent with the Python function node_ip_address_from_perspective. Additionally, I used CPython bindings to avoid maintaining duplicate implementations in both C++ and Python.

from ray._common.test_utils import wait_for_condition
from ray._private.conftest_utils import set_override_dashboard_url # noqa: F401
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_override_dashboard_url is no longer used.

}
}

std::unique_ptr<boost::asio::ip::tcp::socket> CreateTcpSocket(
Copy link
Contributor Author

@Yicheng-Lu-llll Yicheng-Lu-llll Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we should prioritize the node IP address that the user explicitly provides
(e.g., ray start --node-ip-address=192.168.1.100). When no explicit IP is provided,
we should fall back to auto-detection using GetNodeIpAddressFromPerspective().

While this is straightforward to implement in Python, it's challenging for C++ utility
functions to access user-specified node IP addresses. Currently, there are only one mechanism
available for C++ code to get node ip from python
when Python starts a C++ process (like worker), it passes the node IP via command-line flags:

DEFINE_string(node_ip_address, "", "The IP address of the node.");
DECLARE_string(node_ip_address);

Beside that, if using ray through C++ API directly, Ray has global access to node up in this scenario, but still not for the core part to access.

ABSL_FLAG(std::string, ray_node_ip_address, "", "The ip address for this node.");

void Init(ray::RayConfig &config, int argc, char **argv) {
  if (!IsInitialized()) {
    internal::ConfigInternal::Instance().Init(config, argc, argv);
    auto runtime = internal::AbstractRayRuntime::DoInit();
    is_init_ = true;
  }
}

ConfigInternal::Instance().node_ip_address 

For a core but general utility functions that may be called from various contexts, it can not get node ip through flags, The most practical solution I think is :
1, to use environment variables: Python processes can set an environment variable (e.g., RAY_NODE_IP_ADDRESS), and C++ code can read it to determine if the user has specified a node IP address.
2, implement a mechanism to allow accessing node IPs globally, similar to the C++ API(ConfigInternal::Instance().node_ip_address )

Copy link
Contributor Author

@Yicheng-Lu-llll Yicheng-Lu-llll Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I currently choose method two.

/// \param address The IP address and port of any known live service on the
/// network you care about.
/// \return The IP address by which the local node can be reached from the address.
std::string GetNodeIpAddressFromPerspective(const std::string &address = "");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default value is 8.8.8.8:53 in master?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but to support both IPv6 and IPv4, I decided to change the design:

If no address is set, Google Public DNS will be used by default. It will first try IPv4, then IPv6, and the result will be cached.

If an address is provided, it will use the IP version based on the address, without caching.

@@ -79,7 +79,7 @@ def get_node_ip(self):

def find_free_port(self):
"""Finds a free port on the current node."""
with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
with closing(create_socket(socket.SOCK_STREAM)) as s:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I missed before, this is documentation example code so it shouldn't use Ray internal APIs, we should keep it as it is. Could you change back the build_address change as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

@@ -138,7 +143,7 @@ def __init__(
)

self._resource_and_label_spec = None
self._localhost = socket.gethostbyname("localhost")
self._localhost = get_localhost_address()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old way should just work for both ipv4 and ipv6

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

socket.gethostbyname(hostname)
Translate a host name to IPv4 address format. The IPv4 address is returned as a string, such as '100.50.200.5'. If the host name is an IPv4 address itself it is returned unchanged. See gethostbyname_ex() for a more complete interface. gethostbyname() does not support IPv6 name resolution, and getaddrinfo() should be used instead for IPv4/v6 dual stack support.

Raises an auditing event socket.gethostbyname with argument hostname.

Availability: not WASI.

I checked the documentation here, it seems that it does not support it: https://docs.python.org/3/library/socket.html

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use socket.getaddrinfo which works for both ipv4 and ipv6.

Copy link
Contributor Author

@Yicheng-Lu-llll Yicheng-Lu-llll Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the new function get_localhost_address uses socket.getaddrinfo underneath.
It first checks the node IP to decide. If no node IP is found, it falls back to socket.getaddrinfo, preferring IPv4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to have a consistent way to determine whether to use IPv6 or IPv4. The current approach is:
decide based on the user-specified node IP; if no node IP is provided, prefer IPv4, then IPv6.
The same logic should applied in all places, including localhost and socket-related code.

@Yicheng-Lu-llll Yicheng-Lu-llll force-pushed the support-ipv6-for-socket branch 2 times, most recently from fa61584 to 697488d Compare September 4, 2025 15:41


@lru_cache(maxsize=1)
def get_localhost_address() -> str:
Copy link
Contributor Author

@Yicheng-Lu-llll Yicheng-Lu-llll Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, get_localhost_address should also be moved to C++ and shared with Python. I wrote the Python version for now because this PR does not focus on localhost IPv6 support, and there are currently no usages of get_localhost_address in C++. It can definitely be moved to C++ later.

@Yicheng-Lu-llll Yicheng-Lu-llll marked this pull request as ready for review September 4, 2025 16:21
@ray-gardener ray-gardener bot added core Issues that should be addressed in Ray Core community-contribution Contributed by the community labels Sep 4, 2025
Comment on lines 76 to 81
if (
ray._private.worker._global_node is not None
and ray._private.worker._global_node.node_ip_address
):
node_ip = ray._private.worker._global_node.node_ip_address
return "::1" if is_ipv6_ip(node_ip) else "127.0.0.1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to check node_ip for localhost: it's ok that public ip is ipv6 and localhost is 127.0.0.1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! I will remove the logic then. My original thought was that it might be better to have same ip family decision consistent in all places.

Copy link
Collaborator

@aslonnie aslonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for core team to review)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution Contributed by the community core Issues that should be addressed in Ray Core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants