forked from ray-project/ray
-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Sync] Sync master branch. #457
Open
wumuzi520
wants to merge
45
commits into
main
Choose a base branch
from
merge_master
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…9695) I don't know how the [PR](ray-project#49595) passed tests without this... Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: dentiny <[email protected]>
…nalKVPut (ray-project#49583) Signed-off-by: dayshah <[email protected]>
…oject#49588) Signed-off-by: dayshah <[email protected]>
## Why are these changes needed? Create request metadata for a new request through new function `get_request_metadata`. --------- Signed-off-by: Cindy Zhang <[email protected]>
…49612) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Refactor the code so it can be overwritten in the ImageURIPlugin ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(
…on (ray-project#49591) Signed-off-by: Rui Qiao <[email protected]>
…project#49697) ## Why are these changes needed? State was being leaked across calls to `serve.run` due to in-place mutation within `get_deploy_args`. I've moved the logic into `build_app` and added associated unit tests. Also added an integration test matching the original bug report. ## Related issue number Closes ray-project#49074 --------- Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: dentiny <[email protected]>
…9573) Current operator fusion rule doesn't consider concurrency. This PR fixes this issue by only allow fusing 2 operators when they have the same concurrency. For task->actor, we allow fusion when task's concurrency = actor's upper bound. Also fixed some type hinting issues regarding compute strategy. --------- Signed-off-by: Hao Chen <[email protected]>
Signed-off-by: dentiny <[email protected]>
…et() (ray-project#49461) Signed-off-by: Rui Qiao <[email protected]>
Before we prepare cgroup V2 environment at ray-project#48828 and setup cgroup for process at ray-project#48788, we should first check whether cgroupv2 is properly mounted. --------- Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
…s if no resource limits provided (ray-project#49613) Signed-off-by: Rueian <[email protected]>
Today we create `DashboardHeadModule`s with a `DashboardHead`. In preparation to support a multi process Dashboard, we need to use something serializable to create Heads. Introducing a DashboardHeadModuleConfig class to keep all strings and ints a Head needs to start up. Lazily creates these non-serializable objects in the DashboardHeadModule base class: - `gcs_client`, `gcs_aio_client` - `aiogrpc_gcs_channel` - `http_session` One exception is made for `metrics`, which is not serializable. Now we pass the metrics from DashboardHead to the Config and it's used only in MetricsHead. We can: 1. keep MetricsHead a non-Actor Head forever, to emit metrics in the dashboard metrics export port, OR 2. redefine `DashboardMetricsAddress` to be multiple values, and let each Head to export their own ports, OR 3. redefine Dashboard metrics as [Ray Application Level Metrics](https://docs.ray.io/en/latest/ray-observability/user-guides/add-app-metrics.html#application-level-metrics). This may need some tweaks in ray.util.metrics to make sure the Component is not `core_worker` but `dashboard.` --------- Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: Ruiyang Wang <[email protected]>
ray-project#48838) I spent some time reading through the code, this PR passes "physical mode" ray cluster parameter from command line to raylet, which is required to check cgroup availability and setup preparation. This PR only adds parameter to existing code, but not enabled; so it's a no-op to production. --------- Signed-off-by: dentiny <[email protected]> Signed-off-by: hjiang <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
…ay-project#49130) Signed-off-by: Hollow Man <[email protected]>
Signed-off-by: dentiny <[email protected]>
…roject#49552) ## Why are these changes needed? Return `DeploymentUnavailableError` if deployment replicas failed to start multiple times and transitioned to `DEPLOY_FAILED`. To accomplish this, we add a flag (indicating whether deployment is available or not) that's broadcasted along with running replica infos to routers. ``` @DataClass(frozen=True) class DeploymentTargetInfo: is_available: bool running_replicas: List[RunningReplicaInfo] ``` ## Related issue number Closes ray-project#48654 --------- Signed-off-by: Cindy Zhang <[email protected]>
Small isolated PR to (hopefully) fix flakiness issues with `python/ray/serve/tests/test_deploy.py::test_reconfigure_with_queries`, noticed while working on ray-project#48807 and other PRs. --------- Signed-off-by: Josh Karpel <[email protected]> Co-authored-by: Edward Oakes <[email protected]>
so that the dependency are naturally cross platform, and conforms with python standards. --------- Signed-off-by: Ben Mares <[email protected]> Co-authored-by: Lonnie Liu <[email protected]>
…oject#49729) otherwise the wheel does not compile. Signed-off-by: dayshah <[email protected]>
…49723) Actor name is useful information to show in visualization. Before: ![compiled_graph0](https://github.com/user-attachments/assets/0c7c6bfd-7c4c-45ee-a7bf-b6073c3a1bad) After: ![compiled_graph](https://github.com/user-attachments/assets/56e3c4e5-ab1d-4dc1-86b0-fcba0073793d) Signed-off-by: Rui Qiao <[email protected]>
…9596) Signed-off-by: dentiny <[email protected]>
…Clusters on Kubernetes (ray-project#49498) Signed-off-by: win5923 <[email protected]>
Signed-off-by: dentiny <[email protected]>
… some code cleanups. (ray-project#49714)
I mistakenly approved ray-project#49513 before making copy edits. This PR is a mulligan. ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: angelinalg <[email protected]> Signed-off-by: Sven Mika <[email protected]> Signed-off-by: angelinalg <[email protected]> Co-authored-by: Sven Mika <[email protected]>
…49719) Probably not perfect, but a small patch fix... Without this: ``` > serve run main:app FOO='abc=def' Error: Invalid key-value string 'FOO=abc=def'. Must be of the form 'key=value' ``` --------- Signed-off-by: Edward Oakes <[email protected]>
…roject#49503) Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
A rewrite of ray-project#48788 --------- Signed-off-by: dentiny <[email protected]>
The read_parquet implementation might infer the schema from the first file. This PR adds a test to ensure that implementation handles the case where the first file has no data and the inferred type is null. --------- Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: dentiny <[email protected]>
## Why are these changes needed? Tunes the Python garbage collector to reduce its frequency running in the proxy by default. A feature flag is introduced to disable this behavior and/or tune the threshold. ## Benchmarks ```python from ray import serve @serve.deployment( max_ongoing_requests=100, num_replicas=16, ray_actor_options={"num_cpus": 0}, ) class A: async def __call__(self): return b"hi" app = A.bind() ``` ``` ab -n 10000 -c 100 http://127.0.0.1:8000/ ``` No optimization: ``` Concurrency Level: 100 Time taken for tests: 11.985 seconds Complete requests: 10000 Failed requests: 0 Total transferred: 1910000 bytes HTML transferred: 120000 bytes Requests per second: 834.34 [#/sec] (mean) Time per request: 119.855 [ms] (mean) Time per request: 1.199 [ms] (mean, across all concurrent requests) Transfer rate: 155.62 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.6 0 28 Processing: 5 119 30.3 121 227 Waiting: 3 118 30.2 120 225 Total: 5 120 30.2 121 227 Percentage of the requests served within a certain time (ms) 50% 121 66% 128 75% 135 80% 141 90% 158 95% 173 98% 189 99% 196 100% 227 (longest request) ``` GC freeze only: ``` Concurrency Level: 100 Time taken for tests: 11.838 seconds Complete requests: 10000 Failed requests: 0 Total transferred: 1910000 bytes HTML transferred: 120000 bytes Requests per second: 844.72 [#/sec] (mean) Time per request: 118.383 [ms] (mean) Time per request: 1.184 [ms] (mean, across all concurrent requests) Transfer rate: 157.56 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.7 0 28 Processing: 5 117 31.5 119 302 Waiting: 3 116 31.5 118 300 Total: 5 118 31.5 119 303 Percentage of the requests served within a certain time (ms) 50% 119 66% 127 75% 134 80% 138 90% 151 95% 165 98% 184 99% 230 100% 303 (longest request) ``` GC threshold set to `10_000` (*default after this PR*): ``` Concurrency Level: 100 Time taken for tests: 11.223 seconds Complete requests: 10000 Failed requests: 0 Total transferred: 1910000 bytes HTML transferred: 120000 bytes Requests per second: 891.00 [#/sec] (mean) Time per request: 112.233 [ms] (mean) Time per request: 1.122 [ms] (mean, across all concurrent requests) Transfer rate: 166.19 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.5 0 23 Processing: 5 111 26.9 116 202 Waiting: 2 110 27.0 115 199 Total: 5 112 26.8 116 202 Percentage of the requests served within a certain time (ms) 50% 116 66% 124 75% 128 80% 132 90% 146 95% 154 98% 164 99% 169 100% 202 (longest request) ``` GC threshold set to `100_000`: ``` Concurrency Level: 100 Time taken for tests: 11.481 seconds Complete requests: 10000 Failed requests: 0 Total transferred: 1910000 bytes HTML transferred: 120000 bytes Requests per second: 870.98 [#/sec] (mean) Time per request: 114.813 [ms] (mean) Time per request: 1.148 [ms] (mean, across all concurrent requests) Transfer rate: 162.46 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.3 0 3 Processing: 5 114 25.0 112 256 Waiting: 2 113 25.0 111 254 Total: 5 114 24.9 112 256 Percentage of the requests served within a certain time (ms) 50% 112 66% 116 75% 119 80% 123 90% 150 95% 159 98% 185 99% 201 100% 256 (longest request) ``` --------- Signed-off-by: Edward Oakes <[email protected]>
The iter_batches release test doesn't actually execute the dataset. --------- Signed-off-by: Balaji Veeramani <[email protected]>
I just realized I made a mistake: - In the prev cleanup PR (ray-project#49739), I unified `cc_*` to `ray_cc_*` with `sed`, but I forgot to remove the copts; - Default copts have already been applied, no need to re-apply again at BUILD rule, https://github.com/ray-project/ray/blob/33a61aaa9f50be4e1572e732dfe3e92640a056cf/bazel/ray.bzl#L169-L176 Signed-off-by: dentiny <[email protected]>
I forgot to check the result for writing PID into cgroup files. --------- Signed-off-by: dentiny <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.