Ray-2.38.0
Ray Libraries
Ray Data
🎉 New Features:
💫 Enhancements:
- Add partitioning parameter to read_parquet (#47553)
- Add SERVICE_UNAVAILABLE to list of retried transient errors (#47673)
- Re-phrase the streaming executor current usage string (#47515)
- Remove ray.kill in ActorPoolMapOperator (#47752)
- Simplify and consolidate progress bar outputs (#47692)
- Refactor OpRuntimeMetrics to support properties (#47800)
- Refactor plan_write_op and Datasinks (#47942)
- Link PhysicalOperator to its LogicalOperator (#47986)
- Allow specifying both num_cpus and num_gpus for map APIs (#47995)
- Allow specifying insertion index when registering custom plan optimization Rules (#48039)
- Add a better framework for substituting logging handlers (#48056)
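
The SERVICE_UNAVAILABLE change above (#47673) extends a list of transient errors that are retried. As a rough illustration of that idea only, and not Ray Data's actual implementation, the sketch below retries a call when the error message contains one of a set of substrings. The names RETRIED_ERROR_SUBSTRINGS, is_transient, and call_with_retries are hypothetical, as is the second pattern in the list.

```python
import time

# Hypothetical list of substrings that mark an error as transient.
# Only SERVICE_UNAVAILABLE comes from the release notes; SLOW_DOWN is assumed.
RETRIED_ERROR_SUBSTRINGS = ["SERVICE_UNAVAILABLE", "SLOW_DOWN"]


def is_transient(error, patterns=RETRIED_ERROR_SUBSTRINGS):
    """Return True if the error message contains any transient pattern."""
    message = str(error)
    return any(pattern in message for pattern in patterns)


def call_with_retries(fn, max_attempts=3, base_delay_s=0.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on transient errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as error:
            # Give up on the last attempt or on a non-transient error.
            if attempt == max_attempts or not is_transient(error):
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
```

Matching on substrings keeps the list easy to extend: adding one more pattern is enough to make a new class of error retryable.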
🔨 Fixes:
- Fix bug where Ray Data incorrectly emits progress bar warning (#47680)
- Yield remaining results from async map_batches (#47696)
- Fix event loop mismatch with async map (#47907)
- Make sure num_gpus provided to Ray Data is appropriately passed to the ray.remote call (#47768)
- Fix unequal partitions when grouping by multiple keys (#47924)
- Fix reading multiple parquet files with ragged ndarrays (#47961)
- Remove unneeded test case (#48031)
- Add better JSON checking in test logging (#48036)
- Fix bug with inserting custom optimization rule at index 0 (#48051)
- Fix logging output from write_xxx APIs (#48096)
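
Two items above concern registering a custom optimization rule at a chosen position (#48039, #48051). The sketch below is a toy registry, not Ray Data's code, showing the general pattern of an optional insertion index. RuleRegistry, register, and insert_index are hypothetical names. The point it illustrates is that index 0 is a valid position and has to be told apart from "no index given", which is why the check compares against None rather than testing truthiness.

```python
from typing import List, Optional


class RuleRegistry:
    """Toy ordered registry of rule names (hypothetical, for illustration)."""

    def __init__(self, rules: Optional[List[str]] = None) -> None:
        self._rules: List[str] = list(rules or [])

    def register(self, rule: str, insert_index: Optional[int] = None) -> None:
        # Compare against None explicitly: `if insert_index:` would treat
        # index 0 as "not provided" and append instead of prepending.
        if insert_index is None:
            self._rules.append(rule)
        else:
            self._rules.insert(insert_index, rule)

    @property
    def rules(self) -> List[str]:
        # Return a copy so callers cannot mutate the registry's order.
        return list(self._rules)


registry = RuleRegistry(["A", "B"])
registry.register("C")            # appended at the end
registry.register("First", 0)     # inserted at the front
```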
📖 Documentation:
- Add docs section for Ray Data progress bars (#47804)
- Add reference to parquet predicate pushdown (#47881)
- Add tip about how to understand map_batches format (#47394)
Ray Train
🏗 Architecture refactoring:
- Remove deprecated mosaic and sklearn trainer code (#47901)
Ray Tune
🔨 Fixes:
- Fix WandbLoggerCallback to reuse actors upon restore (#47985)
Ray Serve
🔨 Fixes:
- Stop scheduling task early when requests have been canceled (#47847)
RLlib
🎉 New Features:
- Enable cloud checkpointing. (#47682)
💫 Enhancements:
- PPO on new API stack now shuffles batches properly before each epoch. (#47458)
- Other enhancements: #47705, #47501, #47731, #47451, #47830, #47970, #47157
🔨 Fixes:
- Fix spot node preemption problem (RLlib now runs stably with EnvRunner workers on spot nodes) (#47940)
- Fix action masking example. (#47817)
- Various other fixes: #47973, #46721, #47914, #47880, #47304, #47686
🏗 Architecture refactoring:
- Switch on new API stack by default for SAC and DQN. (#47217)
- Remove Tf support on new API stack for PPO/IMPALA/APPO (only DreamerV3 on new API stack remains with tf now). (#47892)
- Discontinue support for "hybrid" API stack (using RLModule + Learner, but still on RolloutWorker and Policy) (#46085)
- RLModule (new API stack) refinements: #47884, #47885, #47889, #47908, #47915, #47965, #47775
📖 Documentation:
- Add new API stack migration guide. (#47779)
- New API stack example script: BC pretraining, then PPO fine-tuning using the same RLModule class. (#47838)
- New API stack: Autoregressive actions example. (#47829)
- Remove old API stack connector docs entirely. (#47778)
Ray Core and Ray Clusters
Ray Core
🎉 New Features:
- CompiledGraphs: support multi readers in multi node when DAG is created from an actor (#47601)
💫 Enhancements:
- Add a flag to raise an exception for out-of-band serialization of ObjectRef (#47544)
- Store each GCS table in its own Redis Hash (#46861)
- Decouple create worker vs pop worker request. (#47694)
- Add metrics for GCS jobs (#47793)
🔨 Fixes:
- Fix broken dashboard cluster page when there are dead nodes (#47701)
- Fix the ray_tasks{State="PENDING_ARGS_FETCH"} metric counting (#47770)
- Separate the attempt_number from the task_status in memory summary and object list (#47818)
- Fix object reconstruction hang on arguments pending creation (#47645)
- Fix check failure: sync_reactors_.find(reactor->GetRemoteNodeID()) == sync_reactors_.end() (#47861)
- Fix check failure: RAY_CHECK(it != current_tasks_.end()); (#47659)
📖 Documentation:
- KubeRay docs: Add docs for YuniKorn Gang scheduling (#47850)
Dashboard
💫 Enhancements:
- Performance improvements for large scale clusters (#47617)
🔨 Fixes:
- Fix placement group and required resources not showing correctly in dashboard (#47754)
Thanks
Many thanks to all those who contributed to this release!
@GeneDer, @rkooo567, @dayshah, @saihaj, @nikitavemuri, @bill-oconnor-anyscale, @WeichenXu123, @can-anyscale, @jjyao, @edoakes, @kekulai-fredchang, @bveeramani, @alexeykudinkin, @raulchen, @khluu, @sven1977, @ruisearch42, @dentiny, @MengjinYan, @Mark2000, @simonsays1980, @rynewang, @PatricYan, @zcin, @sofianhnaide, @matthewdeng, @dlwh, @scottjlee, @MortalHappiness, @kevin85421, @win5923, @aslonnie, @prithvi081099, @richardsliu, @milesvant, @omatthew98, @Superskyyy, @pcmoritz