03 Dec 19:08

fishbone

92599d9

Ray-1.9.0

Highlights

Ray Train is now in beta! If you are using Ray Train, we’d love to hear your feedback here!
Ray Docker images for multiple CUDA versions are now provided (#19505)! You can specify a -cuXXX suffix to pick a specific version.
- ray-ml:cpu images are now deprecated. The ray-ml images are only built for GPU.
Ray Datasets now supports groupby and aggregations! See the groupby API and GroupedDataset docs for usage.
We continue to improve Ray stability and usability on Windows. We encourage you to try it out and report feedback or issues at https://github.com/ray-project/ray/issues.
We are launching a Ray Job Submission server + CLI & SDK clients to make it easier to submit and monitor Ray applications when you don’t want an active connection using Ray Client. This is currently in alpha, so the APIs are subject to change, but please test it out and file issues / leave feedback on GitHub & discuss.ray.io!

Ray Autoscaler

💫Enhancements:

Graceful termination of Ray nodes prior to autoscaler scale down (#20013)
Ray Clusters on AWS are colocated in one Availability Zone to reduce costs & latency (#19051)

Ray Client

🔨 Fixes:

ray.put on a list of objects now returns a single object ref (#19737)

Ray Core

🎉 New Features:

Support remote file storage for runtime_env (#20280, #19315)
Added ray job submission client, cli and rest api (#19567, #19657, #19765, #19845, #19851, #19843, #19860, #19995, #20094, #20164, #20170, #20192, #20204)
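
The job submission SDK mentioned above could be used roughly as follows. This is a minimal sketch, not the definitive API: the feature is in alpha, the import path `ray.dashboard.modules.job.sdk` is an assumption based on the alpha documentation for this release, and the address, script name, and helper names (`build_job_request`, `submit`) are hypothetical.

```python
def build_job_request(script: str) -> dict:
    """Assemble keyword arguments for a job submission (illustrative helper)."""
    return {
        # Shell command that the cluster runs as the job entrypoint.
        "entrypoint": f"python {script}",
        # Upload the current directory so the script is available remotely.
        "runtime_env": {"working_dir": "./"},
    }


def submit(address: str, script: str) -> str:
    """Submit a script to a running cluster and return the job id."""
    # Imported lazily so this module loads even without Ray installed.
    # Assumption: alpha import path; it may differ between releases.
    from ray.dashboard.modules.job.sdk import JobSubmissionClient

    client = JobSubmissionClient(address)
    return client.submit_job(**build_job_request(script))
```

A call such as `submit("http://127.0.0.1:8265", "train.py")` assumes a cluster whose dashboard is reachable at that address.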

💫Enhancements:

Garbage collection for runtime_env (#20009, #20072)
Improved logging and error messages for runtime_env (#19897, #19888, #18893)

🔨 Fixes:

Fix runtime_env hanging issues (#19823)
Fix specifying runtime env in @ray.remote decorator with Ray Client (#19626)
Threaded actor / core worker / named actor race condition fixes (#19751, #19598, #20178, #20126)

📖Documentation:

New page “Handling Dependencies”
New page “Ray Job Submission: Going from your laptop to production”

Ray Java

API Changes:

Fully supported namespace APIs. (Check out the namespace docs for more information.) (#19468, #19986, #20057)
Removed global named actor APIs and global placement group APIs. (#20219, #20135)
Added timeout parameter for Ray.get() API. (#20282)

Note:

Use Ray.getActor(name, namespace) API to get a named actor between jobs instead of Ray.getGlobalActor(name).
Use PlacementGroup.getPlacementGroup(name, namespace) API to get a placement group between jobs instead of PlacementGroup.getGlobalPlacementGroup(name).

Ray Datasets

🎉 New Features:

Added groupby and aggregations (#19435, #19673, #20010, #20035, #20044, #20074)
Support custom write paths (#19347)
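
As a rough illustration of the new groupby and aggregation support, the sketch below counts integers by their remainder. It assumes a simple (non-Arrow) dataset grouped with a callable key; the helper names `grouped_counts` and `expected_counts` are hypothetical, and `expected_counts` is only a plain-Python reference for what the grouped result should contain.

```python
from collections import Counter


def expected_counts(n: int, k: int) -> dict:
    """Plain-Python reference: how many of 0..n-1 fall into each remainder mod k."""
    return dict(Counter(i % k for i in range(n)))


def grouped_counts(n: int, k: int):
    """Run the same aggregation with Ray Datasets (requires Ray to be installed)."""
    import ray  # imported lazily so the reference function works without Ray

    ds = ray.data.range(n)
    # Group by remainder, then count the rows in each group.
    return ds.groupby(lambda x: x % k).count()
```

See the groupby API and GroupedDataset docs for the other available aggregations.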

🔨 Fixes:

Support custom CSV write options (#19378)

🏗 Architecture refactoring:

Optimized block compaction (#19681)

Ray Workflow

🎉 New Features:

Workflows now support events (#19239)
Allow users to specify metadata for workflows and steps (#19372)
Allow running a step in place if the resources match (#19928)

🔨 Fixes:

Fix the s3 path issue (#20115)

RLlib

🏗 Architecture refactoring:

“framework=tf2” + “eager_tracing=True” is now (almost) as fast as “framework=tf”. A check for tf2.x eager re-traces has been added making sure re-tracing does not happen outside the initial function calls. All CI learning tests (CartPole, Pendulum, FrozenLake) are now also run as framework=tf2. (#19273, #19981, #20109)
Prepare deprecation of build_trainer/build_(tf_)?policy utility functions. Instead, use sub-classing of Trainer or Torch|TFPolicy. POCs done for PGTrainer, PPO[TF|Torch]Policy. (#20055, #20061)
V-trace (APPO & IMPALA): Not dropping the last timestep can now optionally be switched on. The default is still to drop it, but this may change in a future release. (#19601)
Upgrade to gym 0.21. (#19535)
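
The tf2 settings mentioned above are ordinary Trainer config keys. A minimal sketch, assuming PPO on CartPole and an installed TensorFlow 2.x; the helper names `tf2_traced_config` and `train_one_iteration` are hypothetical:

```python
def tf2_traced_config(base=None) -> dict:
    """Return a copy of `base` with tf2 eager tracing switched on."""
    config = dict(base or {})
    config.update({"framework": "tf2", "eager_tracing": True})
    return config


def train_one_iteration():
    """Build a PPO trainer with the traced tf2 config and run one iteration."""
    # Imported lazily; requires ray[rllib] and TensorFlow 2.x.
    from ray.rllib.agents.ppo import PPOTrainer

    trainer = PPOTrainer(
        env="CartPole-v0",
        config=tf2_traced_config({"num_workers": 1}),
    )
    return trainer.train()
```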

🔨 Fixes:

Minor bugs/issues fixes and enhancements: #19069, #19276, #19306, #19408, #19544, #19623, #19627, #19652, #19693, #19805, #19807, #19809, #19881, #19934, #19945, #20095, #20128, #20134, #20144, #20217, #20283, #20366, #20387

📖Documentation:

RLlib main page (“RLlib in 60sec”) overhaul. (#20215, #20248, #20225, #19932, #19982)
Major docstring cleanups in preparation for complete overhaul of API reference pages. (#19784, #19783, #19808, #19759, #19829, #19758, #19830)
Other documentation enhancements. (#19908, #19672, #20390)

Tune

💫Enhancements:

Refactored and improved experiment analysis (#20197, #20181)
Refactored cloud checkpointing API/SyncConfig (#20155, #20418, #19632, #19641, #19638, #19880, #19589, #19553, #20045, #20283)
Remove magic results (e.g. config) before calculating trial result metrics (#19583)
Removal of tech debt (#19773, #19960, #19472, #17654)
Improve testing (#20016, #20031, #20263, #20210, #19730)
Various enhancements (#19496, #20211)

🔨Fixes:

Documentation fixes (#20130, #19791)
Tutorial fixes (#20065, #19999)
Drop 0 value keys from PGF (#20279)
Fix shim error message for scheduler (#19642)
Avoid looping through _live_trials twice in _get_next_trial. (#19596)
clean up legacy branch in update_avail_resources. (#20071)
fix Train/Tune integration on Client (#20351)

Train

Ray Train is now in Beta! The beta version includes various usability improvements for distributed PyTorch training and checkpoint management, support for Ray Client, and an integration with Ray Datasets for distributed data ingest.

Check out the docs here, and the migration guide from Ray SGD to Ray Train here. If you are using Ray Train, we’d love to hear your feedback here!

🎉 New Features:

New train.torch.prepare_model(...) and train.torch.prepare_data_loader(...) API to automatically handle preparing your PyTorch model and DataLoader for distributed training (#20254).
Checkpoint management and support for custom checkpoint strategies (#19111).
Easily configure what and how many checkpoints to save to disk.
Support for Ray Client (#20123, #20351).
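
The two new PyTorch helpers might be used as in the sketch below. It assumes PyTorch is installed and uses a toy linear model on random data; `TRAINER_KWARGS`, `train_func`, and `run` are illustrative names, not part of Ray.

```python
# Illustrative trainer settings; adjust num_workers to your cluster.
TRAINER_KWARGS = {"backend": "torch", "num_workers": 2}


def train_func():
    """Per-worker training loop using the new prepare_* helpers."""
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    import ray.train.torch  # noqa: F401  (makes train.torch available)
    from ray import train

    # Wrap the model for distributed training and move it to the right device.
    model = train.torch.prepare_model(torch.nn.Linear(4, 1))
    dataset = TensorDataset(torch.randn(64, 4), torch.randn(64, 1))
    # Shard the data across workers and move batches to the right device.
    loader = train.torch.prepare_data_loader(DataLoader(dataset, batch_size=16))

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()
    for features, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(features), targets)
        loss.backward()
        optimizer.step()
    return loss.item()


def run():
    """Start a Trainer, run the loop on every worker, and shut down."""
    from ray.train import Trainer

    trainer = Trainer(**TRAINER_KWARGS)
    trainer.start()
    try:
        return trainer.run(train_func)  # one result per worker
    finally:
        trainer.shutdown()
```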

💫Enhancements:

Simplify workflow for training with a single worker (#19814).
Ray Placement Groups are used for scheduling the training workers (#20091).
PACK strategy is used by default but can be changed by setting the TRAIN_ENABLE_WORKER_SPREAD environment variable.
Automatically unwrap Torch DDP model and convert to CPU when saving a model as checkpoint (#20333).
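
To switch the worker placement strategy from the default PACK behavior, the TRAIN_ENABLE_WORKER_SPREAD environment variable noted above can be set before the Trainer is started. A minimal sketch; the value "1" is an assumption, since the note does not state which value enables it:

```python
import os

# Assumption: "1" enables spreading; set this on the driver before
# creating and starting the Trainer so that it is picked up.
os.environ["TRAIN_ENABLE_WORKER_SPREAD"] = "1"
```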

🔨Fixes:

Fix HorovodBackend to automatically detect NICs. Thanks @tgaddair! (#19533).

📖Documentation:

Denote public facing APIs with beta stability (#20378)
Doc updates (#20271)

Serve

We would love to hear from you! Fill out the Ray Serve survey here.

🎉 New Features:

New checkpoint_path configuration allows Serve to save its internal state to external storage (disk, S3, and GCS) and recover upon failure. (#19166, #19998, #20104)
Replica autoscaling is ready for testing out! (#19559, #19520)
Native Pipeline API for model composition is ready for testing as well!

🔨Fixes:

Serve deployment functions or classes can take no parameters (#19708)
Replica slow start message is improved. You can now see whether it is slow to allocate resources or slow to run constructor. (#19431)
pip install ray[serve] will now install ray[default] as well. (#19570)

🏗 Architecture refactoring:

The terms “backend” and “endpoint” are officially deprecated in favor of “deployment”. (#20229, #20085, #20040, #20020, #19997, #19947, #19923, #19798).
Progress towards Java API compatibility (#19463).
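
For readers migrating from backends and endpoints, a deployment can be declared and deployed roughly as follows. This is a minimal sketch; the `Greeter` class, its constructor argument, and `deploy_greeter` are hypothetical names used only for illustration.

```python
def deploy_greeter():
    """Define and deploy a single Serve deployment (requires ray[serve])."""
    from ray import serve  # imported lazily so this module loads without Ray

    @serve.deployment
    class Greeter:
        def __init__(self, greeting="Hello"):
            self.greeting = greeting

        def __call__(self, request):
            # Called for each HTTP request routed to this deployment.
            return f"{self.greeting}, world!"

    serve.start()
    # Constructor arguments are passed through deploy().
    Greeter.deploy("Hi")
    return Greeter
```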

Dashboard

Ray Dashboard is now enabled on Windows! (#19575)

Thanks

Many thanks to all those who contributed to this release!
@krfricke, @stefanbschneider, @ericl, @nikitavemuri, @qicosmos, @worldveil, @triciasfu, @AmeerHajAli, @javi-redondo, @architkulkarni, @pdames, @clay4444, @mGalarnyk, @liuyang-my, @matthewdeng, @suquark, @rkooo567, @mwtian, @chenk008, @dependabot[bot], @iycheng, @jiaodong, @scv119, @oscarknagg, @Rohan138, @stephanie-wang, @Zyiqin-Miranda, @ijrsvt, @roireshef, @tkaymak, @simon-mo, @ashione, @jovany-wang, @zenoengine, @tgaddair, @11rohans, @amogkam, @zhisbug, @lchu-ibm, @shrekris-anyscale, @pcmoritz, @yiranwang52, @mattip, @sven1977, @Yard1, @DmitriGekhtman, @ckw017, @WangTaoTheTonic, @wuisawesome, @kcpevey, @kfstorm, @rhamnett, @renos, @TeoZosa, @SongGuyang, @clarkzinzow, @avnishn, @iasoon, @gjoliver, @jjyao, @xwjiang2010, @dmatrix, @edoakes, @czgdp1807, @heng2j, @sungho-joo, @lixin-wei

02 Nov 18:33

xwjiang2010

ray-1.8.0

72fdf3b

Ray-1.8.0

Highlights

Ray SGD has been rebranded to Ray Train! The new documentation landing page can be found here.
Ray Datasets is now in beta! The beta release includes a new integration with Ray Train yielding scalable ML ingest for distributed training. Check out the docs here, try it out for your ML ingest and batch inference workloads, and let us know how it goes!
This Ray release supports Apple Silicon (M1 Macs). Check out the installation instructions for more information!

Ray Autoscaler

🎉 New Features:

Fake multi-node mode for autoscaler testing (#18987)

💫Enhancements:

Improve unschedulable task warning messages by integrating with the autoscaler (#18724)

Ray Client

💫Enhancements

Use async rpc for remote call and actor creation (#18298)

Ray Core

💫Enhancements

Eagerly install job-level runtime_env (#19449, #17949)

🔨 Fixes:

Fixed resource demand reporting for infeasible 1-CPU tasks (#19000)
Fixed printing Python stack trace in Python worker (#19423)
Fixed macOS security popups (#18904)
Fixed thread safety issues for coreworker (#18902, #18910, #18913, #19343)
Fixed placement group performance and resource leaking issues (#19277, #19141, #19138, #19129, #18842, #18652)
Improve unschedulable task warning messages by integrating with the autoscaler (#18724)
Improved Windows support (#19014, #19062, #19171, #19362)
Fix runtime_env issues (#19491, #19377, #18988)

Ray Data

Ray Datasets is now in beta! The beta release includes a new integration with Ray Train yielding scalable ML ingest for distributed training. It supports repeating and rewindowing pipelines, zipping two pipelines together, better cancellation of Datasets workloads, and many performance improvements. Check out the docs here, try it out for your ML ingest and batch inference workloads, and let us know how it goes!

🎉 New Features:

Ray Train integration (#17626)
Add support for repeating and rewindowing a DatasetPipeline (#19091)
.iter_epochs() API for iterating over epochs in a DatasetPipeline (#19217)
Add support for zipping two datasets together (#18833)
Transformation operations are now cancelled when one fails or the entire workload is killed (#18991)
Expose from_pandas()/to_pandas() APIs that accept/return plain Pandas DataFrames (#18992)
Customize compression, read/write buffer size, metadata, etc. in the IO layer (#19197)
Add spread resource prefix for manual round-robin resource-based task load balancing

💫Enhancements:

Minimal rows are now dropped when doing an equalized split (#18953)
Parallelized metadata fetches when reading Parquet datasets (#19211)

🔨 Fixes:

Tensor columns now properly support table slicing (#19534)
Prevent Datasets tasks from being captured by Ray Tune placement groups (#19208)
Empty datasets are properly handled in most transformations (#18983)

🏗 Architecture refactoring:

Tensor dataset representation changed to a table with a single tensor column (#18867)

RLlib

🎉 New Features:

Allow n-step > 1 and prioritized replay for R2D2 and RNNSAC agents. (#18939)

🔨 Fixes:

Fix memory leaks in TF2 eager mode. (#19198)
Faster worker spaces inference if specified through configuration. (#18805)
Fix bug for complex obs spaces containing Box([2D shape]) and discrete components. (#18917)
Torch multi-GPU stats not protected against race conditions. (#18937)
Fix SAC agent with dict space. (#19101)
Fix A3C/IMPALA in multi-agent setting. (#19100)

🏗 Architecture refactoring:

Unify results dictionary returned from Trainer.train() across agents regardless of (tf or pytorch, multi-agent, multi-gpu, or algos that use >1 SGD iterations, e.g. ppo) (#18879)

Ray Workflow

🎉 New Features:

Introduce workflow.delete (#19178)

🔨Fixes:

Fix a bug that allowed a workflow step to be executed multiple times (#19090)

🏗 Architecture refactoring:

Object reference serialization is decoupled from workflow storage (#18328)

Tune

🎉 New Features:

PBT: Add burn-in period (#19321)

💫Enhancements:

Optional forcible trial cleanup, return default autofilled metrics even if Trainable doesn't report at least once (#19144)
Use queue to display JupyterNotebookReporter updates in Ray client (#19137)
Add resume="AUTO" and enhance resume error messages (#19181)
Provide information about resource deadlocks, early stopping in Tune docs (#18947)
Fix HEBOSearch installation docs (#18861)
OptunaSearch: check compatibility of search space with evaluated_rewards (#18625)
Add save and restore methods for searchers that were missing it & test (#18760)
Add documentation for reproducible runs (setting seeds) (#18849)
Deprecate max_concurrent in TuneBOHB (#18770)
Add on_trial_result to ConcurrencyLimiter (#18766)
Ensure arguments passed to tune remote_run match (#18733)
Only disable ipython in remote actors (#18789)
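
The resume="AUTO" option listed above resumes an experiment if a previous run with the same name exists and otherwise starts a new one. A minimal sketch, with a toy objective and hypothetical names (`score`, `objective`, `run_resumable`, `resume_demo`):

```python
def score(x: float) -> float:
    """Toy metric used by the trainable below."""
    return x ** 2


def objective(config):
    """Function trainable that reports a single result."""
    from ray import tune  # imported lazily; requires ray[tune]

    tune.report(score=score(config["x"]))


def run_resumable(name: str = "resume_demo"):
    """Run (or resume) the experiment stored under `name`."""
    from ray import tune

    return tune.run(
        objective,
        name=name,
        config={"x": tune.grid_search([1, 2, 3])},
        resume="AUTO",  # continue an existing experiment if one is found
    )
```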

🔨Fixes:

Only try to sync driver if sync_to_driver is actually enabled (#19589)
sync_client: Fix delete template formatting (#19553)
Force no result buffering for hyperband schedulers (#19140)
Exclude trial checkpoints in experiment sync (#19185)
Fix how durable trainable is retained in global registry (#19223, #19184)
Ensure loc column in progress reporter is filled (#19182)
Deflake PBT Async test (#19135)
Fix Analysis.dataframe() documentation and enable passing of mode=None (#18850)

Ray Train (SGD)

Ray SGD has been rebranded to Ray Train! The new documentation landing page can be found here. Ray Train is integrated with Ray Datasets for distributed data loading while training, documentation available here.

🎉 New Features:

Ray Datasets Integration (#17626)

🔨Fixes:

Improved support for multi-GPU training (#18824, #18958)
Make actor creation async (#19325)

📖Documentation:

Rename Ray SGD v2 to Ray Train (#19436)
Added migration guide from Ray SGD v1 (#18887)

Serve

🎉 New Features:

Add ability to recover from a checkpoint on cluster failure (#19125)
Support kwargs to deployment constructors (#19023)

🔨Fixes:

Fix asyncio compatibility issue (#19298)
Catch spurious ConnectionErrors during shutdown (#19224)
Fix error with uris=None in runtime_env (#18874)
Fix shutdown logic with exit_forever (#18820)

🏗 Architecture refactoring:

Progress towards Serve autoscaling (#18793, #19038, #19145)
Progress towards Java support (#18630)
Simplifications for long polling (#19154, #19205)

Dashboard

🎉 New Features:

Basic support for the dashboard on Windows (#19319)

🔨Fixes:

Fix healthcheck issue causing the dashboard to crash under load (#19360)
Work around aiohttp 4.0.0+ issues (#19120)

🏗 Architecture refactoring:

Improve dashboard agent retry logic (#18973)

Thanks

Many thanks to all those who contributed to this release!
@rkooo567, @lchu-ibm, @scv119, @pdames, @suquark, @antoine-galataud, @sven1977, @mvindiola1, @krfricke, @ijrsvt, @sighingnow, @marload, @jmakov, @clay4444, @mwtian, @pcmoritz, @iycheng, @ckw017, @chenk008, @jovany-wang, @jjyao, @hauntsaninja, @franklsf95, @jiaodong, @wuisawesome, @odp, @matthewdeng, @duarteocarmo, @czgdp1807, @gjoliver, @mattip, @richardliaw, @max0x7ba, @Jasha10, @acxz, @xwjiang2010, @SongGuyang, @simon-mo, @zhisbug, @ccssmnn, @Yard1, @hazeone, @o0olele, @froody, @robertnishihara, @amogkam, @sasha-s, @xychu, @lixin-wei, @architkulkarni, @edoakes, @clarkzinzow, @DmitriGekhtman, @avnishn, @liuyang-my, @stephanie-wang, @Chong-Li, @ericl, @juliusfrost, @carlogrisetti

23 Aug 20:22

krfricke

ray-1.6.0

7916500

Ray-1.6.0

Highlights

Runtime Environments are ready for general use! This feature enables you to dynamically specify per-task, per-actor and per-job dependencies, including a working directory, environment variables, pip packages and conda environments. Install it with pip install -U 'ray[default]'.
Ray Dataset is now in alpha! Dataset is an interchange format for distributed datasets, powered by Arrow. You can also use it for a basic Ray native data processing experience. Check it out here.
Ray Lightning v0.1 has been released! You can install it via pip install ray-lightning. Ray Lightning is a library of PyTorch Lightning plugins for distributed training using Ray. Features:
- Enables quick and easy parallel training
- Supports PyTorch DDP, Horovod, and Sharded DDP with Fairscale
- Integrates with Ray Tune for hyperparameter optimization and is compatible with Ray Client
pip install ray now has a significantly reduced set of dependencies. Features such as the dashboard, the cluster launcher, runtime environments, and observability metrics may require pip install -U 'ray[default]' to be enabled. Please report any problems on GitHub!

Ray Autoscaler

🎉 New Features:

The Ray autoscaler now supports TPUs on GCP. Please refer to this example for spinning up a simple TPU cluster. (#17278)

💫Enhancements:

Better AWS networking configurability (#17236, #17207, #14080)
Support for running autoscaler without NodeUpdaters (#17194, #17328)

🔨 Fixes:

Code clean up and corrections to downscaling policy (#17352)
Docker file sync fix (#17361)

Ray Client

💫Enhancements:

Updated docs for client server ports and ray.init(ray://) (#17003, #17333)
Better error handling for deserialization failures (#17035)

🔨 Fixes:

Fix for server proxy not working with non-default redis passwords (#16885)

Ray Core

🎉 New Features:

Runtime Environments are ready for general use!
- Specify a working directory to upload your local files to all nodes in your cluster.
- Specify different conda and pip dependencies for your tasks and actors and have them installed on the fly.
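
A job-level runtime environment could be specified roughly like this. It is a minimal sketch: the package name, the environment variable `APP_MODE`, and the helper `run_with_runtime_env` are illustrative choices, not taken from the release.

```python
# Job-level runtime environment: files, packages, and environment variables.
RUNTIME_ENV = {
    "working_dir": ".",                # uploaded to all nodes in the cluster
    "pip": ["requests"],               # installed on the fly for this job
    "env_vars": {"APP_MODE": "demo"},  # visible to tasks and actors
}


def run_with_runtime_env():
    """Start Ray with the runtime env and read the variable inside a task."""
    import ray  # imported lazily; requires `pip install -U 'ray[default]'`

    ray.init(runtime_env=RUNTIME_ENV)

    @ray.remote
    def read_mode():
        import os
        return os.environ.get("APP_MODE")

    try:
        return ray.get(read_mode.remote())
    finally:
        ray.shutdown()
```

The same dictionary shape can also be passed per task or per actor when only some workloads need the extra dependencies.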

🔨 Fixes:

Fix plasma store bugs for better data processing stability (#16976, #17135, #17140, #17187, #17204, #17234, #17396, #17550)
Fix a placement group bug where CUDA_VISIBLE_DEVICES were not properly detected (#17318)
Improved Ray stacktrace messages. (#17389)
Improved GCS stability and scalability (#17456, #17373, #17334, #17238, #17072)

🏗 Architecture refactoring:

Plasma store refactor for better testability and extensibility. (#17332, #17313, #17307)

Ray Data Processing

Ray Dataset is now in alpha! Dataset is an interchange format for distributed datasets, powered by Arrow. You can also use it for a basic Ray native data processing experience. Check it out here.

RLlib

🎉 New Features:

Support for RNN/LSTM models with SAC (new agent: "RNNSAC"). Shoutout to ddworak94! (#16577)
Support for ONNX model export (tf and torch). (#16805)
Allow Policies to be added to/removed from a Trainer on-the-fly. (#17566)

🔨 Fixes:

Fix for view requirements captured during compute actions test pass. Shoutout to Chris Bamford (#15856)
Issues: 17397, 17425, 16715, 17174. When on driver, Torch|TFPolicy should not use ray.get_gpu_ids() (b/c no GPUs assigned by ray). (#17444)
Other bug fixes: #15709, #15911, #16083, #16716, #16744, #16896, #16999, #17010, #17014, #17118, #17160, #17315, #17321, #17335, #17341, #17356, #17460, #17543, #17567, #17587

🏗 Architecture refactoring:

CV2 to Skimage dependency change (CV2 still supported). Shoutout to Vince Jankovics. (#16841)
Unify tf and torch policies wrt. multi-GPU handling: PPO-torch is now 33% faster on Atari and 1 GPU. (#17371)
Implement all policy maps inside RolloutWorkers to be LRU-caches so that a large number of policies can be added on-the-fly w/o running out of memory. (#17031)
Move all tf static-graph code into DynamicTFPolicy, such that policies can be deleted and their tf-graph is GC'd. (#17169)
Simplify multi-agent configs: In most cases, creating dummy envs (only to retrieve spaces) is no longer necessary. (#16565, #17046)

📖Documentation:

Example scripts do-over (shoutout to Stefan Schneider for this initiative).
Example script: League-based self-play with "open spiel" env. (#17077)
Other doc improvements: #15664 (shoutout to kk-55), #17030, #17530

Tune

🎉 New Features:

Dynamic trial resource allocation with ResourceChangingScheduler (#16787)
It is now possible to use a define-by-run function to generate a search space with OptunaSearcher (#17464)

💫Enhancements:

String names of searchers/schedulers can now be used directly in tune.run (#17517)
Filter placement group resources if not in use (progress reporting) (#16996)
Add unit tests for flatten_dict (#17241)

🔨Fixes:

Fix HDFS sync down template (#17291)
Re-enable TensorboardX without Torch installed (#17403)

📖Documentation:

LightGBM integration (#17304)
Other documentation improvements: #17407 (shoutout to amavilla), #17441, #17539, #17503

SGD

🎉 New Features:

We have started initial development on a new RaySGD v2! We will be rolling it out in a future version of Ray. See the documentation here. (#17536, #17623, #17357, #17330, #17532, #17440, #17447, #17300, #17253)

💫Enhancements:

Placement Group support for TorchTrainer (#17037)

Serve

🎉 New Features:

Add Ray API stability annotations to Serve, marking many serve.* APIs as Stable (#17295)
Support runtime_env's working_dir for Ray Serve (#16480)

🔨Fixes:

Fix FastAPI's response_model not added to class based view routes (#17376)
Replace backend with deployment in metrics & logging (#17434)

🏗Stability Enhancements:

Run Ray Serve with multi & single deployment large scale (1K+ cores) test running nightly (#17310, #17411, #17368, #17026, #17277)

Thanks

Many thanks to all who contributed to this release:

@suquark, @xwjiang2010, @clarkzinzow, @kk-55, @mGalarnyk, @pdames, @Souphis, @edoakes, @sasha-s, @iycheng, @stephanie-wang, @antoine-galataud, @scv119, @ericl, @amogkam, @ckw017, @wuisawesome, @krfricke, @vakker, @qingyun-wu, @Yard1, @juliusfrost, @DmitriGekhtman, @clay4444, @mwtian, @corentinmarek, @matthewdeng, @simon-mo, @pcmoritz, @qicosmos, @architkulkarni, @rkooo567, @navneet066, @dependabot[bot], @jovany-wang, @kombuchafox, @thomasjpfan, @kimikuri, @Ivorforce, @franklsf95, @MissiontoMars, @lantian-xu, @duburcqa, @ddworak94, @ijrsvt, @sven1977, @kira-lin, @SongGuyang, @kfstorm, @Rohan138, @jamesmishra, @amavilla, @fyrestone, @lixin-wei, @stefanbschneider, @jiaodong, @richardliaw, @WangTaoTheTonic, @chenk008, @Catch-Bull, @Bam4d

12 Aug 19:24

jiaodong

ray-1.5.2

88666d3

Ray-1.5.2

Cherrypick release to address RLlib issue, no library or core changes included.

31 Jul 01:47

jiaodong

ray-1.5.1

7d69ebb

Ray-1.5.1

Cherrypick release to address a few external integration and documentation issues, no library or core changes included.

26 Jul 18:43

jiaodong

ray-1.5.0

ddad6a2

Ray-1.5.0

Ray 1.5.0 Release Note

Highlights

Ray Datasets is now in alpha (https://docs.ray.io/en/master/data/dataset.html)
LightGBM on Ray is now in beta (https://github.com/ray-project/lightgbm_ray).
- enables multi-node and multi-GPU training
- integrates seamlessly with distributed hyperparameter optimization library Ray Tune
- comes with fault tolerance handling mechanisms, and
- supports distributed dataframes and distributed data loading

Ray Autoscaler

🎉 New Features:

Aliyun support (#15712)

💫 Enhancements:

[Kubernetes] Operator refactored to use Kopf package (#15787)
Flag to control config bootstrap for rsync (#16667)
Prometheus metrics for Autoscaler (#16066, #16198)
Allows launching in subnets where public IP assignment is off by default (#16816)

🔨 Fixes:

[Kubernetes] Fix GPU=0 resource handling (#16887)
[Kubernetes] Release docs updated with K8s test instructions (#16662)
[Kubernetes] Documentation update (#16570)
[Kubernetes] All official images set to rayproject/ray:latest (#15988, #16205)
[Local] Fix bootstrapping ray at a given static set of ips (#16202, #16281)
[Azure] Fix Azure Autoscaling Failures (#16640)
Handle node type key change / deletion (#16691)
[GCP] Retry GCP BrokenPipeError (#16952)

Ray Client

🎉 New Features:

Client integrations with major Ray Libraries (#15932, #15996, #16103, #16034, #16029, #16111, #16301)
Client Connect now returns a context that has disconnect and can be used as a context manager (#16021)
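
The context-manager form of a client connection might look like the sketch below. It assumes a Ray Client server listening on the default port 10001 and uses the ray:// address form; `client_address` and `remote_sum` are hypothetical helper names.

```python
def client_address(host: str = "localhost", port: int = 10001) -> str:
    """Build a Ray Client address string."""
    return f"ray://{host}:{port}"


def remote_sum(a: int, b: int, address: str = client_address()) -> int:
    """Connect, run one remote task, and disconnect when the block exits."""
    import ray  # imported lazily; requires Ray and a reachable client server

    # The returned context disconnects automatically on exit.
    with ray.init(address):
        @ray.remote
        def add(x, y):
            return x + y

        return ray.get(add.remote(a, b))
```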

💫 Enhancements:

Better support for multi-threaded client-side applications (#16731, #16732)
Improved error messages and warnings when misusing Ray Client (#16454, #16508, #16588, #16163)
Made Client Object & Actor refs a subclass of their non-client counterparts (#16110)

🔨 Fixes:

dir() Works for client-side Actor Handles (#16157)
Avoid server-side time-outs (#16554)
Various fixes to the client-server proxy (#16040, #16038, #16057, #16180)

Ray Core

🎉 New Features:

Ray dataset alpha is available!

🔨 Fixes:

Fix various Ray IO layer issues that caused hangs and high memory usage (#16408, #16422, #16620, #16824, #16791, #16487, #16407, #16334, #16167, #16153, #16314, #15955, #15775)
Namespace now properly isolates placement groups (#16000)
More efficient object transfer for spilled objects (#16364, #16352)

🏗 Architecture refactoring:

Starting with Ray 1.5.0, liveness of Ray jobs is guaranteed as long as the machines have enough disk space. This is provided by the “fallback allocator” mechanism, which allocates plasma objects directly on disk when they cannot be created in memory or spilled to disk.

RLlib

🎉 New Features:

Support for adding/deleting Policies to a Trainer on-the-fly (#16359, #16569, #16927).
Added new “input API” for customizing offline datasets (shoutout to Julius F.). (#16957)
Allow for external env PolicyServer to listen on n different ports (given n rollout workers); No longer require creating an env on the server side to get env’s spaces. (#16583).

🔨 Fixes:

CQL: Bug fixes and clean-ups (fixed iteration count). (#16531, #16332)
D4RL: #16721
Ensure curiosity exploration actions are passed in as tf tensors (shoutout to Manny V.). (#15704)
Other bug fixes and cleanups: #16162 and #16309 (shoutout to Chris B.), #15634, #16133, #16860, #16813, #16428, #16867, #16354, #16218, #16118, #16429, #16427, #16774, #16734, #16019, #16171, #16830, #16722

📖 Documentation and testing:

#16311, #15908, #16271, #16080, #16740, #16843

🏗 Architecture refactoring:

All RLlib algos operating on Box action spaces now operate on normalized actions by default (ranging from -1.0 to 1.0). This enables PG-style algos to learn in skewed action spaces. (#16531)

Tune

🎉 New Features:

New integration with LightGBM via Tune callbacks (#16713).
New cost-efficient HPO searchers (BlendSearch and CFO) available from the FLAML library (https://github.com/microsoft/FLAML). (#16329)

💫 Enhancements:

Pass in configurations that have already been evaluated separately to Searchers. This is useful for warm-starting or for meta-searchers, for example (#16485)
Sort trials in reporter table by metric (#16576)
Add option to keep random values constant over grid search (#16501)
Read trial results from json file (#15915)

🔨 Fixes:

Fix infinite loop when using Searcher that limits concurrency internally in conjunction with a ConcurrencyLimiter (#16416)
Allow custom sync configuration with DurableTrainable (#16739)
Logger fixes. W&B: #16806, #16674, #16839. MLflow: #16840
Various bug fixes: #16844, #16017, #16575, #16675, #16504, #15811, #15899, #16128, #16396, #16695, #16611

📖 Documentation and testing:

Use BayesOpt for quick start example (#16997)
#16793, #16029, #15932, #16980, #16450, #16709, #15913, #16754, #16619

SGD

🎉 New Features:

Torch native mixed precision is now supported! (#16382)

🔨 Fixes:

Use target label count for training batch size (#16400)

📖 Documentation and testing:

#15999, #16111, #16301, #16046

Serve

💫 Enhancements: UX improvements (#16227, #15909), Improved logging (#16468)
🔨 Fixes: Fix shutdown logic (#16524), Assorted bug fixes (#16647, #16760, #16783)
📖 Documentation and testing: #16042, #16631, #16759, #16786

Thanks

Many thanks to all who contributed to this release:

@Tonyhao96, @simon-mo, @scv119, @Yard1, @llan-ml, @xcharleslin, @jovany-wang, @ijrsvt, @max0x7ba, @annaluo676, @rajagurunath, @zuston, @amogkam, @yorickvanzweeden, @mxz96102, @chenk008, @Bam4d, @mGalarnyk, @kfstorm, @crdnb, @suquark, @ericl, @marload, @jiaodong, @Thexiang, @ellimac54, @qicosmos, @mwtian, @jkterry1, @sven1977, @howardlau1999, @mvindiola1, @stefanbschneider, @juliusfrost, @krfricke, @matthewdeng, @zhuangzhuang131419, @brandonJY, @Eleven1Liu, @nikitavemuri, @richardliaw, @iycheng, @stephanie-wang, @HuangLED, @clarkzinzow, @fyrestone, @asm582, @qingyun-wu, @ckw017, @yncxcw, @DmitriGekhtman, @benjamindkilleen, @Chong-Li, @kathryn-zhou, @pcmoritz, @rodrigodelazcano, @edoakes, @dependabot[bot], @pdames, @frenkowski, @loicsacre, @gabrieleoliaro, @achals, @thomasjpfan, @rkooo567, @dibgerge, @clay4444, @architkulkarni, @lixin-wei, @ConeyLiu, @WangTaoTheTonic, @AnnaKosiorek, @wuisawesome, @gramhagen, @zhisbug, @franklsf95, @vakker, @jenhaoyang, @liuyang-my, @chaokunyang, @SongGuyang, @tgaddair

30 Jun 18:16

DmitriGekhtman

ray-1.4.1

aa56df4

Ray-1.4.1

Release 1.4.1 Notes

Ray Python Wheels

Python 3.9 wheels (Linux / MacOS / Windows) are available (#16347, #16586)

Ray Autoscaler

🔨 Fixes: On-prem bug resolved (#16281)

Ray Client

💫Enhancements:

Add warnings when many tasks scheduled (#16454)
Better error messages (#16163)

🔨 Fixes:

Fix gRPC Timeout Options (#16554)
Disconnect on dataclient error (#16588)

Ray Core

🔨 Fixes:

Runtime Environments
- Docs (#16290)
- Bug fixes (#16475, #16535, #16378)
- Logging improvement (#16516)
Fix race condition leading to failed imports (#16278)
Don't broadcast empty resources data (#16104)
Fix async actor lost object bug (#16414)
Always report job timestamps in milliseconds (#16455, #16545, #16548)
Multi-node placement group and job config bug fixes (#16345)
Fix bug in task dependency management for duplicate args (#16365)
Unify Python and core worker ids (#16712)

Dask

💫Enhancements: Dask 2021.06.1 support (#16547)

Tune

💫Enhancements: Support object refs in with_params (#16753)

Serve

🔨Fixes: Ray serve shutdown goes through Serve controller (#16524)

Java

🔨Fixes: Upgrade dependencies to fix CVEs (#16650, #16657)

Documentation

Runtime Environments (#16290)
Feature contribution [Tune] (#16477)
Ray design patterns and anti-patterns (#16478)
PyTorch Lightning (#16484)
Ray Client (#16497)
Ray Deployment (#16538)
Dask version compatibility (#16595)

CI

Move wheel and Docker image upload from Travis to Buildkite (#16138, #16241)

Thanks

Many thanks to all those who contributed to this release!

@rkooo567, @clarkzinzow, @WangTaoTheTonic, @ckw017, @stephanie-wang, @Yard1, @mwtian, @jovany-wang, @jiaodong, @wuisawesome, @krfricke, @architkulkarni, @ijrsvt, @simon-mo, @DmitriGekhtman, @amogkam, @richardliaw

07 Jun 19:10

mwtian

ray-1.4.0

3a09c82

Ray-1.4.0

Release 1.4.0 Notes

Ray Autoscaler

🎉 New Features:

Support Helm Chart for deploying Ray on Kubernetes
Key Autoscaler metrics are now exported via Prometheus!

💫Enhancements

Better error messages when a node fails to come online

🔨 Fixes:

Stability and interface fixes for Kubernetes deployments.
Fixes to Azure NodeProvider

Ray Client

🎉 New Features:

Complete API parity with non-client mode
Experimental ClientBuilder API (docs here)
Full Asyncio support

💫Enhancements

Keep Alive for Messages for long lived connections
Improved pickling error messages

🔨 Fixes:

Client Disconnect can be called multiple times
Client Reference Equality Check
Many bug fixes and tests for the complete ray API!

Ray Core

🎉 New Features:

Namespaces (check out the docs)! Note: this may be a breaking change if you’re using detached actors (set ray.init(namespace="") for backwards-compatible behavior).
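
To illustrate how namespaces interact with detached actors, the sketch below creates a named detached actor and looks it up again from within the same namespace. The namespace `demo`, the actor name `counter`, and the helper `increment_shared_counter` are hypothetical.

```python
# Per the note above, an empty namespace restores the pre-1.4 behavior.
BACKWARD_COMPATIBLE_NAMESPACE = ""


def increment_shared_counter(namespace: str = "demo") -> int:
    """Create a detached named actor in `namespace` and call it by name."""
    import ray  # imported lazily; requires Ray to be installed

    ray.init(namespace=namespace)

    @ray.remote
    class Counter:
        def __init__(self):
            self.value = 0

        def increment(self):
            self.value += 1
            return self.value

    # Detached actors outlive the job that created them; they are only
    # visible by name to jobs connected with the same namespace.
    Counter.options(name="counter", lifetime="detached").remote()
    handle = ray.get_actor("counter")
    return ray.get(handle.increment.remote())
```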

🔨 Fixes:

Support increment by arbitrary number with ray.util.metrics.Counter
Various bug fixes for the placement group APIs including the GPU assignment bug (#15049).

🏗 Architecture refactoring:

Increase the efficiency and robustness of resource reporting

Ray Data Processing

🔨 Fixes:

Various bug fixes for better stability (#16063, #14821, #15669, #15757, #15431, #15426, #15034, #15071, #15070, #15008, #15955)
Fixed a critical bug where the driver used excessive memory when there were many objects in the cluster (#14322).
Dask on Ray and Modin can now be run with Ray client

🏗 Architecture refactoring:

Ray 100TB shuffle results: #15770
More robust memory management subsystem is in progress (#15157, #15027)

RLlib

🎉 New Features:

PyTorch multi-GPU support (#14709, #15492, #15421).
CQL TensorFlow support (#15841).
Task-settable Env/Curriculum Learning API (#15740).
Support for native tf.keras Models (no ModelV2 required) (#14684, #15273).
Trainer.train() and Trainer.evaluate() can run in parallel (optional) (#15040, #15345).

💫Enhancements and documentation:

CQL: Bug fixes and confirmed MuJoCo benchmarks (#15814, #15603, #15761).
Example for differentiable neural computer (DNC) network (#14844, #15939).
Added support for int-Box action spaces. (#15012)
DDPG/TD3/A[23]C/MARWIL/BC: Code cleanup and type annotations. (#14707).
Example script for restoring 1 agent out of n
Examples for fractional GPU usage. (#15334)
Enhanced documentation page describing example scripts and blog posts (#15763).
Various enhancements/test coverage improvements: #15499, #15454, #15335, #14865, #15525, #15290, #15611, #14801, #14903, #15735, #15631.

🔨 Fixes:

Memory Leak in multi-agent environment (#15815). Shoutout to Bam4d!
DDPG PyTorch GPU bug. (#16133)
Simple optimizer should not be used by default for tf+MA (#15365)
Various bug fixes: #15762, #14843, #15042, #15427, #15871, #15132, #14840, #14386, #15014, #14737, #15015, #15733, #15737, #15736, #15898, #16118, #15020, #15218, #15451, #15538, #15610, #15326, #15295, #15436, #15558, #15937

🏗 Architecture refactoring:

Remove atari dependency (#15292).
Trainer._evaluate() renamed to Trainer.evaluate() (backward compatible); Trainer.evaluate() can be called even w/o evaluation worker set, if create_env_on_driver=True (#15591).

Tune

🎉 New Features:

ASHA scheduler now supports save/restore. (#15438)
Add HEBO to search algorithm shim function (#15468)
Add SkoptSearcher/Bayesopt Searcher restore functionality (#15075)

💫Enhancements:

We now document scalability best practices (k8s, scalability thresholds). You can find this here (#14566)
You can now set the result buffer_length via tune.run - this helps with trials that report too frequently. (#15810)
Support numpy types in TBXlogger (#15760)
Add max_concurrent option to BasicVariantGenerator (#15680)
Add seed parameter to OptunaSearch (#15248)
Improve BOHB/ConfigSpace dependency check (#15064)
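
The result buffering mentioned above can be pictured with a small sketch. This is not Tune's implementation: `ResultBuffer`, `report`, and `flush` are hypothetical names chosen for illustration, and the only idea taken from the note is that results are held back until `buffer_length` of them have accumulated.

```python
from typing import Callable, Dict, List


class ResultBuffer:
    """Hypothetical helper: hold results until buffer_length is reached."""

    def __init__(self, buffer_length: int, send: Callable[[List[Dict]], None]):
        if buffer_length < 1:
            raise ValueError("buffer_length must be at least 1")
        self.buffer_length = buffer_length
        self._send = send
        self._pending: List[Dict] = []

    def report(self, result: Dict) -> None:
        # Queue the result and forward a batch once the buffer is full.
        self._pending.append(result)
        if len(self._pending) >= self.buffer_length:
            self.flush()

    def flush(self) -> None:
        # Forward whatever is queued, then start a fresh batch.
        if self._pending:
            self._send(self._pending)
            self._pending = []


batches: List[List[Dict]] = []
buffer = ResultBuffer(buffer_length=3, send=batches.append)
for step in range(7):
    buffer.report({"step": step})
buffer.flush()  # hand over the final partial batch
```

A trial that reports on every step then causes one hand-off per `buffer_length` results instead of one per result, which is the effect the note describes for trials that report too frequently.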

🔨Fixes:

Reduce default number of maximum pending trials to max(16, cluster_cpus) (#15628)
Return normalized checkpoint path (#15296)
Escape paths before globbing in TrainableUtil.get_checkpoints_paths (#15368)
Optuna Searcher: Set correct Optuna TrialState on trial complete (#15283)
Fix type annotation in tune.choice (#15038)
Avoid system exit error by using del when cleaning up actors (#15687)
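
To see why escaping paths before globbing matters, here is a minimal sketch using only the standard library. `find_checkpoints` and the `checkpoint_*` naming are assumptions made for illustration and are not taken from TrainableUtil; the point is that a directory name containing `[` and `]` is otherwise read as a character class.

```python
import glob
import os
import tempfile


def find_checkpoints(trial_dir: str) -> list:
    """Return checkpoint paths under trial_dir, treating trial_dir literally."""
    pattern = os.path.join(glob.escape(trial_dir), "checkpoint_*")
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as root:
    # A trial directory whose name contains glob metacharacters.
    trial_dir = os.path.join(root, "trial_[lr=0.1]")
    os.makedirs(os.path.join(trial_dir, "checkpoint_1"))

    # Without escaping, "[lr=0.1]" matches a single character, so nothing is found.
    unescaped = glob.glob(os.path.join(trial_dir, "checkpoint_*"))
    # With escaping, the directory name is matched literally.
    escaped = find_checkpoints(trial_dir)
    found_names = [os.path.basename(path) for path in escaped]
```

Only the directory part is escaped; the trailing `checkpoint_*` stays a live pattern so the wildcard still expands.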

Serve

🎉 New Features:

As of Ray 1.4, Serve has a new API centered around the concept of “Deployments.” Deployments offer a more streamlined API and can be declaratively updated, which should improve both development and production workflows. The existing APIs are unchanged in Ray 1.4 and will continue to work until Ray 1.5, at which point they will be removed (see the package reference if you’re not sure about a specific API). Please see the migration guide for details on how to update your existing Serve application to use this new API.
New serve.deployment API: @serve.deployment, serve.get_deployments, serve.list_deployments (#14935, #15172, #15124, #15121, #14953, #15152, #15821)
New serve.ingress(fastapi_app) API (#15445, #15441, #14858)
New @serve.batch decorator in favor of legacy max_batch_size in backend config (#15065)
serve.start() is now idempotent (#15148)
Added support for handle.method_name.remote() (#14831)

🔨Fixes:

Rolling updates for redeployments (#14803)
Latency improvement by using pickle (#15945)
Controller and HTTP proxy use num_cpus=0 by default (#15000)
Health checking in the controller instead of using max_restarts (#15047)
Use longest prefix matching for path routing (#15041)
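
Longest prefix matching for path routing can be sketched in a few lines. This is an illustration of the general technique, not Serve's router: `match_route` and the route table are hypothetical, and the sketch assumes prefixes only match on `/` boundaries.

```python
from typing import Dict, Optional


def match_route(path: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the target whose route prefix is the longest match for path."""
    best_target = None
    best_length = -1
    for prefix, target in routes.items():
        base = prefix.rstrip("/")  # "/" becomes "", which matches every path
        # Match only on a path-segment boundary, so "/api" does not match "/apiary".
        on_boundary = path == base or path.startswith(base + "/")
        if on_boundary and len(base) > best_length:
            best_target, best_length = target, len(base)
    return best_target


routes = {"/": "root", "/api": "api_v1", "/api/v2": "api_v2"}
```

With this table, `/api/v2/items` resolves to `api_v2` rather than `api_v1` or `root`, because `/api/v2` is the longest registered prefix that matches.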

Dashboard

🎉New Features:

Experimental OpenTelemetry support. (#16028, #14872, #15742)

🔨Fixes:

Add object store memory column (#15697)
Add object store stats to dashboard API. (#15677)
Remove disk data from the dashboard when running on K8s. (#14676)
Fix reported dashboard ip when using 0.0.0.0 (#15506)

Thanks

Many thanks to all those who contributed to this release!

@clay4444, @Fabien-Couthouis, @mGalarnyk, @smorad, @ckw017, @ericl, @antoine-galataud, @pleiadesian, @DmitriGekhtman, @robertnishihara, @Bam4d, @fyrestone, @stephanie-wang, @kfstorm, @wuisawesome, @rkooo567, @franklsf95, @micahtyong, @WangTaoTheTonic, @krfricke, @hegdeashwin, @devin-petersohn, @qicosmos, @edoakes, @llan-ml, @ijrsvt, @richardliaw, @Sertingolix, @ffbin, @simjay, @AmeerHajAli, @simon-mo, @tom-doerr, @sven1977, @clarkzinzow, @mxz96102, @SebastianBo1995, @amogkam, @iycheng, @sumanthratna, @Catch-Bull, @pcmoritz, @architkulkarni, @stefanbschneider, @tgaddair, @xcharleslin, @cthoyt, @fcardoso75, @Jeffwan, @mvindiola1, @michaelzhiluo, @rlan, @mwtian, @SongGuyang, @YeahNew, @kathryn-zhou, @rfali, @jennakwon06, @Yeachan-Heo

22 Apr 22:28

amogkam

ray-1.3.0

2a02b97

Ray-1.3.0

Release v1.3.0 Notes

Highlights

We are now testing and publishing Ray's scalability limits with each release, see: https://github.com/ray-project/ray/tree/releases/1.3.0/benchmarks
Ray Client is now usable by default with any Ray cluster started by the Ray Cluster Launcher.

Ray Cluster Launcher

💫Enhancements:

Observability improvements (#14816, #14608)
Worker nodes no longer killed on autoscaler failure (#14424)
Better validation for min_workers and max_workers (#13779)
Auto detect memory resource for AWS and K8s (#14567)
On autoscaler failure, propagate error message to drivers (#14219)
Avoid launching GPU nodes when the workload only has CPU tasks (#13776)
Autoscaler/GCS compatibility (#13970, #14046, #14050)
Testing (#14488, #14713)
Migration of configs to multi-node-type format (#13814, #14239)
Better config validation (#14244, #13779)
Node-type max workers defaults infinity (#14201)

🔨 Fixes:

AWS configuration (#14868, #13558, #14083, #13808)
GCP configuration (#14364, #14417)
Azure configuration (#14787, #14750, #14721)
Kubernetes (#14712, #13920, #13720, #14773, #13756, #14567, #13705, #14024, #14499, #14593, #14655)
Other (#14112, #14579, #14002, #13836, #14261, #14286, #14424, #13727, #13966, #14293, #14718, #14380, #14234, #14484)

Ray Client

💫Enhancements:

Version checks for Python and client protocol (#13722, #13846, #13886, #13926, #14295)
Validate server port number (#14815)
Enable Ray client server by default (#13350, #13429, #13442)
Disconnect ray upon client deactivation (#13919)
Convert Ray objects to Ray client objects (#13639)
Testing (#14617, #14813, #13016, #13961, #14163, #14248, #14630, #14756, #14786)
Documentation (#14422, #14265)
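
As an illustration of the server port validation listed above, the following is a minimal sketch of a generic check. `validate_port` is a hypothetical helper, and the accepted range (1 to 65535, the valid TCP port range) is an assumption; the note does not say which range Ray itself enforces.

```python
def validate_port(port) -> int:
    """Return port as an int, or raise ValueError if it is not a usable TCP port."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, (int, str)):
        raise ValueError(f"port must be an integer, got {port!r}")
    try:
        number = int(port)
    except ValueError:
        raise ValueError(f"port must be an integer, got {port!r}") from None
    if not 1 <= number <= 65535:
        raise ValueError(f"port must be between 1 and 65535, got {number}")
    return number
```

Failing early with a clear message is the design point: a bad port is reported when the value is parsed rather than later, when the server tries to bind.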

🔨 Fixes:

Hook runtime context (#13750)
Fix mutual recursion (#14122)
Set gRPC max message size (#14063)
Monitor stream errors (#13386)
Fix dependencies (#14654)
Fix ray.get ctrl-c (#14425)
Report error deserialization errors (#13749)
Named actor refcounting fix (#14753)
RayTaskError serialization (#14698)
Multithreading fixes (#14701)

Ray Core

🎉 New Features:

We are now testing and publishing Ray's scalability limits with each release. Check out https://github.com/ray-project/ray/tree/releases/1.3.0/benchmarks.
[alpha] Ray-native Python-based collective communication primitives for Ray clusters with distributed CPUs or GPUs.

🔨 Fixes:

Ray is now using C++14.
Fixed high CPU breaking raylets with heartbeat missing errors (#13963, #14301)
Fixed high CPU issues from raylet during object transfer (#13724)
Improvements to placement group APIs, including better Java support (#13821, #13858, #13582, #15049)

Ray Data Processing

🎉 New Features:

Object spilling is turned on by default. Check out the documentation.
Dask-on-Ray and Spark-on-Ray are fully ready to use. Please try them out and give us feedback!
Dask-on-Ray is now compatible with Dask 2021.4.0.
Dask-on-Ray now works natively with dask.persist().

🔨 Fixes:

Various improvements in object spilling and memory management layer to support large scale data processing (#13649, #14149, #13853, #13729, #14222, #13781, #13737, #14288, #14578, #15027)
The lru_evict flag is now deprecated. The recommended alternative is object spilling.

🏗 Architecture refactoring:

Various architectural improvements in object spilling and memory management. For more details, check out the whitepaper.
Locality-aware scheduling is turned on by default.
Moved from centralized GCS-based object directory protocol to decentralized owner-to-owner protocol, yielding better cluster scalability.

RLlib

🎉 New Features:

R2D2 implementation for torch and tf. (#13933)
PlacementGroup support (all RLlib algos now return PlacementGroupFactory from Trainer.default_resource_request). (#14289)
Multi-GPU support for tf-DQN/PG/A2C. (#13393)

💫Enhancements:

Documentation: Update documentation for Curiosity's support of continuous actions (#13784); CQL documentation (#14531)
Attention-wrapper works with images and supports prev-n-actions/rewards options. (#14569)
rllib rollout runs in parallel by default via Trainer’s evaluation worker set. (#14208)
Add env rendering (customizable) and video recording options (for non-local mode; >0 workers; +evaluation-workers) and episode media logging. (#14767, #14796)
Allow SAC to use custom models as Q- or policy nets and deprecate "state-preprocessor" for image spaces. (#13522)
Example Scripts: Add coin game env + matrix social dilemma env + tests and examples (shoutout to Maxime Riché!). (#14208); Attention net (#14864); Serve + RLlib. (#14416); Env seed (#14471); Trajectory view API (enhancements and tf2 support). (#13786); Tune trial + checkpoint selection. (#14209)
DDPG: Add support for simplex action space. (#14011)
Others: on_learn_on_batch callback allows custom metrics. (#13584); Add TorchPolicy.export_model(). (#13989)

🔨 Fixes:

Trajectory View API bugs (#13646, #14765, #14037, #14036, #14031, #13555)
Test cases (#14620, #14450, #14384, #13835, #14357, #14243)
Others (#13013, #14569, #13733, #13556, #13988, #14737, #14838, #15272, #13681, #13764, #13519, #14038, #14033, #14034, #14308, #14243)

🏗 Architecture refactoring:

Remove all non-trajectory view API code. (#14860)
Obsolete UsageTrackingDict in favor of SampleBatch. (#13065)

Tune

🎉 New Features:

We added a new searcher HEBOSearcher (#14504, #14246, #13863, #14427)
Tune is now natively compatible with the Ray Client (#13778, #14115, #14280)
Tune now uses Ray’s Placement Groups under the hood. This will enable much faster autoscaling and training (for distributed trials) (#13906, #15011, #14313)

💫Enhancements:

Checkpointing improvements (#13376, #13767)
Optuna Search Algorithm improvements (#14731, #14387)
tune.with_parameters now works with Class API (#14532)

🔨Fixes:

BOHB & Hyperband fixes (#14487, #14171)
Nested metrics improvements (#14189, #14375, #14379)
Fix non-deterministic category sampling (#13710)
Type hints (#13684)
Documentation (#14468, #13880, #13740)
Various issues and bug fixes (#14176, #13939, #14392, #13812, #14781, #14150, #14850, #14118, #14388, #14152, #13825, #13936)

SGD

Add fault tolerance during worker startup (#14724)

Serve

🎉 New Features:

Added metadata to default logger in backend replicas (#14251)
Added more metrics for ServeHandle stats (#13640)
Deprecated system-level batching in favor of @serve.batch (#14610, #14648)
Beta support for Serve with Ray client (#14163)
Use placement groups to bypass autoscaler throttling (#13844)
Deprecate client-based API in favor of process-wide singleton (#14696)
Add initial support for FastAPI ingress (#14754)

🔨 Fixes:

Fix ServeHandle serialization (#13695)

🏗 Architecture refactoring:

Refactor BackendState to support backend versioning and add more unit testing (#13870, #14658, #14740, #14748)
Optimize long polling to be per-key (#14335)

Dashboard

🎉 New Features:

Dashboard now supports being served behind a reverse proxy. (#14012)
Disk and network metrics are added to Prometheus. (#14144)

💫Enhancements:

Better CPU & memory information on K8s. (#14593, #14499)
Progress towards a new scalable dashboard. (#13790, #11667, #13763, #14333)

Thanks

Many thanks to all those who contributed to this release:
@geraint0923, @iycheng, @yurirocha15, @brian-yu, @harryge00, @ijrsvt, @wumuzi520, @suquark, @simon-mo, @clarkzinzow, @RaphaelCS, @FarzanT, @ob, @ashione, @ffbin, @robertnishihara, @SongGuyang, @zhe-thoughts, @rkooo567, @Ezra-H, @acxz, @clay4444, @QuantumMecha, @jirkafajfr, @wuisawesome, @Qstar, @guykhazma, @devin-petersohn, @jeroenboeye, @ConeyLiu, @dependabot[bot], @fyrestone, @micahtyong, @javi-redondo, @Manuscrit, @mxz96102, @EscapeReality846089495, @WangTaoTheTonic, @stanislav-chekmenev, @architkulkarni, @Yard1, @tchordia, @zhisbug, @Bam4d, @niole, @yiranwang52, @thomasjpfan, @DmitriGekhtman, @gabrieleoliaro, @jparkerholder, @kfstorm, @andrew-rosenfeld-ts, @erikerlandson, @Crissman, @raulchen, @sumanthratna, @Catch-Bull, @chaokunyang, @krfricke, @raoul-khour-ts, @sven1977, @kathryn-zhou, @AmeerHajAli, @jovany-wang, @amogkam, @antoine-galataud, @tgaddair, @randxie, @ChaceAshcraft, @ericl, @cassidylaidlaw, @TanjaBayer, @lixin-wei, @lena-kashtelyan, @cathrinS, @qicosmos, @richardliaw, @rmsander, @jCrompton, @mjschock, @pdames, @barakmich, @michaelzhiluo, @stephanie-wang, @edoakes

13 Feb 01:42

wuisawesome

ray-1.2.0

b87fc1b

Release ray-1.2.0

Release v1.2.0 Notes

Highlights

Ray client is now in beta! Check out more details here: https://docs.ray.io/en/master/ray-client.html
XGBoost-Ray is now in beta! Check out more details about this project at https://github.com/ray-project/xgboost_ray.
Check out the Serve migration guide: https://docs.google.com/document/d/1CG4y5WTTc4G_MRQGyjnb_eZ7GK3G9dUX6TNLKLnKRAc/edit
Ray’s C++ support is now in beta: https://docs.ray.io/en/master/#getting-started-with-ray
An alpha version of object spilling is now available: https://docs.ray.io/en/master/memory-management.html#object-spilling

Ray Autoscaler

🎉 New Features:

A new autoscaler output format in monitor.log (#12772, #13561)
Piping autoscaler events to driver logs (#13434)

💫Enhancements

Full support of ray.autoscaler.sdk.request_resources() API (https://docs.ray.io/en/master/cluster/autoscaling.html?highlight=request_resources#ray.autoscaler.sdk.request_resources).
Make placement groups bypass max launch limit (#13089)
[K8s] Retry getting home directory in command runner. (#12925)
[docker] Pull if image is not present (#13136)
[Autoscaler] Ensure ubuntu is owner of docker host mount folder (#13579)

🔨 Fixes:

Many autoscaler bug fixes (#12952, #12689, #13058, #13671, #13637, #13588, #13505, #13154, #13151, #13138, #13008, #12980, #12918, #12829, #12714, #12661, #13567, #13663, #13623, #13437, #13498, #13472, #13392, #12514, #13325, #13161, #13129, #12987, #13410, #12942, #12868, #12866, #12865, #12098, #12609)

RLlib

🎉 New Features:

Fast Attention Nets (using the trajectory view API) (#12753).
Attention Nets: Full PyTorch support (#12029).
Attention Nets: Support auto-wrapping around default or custom models by specifying “use_attention=True” in the model’s config. This now works completely analogously to “use_lstm=True”. (#11698)
New Offline RL Algorithm: CQL (based on SAC) (#13118).
MAML: Discrete actions support (added CartPole mass test case).
Support Atari framestacking via the trajectory view API (#13315).
Support for D4RL environments/benchmarks (#13550).
Preliminary work on JAX support (#13077, #13091).

💫 Enhancements:

Rollout lengths: Allow unit to be configured as “agent_steps” in multi-agent settings (default: “env_steps”) (#12420).
TFModelV2: Soft-deprecate register_variables and unify var names wrt TorchModelV2 (#13339, #13363).

📖 Documentation:

Added documentation on Model building API (#13260, #13261).
Added documentation for the trajectory view API. (#12718)
Added documentation for SlateQ (#13266).
Readme.md documentation for almost all algorithms in rllib/agents (#12943, #13035).
Type annotations for the “rllib/execution” folder (#12760, #13036).

🔨 Fixes:

MARWIL and BC: Add grad-clipping config option to stabilize learning (#13455).
A3C: Solve PyTorch- and TF-eager async race condition between calling model and its value function (#13467).
Various issue and bug fixes (#12619, #12682, #12704, #12706, #12708, #12765, #12786, #12787, #12793, #12832, #12844, #12846, #12915, #12941, #13039, #13040, #13064, #13083, #13121, #13126, #13237, #13238, #13308, #13332, #13397, #13459, #13553).

🏗 Architecture refactoring:

Env directory has been cleaned up and is now divided into a core part (rllib/env) with all basic env classes, and rllib/env/wrappers containing third-party wrapper classes (Atari, Unity3D, etc.) (#13082).

Tune

🎉 New Features:

Ray Tune has updated and improved its integration with MLflow. See this blog post for details (#12840, #13301, #13533)

💫 Enhancements

Ray Tune now uses ray.cloudpickle under the hood, allowing you to checkpoint large models (>4GB) (#12958).
Using the 'reuse_actors' flag can now speed up training for general Trainable API usage. (#13549)
Ray Tune will now automatically buffer results from trainables, allowing you to use an arbitrary reporting frequency on your training functions. (#13236)
Ray Tune now has a variety of experiment stoppers (#12750)
Ray Tune now supports an integer loguniform search space distribution (#12994)
Ray Tune now has initial support for the Ray placement group API. (#13370)
The Weights & Biases integration (WandbLogger) now also accepts wandb.data_types.Video (#13169)
The Hyperopt integration (HyperoptSearch) can now directly accept category variables instead of indices (#12715)
Ray Tune now supports experiment checkpointing when using grid search (#13357)
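
To make the integer loguniform search space concrete, here is a minimal sketch of how such a distribution can be sampled. It does not use Tune's API; `sample_log_uniform_int` is a hypothetical helper, and flooring a continuous log-uniform draw is one possible construction, assumed here for illustration.

```python
import math
import random


def sample_log_uniform_int(lower: int, upper: int, rng: random.Random) -> int:
    """Draw an integer from [lower, upper) with log-uniform density."""
    if not 0 < lower < upper:
        raise ValueError("require 0 < lower < upper")
    # Sample uniformly in log space, then map back to the original scale.
    value = math.exp(rng.uniform(math.log(lower), math.log(upper)))
    # Floor to an integer and clamp to guard against floating-point edge cases.
    return min(max(int(value), lower), upper - 1)


rng = random.Random(0)
samples = [sample_log_uniform_int(1, 1000, rng) for _ in range(10_000)]
share_below_10 = sum(s < 10 for s in samples) / len(samples)
```

Because the draw is uniform in log space, each decade (1 to 10, 10 to 100, 100 to 1000) receives roughly a third of the samples, which is why this kind of distribution suits parameters such as batch sizes or layer widths that span orders of magnitude.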

🔨Fixes and Updates

The Optuna integration was updated to support the 2.4.0 API while maintaining backwards compatibility (#13631)
All search algorithms now support points_to_evaluate (#12790, #12916)
PBT Transformers example was updated and improved (#13174, #13131)
The scikit-optimize integration was improved (#12970)
Various bug fixes (#13423, #12785, #13171, #12877, #13255, #13355)

SGD

🔨Fixes and Updates

Fix Docstring for as_trainable (#13173)
Fix process group timeout units (#12477)
Disable Elastic Training by default when using with Tune (#12927)

Serve

🎉 New Features:

Ray Serve backends now accept a Starlette request object instead of a Flask request object (#12852). This is a breaking change, so please read the migration guide.
Ray Serve backends now have the option of returning a Starlette Response object (#12811, #13328). This allows for more customizable responses, including responses with custom status codes.
[Experimental] The new Ray Serve MLflow plugin makes it easy to deploy your MLflow models on Ray Serve. It comes with a Python API and a command-line interface.
Using “ImportedBackend” you can now specify a backend based on a class that is installed in the Python environment that the workers will run in, even if the Python environment of the driver script (the one making the Serve API calls) doesn’t have it installed (#12923).

💫 Enhancements:

Dependency management using conda no longer requires the driver script to be running in an activated conda environment (#13269).
Ray ObjectRef can now be used as argument to serve_handle.remote(...). (#12592)
Backends are now shut down gracefully. You can set the graceful timeout in BackendConfig. (#13028)

📖 Documentation:

A tutorial page has been added for integrating Ray Serve with your existing FastAPI web server or with your existing AIOHTTP web server (#13127).
Documentation has been added for Ray Serve metrics (#13096).
