Releases: typelevel/cats-effect

v3.4.5

16 Jan 22:48
v3.4.5
feff5f6

This is the thirty-fifth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

This release rolls back the Deferred[IO, A] optimizations for the time being due to a memory leak in certain common scenarios. In particular, any use of Fs2's interruptWhen where the stream in question naturally completes quickly would hit this case relatively hard; Http4s Ember is one example. We have a fix for the memory leak which needs a bit more testing before release, and we felt that, out of an abundance of caution, it is better to revert the changes immediately rather than wait for the hardening.

User-Facing Pull Requests

Thank you so very much!

v3.4.4

30 Dec 16:44
v3.4.4
2c6cc39

This is the thirty-fourth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

This release fixes a memory leak in Deferred. The memory leak in question is relatively small, but can accumulate over a long period of time in certain common applications. Additionally, this leak regresses GC performance slightly for almost all Cats Effect applications. For this reason, it is highly recommended that users upgrade to this release as soon as possible if currently using version 3.4.3.

User-Facing Pull Requests

Thank you so very much!

v3.4.3

24 Dec 23:01
v3.4.3
3ab83ce

This is the thirty-third release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

Despite being a patch release, this update contains two notable feature additions: full tracing support for Scala Native applications (including enhanced exceptions!), and significantly improved performance for Deferred when IO is the base monad. Regarding the latter, since Deferred is at the core of most concurrent logic written against Cats Effect, it is expected that this change will result in some noticeable performance improvements in most applications, though it is hard to predict exactly how pronounced this effect will be.

User-Facing Pull Requests

Very special thanks to all of you!

v3.4.2

29 Nov 15:39
v3.4.2
88170b9

This is the thirty-second release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

User-Facing Pull Requests

Thank you so much!

v3.4.1

17 Nov 06:17
v3.4.1
79eaa2d

This is the thirty-first release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. The primary purpose of this release is to address a minor link-time regression which manifested when extending IOApp with a class (not a trait) which was in turn extended by another class. In this scenario, the resulting main class would hang on exit if the intervening extension class had not been recompiled against Cats Effect 3.4.0. Note that this issue with separate compilation and IOApp does remain in a limited form: the MainThread executor is inaccessible when linked in this fashion. The solution is to ensure that all compilation units which extend IOApp (directly or indirectly) are compiled against Cats Effect 3.4.0 or later.
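As a rough illustration of the shape described above, here is a minimal sketch. The names CompanyApp and ExampleMain are hypothetical, and the idea that the intermediate class lives in a separately compiled library is an assumption made for the example; cats-effect 3.4.x is assumed to be on the classpath.

```scala
// Assumes cats-effect 3.4.x is on the classpath.
import cats.effect.{ExitCode, IO, IOApp}

// Hypothetical intermediate class (a class, not a trait) extending IOApp.
// Picture it living in a library jar that was compiled separately.
abstract class CompanyApp extends IOApp {
  def banner: String = "starting"
}

// Hypothetical entry point extending the intermediate class.
// Per the note above, both compilation units should be built
// against Cats Effect 3.4.0 or later.
object ExampleMain extends CompanyApp {
  def run(args: List[String]): IO[ExitCode] =
    IO.println(banner).as(ExitCode.Success)
}
```

If CompanyApp came from a jar built against an older 3.x release, recompiling that jar is the remedy the note describes.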

User-Facing Pull Requests

Thank you, everyone!

v3.4.0

13 Nov 16:20
v3.4.0
7710942

This is the thirtieth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

A Note on Release Cadence

While Cats Effect minor releases are always guaranteed to be fully backwards compatible with prior releases, they are not forwards compatible with prior releases, and partially as a consequence of this, can (and often do) break source compatibility. In other words, sources which compiled and linked successfully against prior Cats Effect releases will continue to do so, but recompiling those same sources may fail against a subsequent minor release.

For this reason, we seek to balance the inconvenience this imposes on downstream users against the need to continually improve and advance the ecosystem. Our target cadence for minor releases is somewhere between once every three months and once every six months, with frequent patch releases shipping forwards compatible improvements and fixes in the interim.

Unfortunately, Cats Effect 3.3.0 was released over ten months ago, meaning that the 3.4.0 cycle has required considerably more time than usual to come to fruition. There are several reasons for this, but the long and short of it is that this is expected to be an unusual occurrence. We currently expect to release Cats Effect 3.5.0 sometime in Spring 2023, in line with our target cadence.

Major Changes

As this has been a longer than usual development stretch (between 3.3.0 and 3.4.0), this release contains a large number of significant changes and improvements. Additionally, several improvements that we're very excited about didn't quite make the cutoff and have been pushed to 3.5.0. This section details some of the more impactful changes in this release.

High Performance Queue

One of the core concurrency utilities in Cats Effect is Queue. Despite its ubiquity in modern applications, the implementation of Queue has always been relatively naive, based entirely on immutable data structures, Ref, and Deferred. In particular, the core of the bounded Queue implementation since 3.0 looks like the following:

final class BoundedQueue[F[_]: Concurrent, A](capacity: Int, state: Ref[F, State[F, A]])

final case class State[F[_], A](
    queue: ScalaQueue[A],
    size: Int,
    takers: ScalaQueue[Deferred[F, Unit]],
    offerers: ScalaQueue[Deferred[F, Unit]])

The ScalaQueue type refers to scala.collection.immutable.Queue, which is a relatively simple Banker's Queue implementation within the Scala standard library. All end-user operations (e.g. take) within this implementation rely on Ref#modify to update internal state, with Deferred functioning as a signalling mechanism when take or offer needs to semantically block (because the queue is empty or full, respectively).
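To make the Ref#modify plus Deferred pattern concrete, here is a minimal sketch in the same spirit. It is not the Cats Effect source: NaiveState and NaiveQueue are hypothetical names, the queue is unbounded (so only take ever waits), each waiting taker receives its value directly through a Deferred[F, A], and cancelation is ignored entirely. It assumes cats-effect 3.4.x on the classpath.

```scala
// Assumes cats-effect 3.4.x is on the classpath.
import cats.effect.{Concurrent, Deferred, Ref}
import cats.syntax.all._
import scala.collection.immutable.{Queue => ScalaQueue}

// Hypothetical, simplified state: buffered values plus waiting takers.
final case class NaiveState[F[_], A](
    values: ScalaQueue[A],
    takers: ScalaQueue[Deferred[F, A]])

final class NaiveQueue[F[_], A](state: Ref[F, NaiveState[F, A]])(
    implicit F: Concurrent[F]) {

  // Hand the value to the oldest waiting taker, or buffer it if nobody waits.
  // modify returns the follow-up effect, which flatten then runs.
  def offer(a: A): F[Unit] =
    state.modify { s =>
      s.takers.dequeueOption match {
        case Some((taker, rest)) => (s.copy(takers = rest), taker.complete(a).void)
        case None => (s.copy(values = s.values.enqueue(a)), F.unit)
      }
    }.flatten

  // Take a buffered value, or register a Deferred and wait on it.
  // Not cancelation-safe: a canceled taker stays registered in this sketch.
  def take: F[A] =
    F.deferred[A].flatMap { d =>
      state.modify { s =>
        s.values.dequeueOption match {
          case Some((a, rest)) => (s.copy(values = rest), F.pure(a))
          case None => (s.copy(takers = s.takers.enqueue(d)), d.get)
        }
      }.flatten
    }
}

object NaiveQueue {
  def create[F[_], A](implicit F: Concurrent[F]): F[NaiveQueue[F, A]] =
    F.ref(NaiveState[F, A](ScalaQueue.empty, ScalaQueue.empty))
      .map(new NaiveQueue[F, A](_))
}
```

Only a Concurrent constraint is needed here, which is the generality the release notes describe for the original implementation.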

This implementation has several advantages. Notably, it is quite simple and easy to reason about. This is actually an important property since lock-free queues, particularly multi-producer multi-consumer queues, are extremely complex to implement correctly. Additionally, as it is built entirely in terms of Ref and Deferred, it is usable in any context which has a Concurrent constraint on F[_], allowing for a significant amount of generality and abstraction within downstream frameworks.

Despite its simplicity, this implementation also does surprisingly well on performance metrics. Anecdotal use of Queue within extremely hot I/O processing loops shows that it is rarely, if ever, the bottleneck on performance. This is somewhat surprising precisely because it's implemented in terms of these purely functional abstractions, meaning that it is relatively representative of the kind of performance you can expect out of Cats Effect as an end user when writing complex concurrent logic in terms of the Concurrent abstraction.

Despite all this though, we always knew we could do better. Persistent, immutable data structures are not known for getting the absolute top end of performance out of the underlying hardware. Lock-free queues in particular have a very rich legacy of study and optimization, due to their central position in most practical applications, and it would be unquestionably beneficial to take advantage of this mountain of knowledge within Cats Effect. The problem has always been twofold: first, the monumental effort of implementing an optimized lock-free async queue essentially from scratch, and second, how to achieve this kind of implementation without leaking into the abstraction and forcing an Async constraint in place of the Concurrent one.

The constraint problem is particularly thorny, since numerous downstream frameworks have built around the fact that the naive Queue implementation only requires Concurrent, and it would not make much sense to force an Async constraint when no surface functionality is being changed or added (only performance improvements). However, any high-performance implementation would require access to Async, both to directly implement asynchronous suspension (rather than redirecting through Deferred) and to safely suspend the side-effects required to manipulate mutable data structures.

This problem has been solved by using runtime casing on the Concurrent instance behind the scenes. In particular, whenever you construct a Queue.bounded, the runtime type of that instance is checked to see if it is secretly an Async. If it is, the higher performance implementation is transparently used instead of the naive one. In practice, this should apply at almost all possible call sites, meaning that the new implementation represents an entirely automatic and behind the scenes performance improvement.
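The runtime casing idea can be sketched with stand-in types. ConcurrentLike, AsyncLike, and QueueSelection below are hypothetical names rather than Cats Effect APIs, and the returned strings merely label which branch would be taken.

```scala
// Hypothetical stand-ins for the Concurrent and Async typeclasses.
trait ConcurrentLike[F[_]]
trait AsyncLike[F[_]] extends ConcurrentLike[F]

object QueueSelection {
  // The signature only asks for the weaker capability...
  def choose[F[_]](F: ConcurrentLike[F]): String =
    F match {
      // ...but the runtime type reveals whether the stronger one is present.
      case _: AsyncLike[F] => "array-backed implementation"
      case _               => "Ref/Deferred implementation"
    }
}
```

Because the check happens on the instance rather than in the signature, callers that only demand the weaker constraint still get the faster path whenever the instance they pass happens to be the stronger one.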

As for the implementation, we chose to start from the foundation of the industry-standard JCTools Project. Specifically, we ported the MpmcArrayQueue implementation from Java to Scala, making slight adjustments along the way:

  • The pure Scala implementation can be cross-compiled to Scala.js (and Scala Native), avoiding the need for extra special casing
  • Several minor optimizations have been elided, most notably those which rely on sun.misc.Unsafe for manipulation of directional memory fences
  • Through the use of a statically allocated exception as a signalling mechanism, we were able to add support for null values without introducing extra boxing
  • Sizes are not quantized to powers of 2. This imposes a small but measurable cost on all operations, which must use modular arithmetic rather than bit masking to map around the ring buffer

All credit goes to Nitsan Wakart (and other JCTools contributors) for this data structure.
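Two of the adjustments listed above, remainder-based indexing for arbitrary capacities and a preallocated exception that keeps null usable as an element, can be illustrated with a toy buffer. This is a deliberately single-threaded sketch with hypothetical names (EmptySignal, TinyRing); it shows only the indexing and signalling ideas, not the lock-free algorithm from JCTools.

```scala
import scala.util.control.NoStackTrace

// Hypothetical preallocated signal: a single instance, no stack trace captured.
object EmptySignal extends RuntimeException with NoStackTrace

// Single-threaded toy ring buffer; NOT safe for concurrent use.
final class TinyRing(capacity: Int) {
  require(capacity > 0, "capacity must be positive")

  private val buffer = new Array[AnyRef](capacity)
  private var head = 0L // sequence number of the next element to read
  private var tail = 0L // sequence number of the next slot to write

  // Any positive capacity works because the slot is found with a remainder.
  // A power-of-two capacity would allow `seq & (capacity - 1)` instead.
  private def slot(seq: Long): Int = (seq % capacity).toInt

  // Returns false, rejecting the value, when the buffer is full.
  def offer(a: AnyRef): Boolean =
    if (tail - head == capacity) false
    else {
      buffer(slot(tail)) = a
      tail += 1
      true
    }

  // Throws the preallocated EmptySignal when empty, so a returned null
  // always means "the element was null" and never "nothing was there".
  def poll(): AnyRef =
    if (head == tail) throw EmptySignal
    else {
      val i = slot(head)
      val a = buffer(i)
      buffer(i) = null // drop the reference so it can be collected
      head += 1
      a
    }
}
```

Using an exception as the empty signal avoids wrapping every element in an Option-like box just to distinguish a null element from an empty buffer.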

The ported MpmcArrayQueue holds the fundamental data within the queue, and it handles an enormous number of very subtle corner cases involving numerous producers and consumers all racing against each other to read from and write to the same underlying data. On its own, however, it is insufficient to implement the Cats Effect Queue. In particular, when offer fails on MpmcArrayQueue (because the queue is full), it simply rejects the value. When offer cannot proceed on Cats Effect's Queue, the calling fiber is semantically blocked until space is available, encoding a form of backpressure that sits at the heart of many systems.
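The difference between rejecting and waiting is visible from the public Queue API through tryOffer and offer. The following is a small sketch assuming cats-effect 3.4.x on the classpath; BackpressureDemo is a hypothetical name.

```scala
// Assumes cats-effect 3.4.x is on the classpath.
import cats.effect.IO
import cats.effect.std.Queue

object BackpressureDemo {
  // Yields (accepted, first, second).
  val program: IO[(Boolean, Int, Int)] =
    for {
      q        <- Queue.bounded[IO, Int](1)
      _        <- q.offer(1)       // fills the single slot
      accepted <- q.tryOffer(2)    // full queue: reports false instead of waiting
      fiber    <- q.offer(3).start // waits for as long as the queue stays full
      first    <- q.take           // frees the slot so the pending offer can finish
      _        <- fiber.joinWithNever
      second   <- q.take
    } yield (accepted, first, second)
}
```

The value 2 is dropped because tryOffer never waits, while the value 3 is delivered once the consumer makes room.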

In order to achieve this semantic, we had to not only implement a fast bounded queue for the data, but also a fast unbounded queue to contain any suspended fibers which are waiting on a condition on the queue. We could have used ConcurrentLinkedQueue (from the Java standard library) for this, but we can do even better on performance with a bit of specialization. Additionally, due to cancelation, each listener needs to be able to efficiently remove itself from the queue, regardless of how far along it is in line. To resolve these issues, Viktor Klang and I have built a more optimized implementation based on atomic pointer chaining. It's actually possible to improve on this implementation even further (among other things, by removing branching), which should arrive in a future release.

Congratulations on traversing this entire wall of text! Have a pretty performance chart as a reward:

This has been projected onto a linear relative scale. You can find the raw numbers here. In summary, the new queues are between 2x and 4x faster than the old ones.

The bottom line on all of this is that any application which relies on queues (which is to say, most applications) should see an automatic improvement in performance of some magnitude. As mentioned at the top, the queue data structure itself does not appear to be the performance bottleneck in any practical application, but every bit helps, and free performance is still free performance!

Hardened Queue Semantics

As a part of the rework of the core data structures, it was decided to make a very subtle change to the semantics of the Queue data structure while under heavy load, particularly in true multi-producer, multi-consumer (MPMC) scenarios. Under certain circumstances, the previous implementation of Queue could actually lose data. This manifested when one fiber enqueued a value, while another fiber dequeued that value and was canceled during the dequeue. When this happened, it...


v3.4.0-RC2

10 Oct 04:42
v3.4.0-RC2
2ca91dc
Pre-release

This is the thirtieth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

For a more comprehensive treatment of all changes between 3.3.x and 3.4.0, please see the RC1 release notes. The following notes only cover the changes between RC1 and RC2.

User-Facing Pull Requests

A very special thanks to all!

v3.4.0-RC1

29 Sep 03:00
v3.4.0-RC1
18fb15d
Pre-release

This is the thirtieth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

With this release, we're taking the unusual step of going through a release candidate cycle prior to 3.4.0 final. This process is designed to make it easier for the downstream ecosystem to try the new release and identify subtle incompatibilities or real world issues that are hard for us to entirely eliminate in-house. Binary and source compatibility are not guaranteed between release candidates, or between RCs and the final release, though major changes are very unlikely. If you represent a downstream framework or application, please do take the time to try out this release candidate and report any issues! We're particularly interested in feedback from applications which make heavy use of Queue.


v3.3.14

12 Jul 23:08
v3.3.14
badc924

This is the twenty-ninth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

This release contains significant fixes for the interruptibleMany function, which could (under certain circumstances) result in a full runtime deadlock.
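For context, a typical use of interruptibleMany wraps a blocking call so that canceling the fiber interrupts the underlying thread. The sketch below assumes cats-effect 3.3.x or later on the classpath; InterruptibleDemo and the chosen durations are hypothetical.

```scala
// Assumes cats-effect 3.3.x or later is on the classpath.
import cats.effect.IO
import scala.concurrent.duration._

object InterruptibleDemo {
  // A blocking call wrapped so that cancelation interrupts the thread,
  // repeatedly if needed, until the blocking code exits.
  val blocking: IO[String] =
    IO.interruptibleMany {
      Thread.sleep(60000L)
      "finished"
    }

  // On timeout the blocking action is canceled and the fallback is used.
  val program: IO[String] =
    blocking.timeoutTo(200.millis, IO.pure("interrupted"))
}
```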

User-Facing Pull Requests

  • #3081 – Improved granularity of interruptible loops (@durban)
  • #3074 – Resolve race condition in interruptibleMany after interruption (@djspiewak)
  • #3064 – Handle Uncancelable and OnCancel in syncStep interpreter (@armanbilge)
  • #3069 – Documentation fixes and improvements (@TonioGela)

Special thanks to all of you!

v3.3.13

28 Jun 19:16
v3.3.13
733a4b4

This is the twenty-eighth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

User-Facing Pull Requests

Thank you very much!