Releases: typelevel/cats-effect
v3.3.3
This is the eighteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.
This release contains a fix for a regression introduced in 3.3.0 related to IOApps which exit with non-fatal errors when run from within Sbt with fork set to false. In that scenario, the runtime worker threads would end up hung in a busy-wait loop and eat up all available CPU despite returning control to the Sbt shell. Despite this fix, it is still recommended that you set run / fork := true in Sbt to work around other bugs in Sbt itself (specifically related to both Ctrl-C and System.exit suppression).
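As a minimal sketch, the recommended setting goes into build.sbt like this. The setting itself comes from the note above; placing it at the top level of a single-project build is an assumption, and multi-project builds would scope it to the relevant project.

```scala
// build.sbt (fragment)
// Run the application in a forked JVM rather than inside the sbt process,
// as recommended above.
run / fork := true
```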
User-Facing Pull Requests
- #2705 – Detect sbt thread cleaner and terminate workers (@djspiewak)
- #2707 – Hard exit in Node.js IOApp (@armanbilge)
Thank you so much!
v3.3.2
This is the seventeenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.
This patch release focuses primarily on performance improvements in two major areas: blocking/interruptible and suspended fiber tracking.
In the former area, the Cats Effect fiber runtime has long had support for the scala.concurrent.blocking construct within any code which is scheduled on its worker threads. When such a block is hit, the runtime takes it as a signal that it is about to lose a functioning worker thread and thus spawns a new one, seamlessly putting it into rotation to ensure the pool is not starved by the current worker thread being blocked. This trick works very well, but wasn't particularly recommended in user code because the performance was worse than the native IO.blocking operation.
In this release, Vasil has changed the behavior of the pool to seamlessly shift worker state when a blocking section is hit, effectively morphing another thread into the exact state as the now-blocked thread. Additionally, spare threads constructed when blocking operations are hit are now cached for one minute before being cleaned up if still idle, ensuring that they're still around if a subsequent blocking operation is hit in short order.
These improvements, taken together, mean that scala.concurrent.blocking inside of delay is actually faster than the IO.blocking operation by a significant margin, meaning that we can reap immediate performance benefits by converting IO.blocking and IO.interruptible to use this native mechanism rather than an ancillary thread pool.
Please note that the above is plotted on a log scale to make it easier to see the relative differences in each scenario. For reference, the improvements in the "fine grained" benchmark represent the test running 141x faster! (not a percent sign) Blocking is still bad for throughput, but it's a lot less bad now. You can find all of these benchmarks in the repository.
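To make the two styles of blocking discussed in this release concrete, here is a small sketch which runs the same blocking call both ways. The slowCall helper is hypothetical and merely stands in for any blocking operation; the example makes no claim about relative timings on your machine.

```scala
import cats.effect.{IO, IOApp}
import scala.concurrent.blocking

object BlockingExample extends IOApp.Simple {
  // Hypothetical stand-in for any call which blocks the current thread.
  def slowCall(): String = {
    Thread.sleep(100)
    "done"
  }

  // scala.concurrent.blocking inside of delay: the runtime is signalled
  // that the current worker thread is about to block.
  val viaBlockingConstruct: IO[String] = IO.delay(blocking(slowCall()))

  // The dedicated IO.blocking operation.
  val viaIOBlocking: IO[String] = IO.blocking(slowCall())

  val run: IO[Unit] =
    for {
      a <- viaBlockingConstruct
      b <- viaIOBlocking
      _ <- IO.println(s"$a $b")
    } yield ()
}
```

In ordinary application code, IO.blocking remains the clearer way to express intent; the first form is shown only to illustrate the construct that the runtime detects.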
As if that weren't enough, we've reimplemented the tracking mechanism for suspended fibers which underlies the new fiber dump feature introduced in 3.3.0. This feature was and is implemented using a thread-local, set-like data structure which maintains weak references to any suspended fiber. The weak references are necessary for two reasons. First, they ensure that any fiber which is suspended and whose callback is then "lost" can still be garbage collected normally. Second, they allow us to avoid the extra memory barriers associated with backtracking to the suspending thread when the fiber is resumed, making the whole mechanism significantly faster.
Unfortunately, this comes with a cost: these weak references must be examined and ultimately cleaned by the garbage collector, which means that we're effectively taking synchronous work out of the main code path and moving it asynchronously into the garbage collector. This in turn can mean that certain types of workflows which already put significant pressure on the GC may have seen diminished performance with the update to 3.3.0.
This release significantly reduces the GC overhead by simplifying and specializing the data structure to reduce the number of weak references and allocations involved in the tracking itself. The results should be unnoticeable in most optimized workloads, but for applications which are creating a significant number of short-lived objects within their hot path, these changes should produce a substantial speed-up relative to 3.3.1.
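For readers unfamiliar with the idea, the following is a deliberately simplified illustration of a set-like structure holding weak references. It is not the data structure used by Cats Effect; the name WeakBagSketch and all of its methods are hypothetical, and it is not thread-safe (which matches the thread-local usage described above only loosely).

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import scala.collection.mutable

// Hypothetical, simplified illustration; NOT the Cats Effect implementation.
final class WeakBagSketch[A <: AnyRef] {
  // The GC enqueues references here once their referents are collected.
  private val queue = new ReferenceQueue[A]
  private val refs = mutable.Set.empty[WeakReference[A]]

  // Tracks a value weakly and returns a handle for explicit removal.
  def insert(a: A): WeakReference[A] = {
    expunge()
    val ref = new WeakReference[A](a, queue)
    refs += ref
    ref
  }

  // Stops tracking a value, e.g. once a suspended fiber has resumed.
  def remove(ref: WeakReference[A]): Unit = refs -= ref

  // Returns every tracked value which has not been garbage collected.
  def snapshot: List[A] =
    refs.iterator.flatMap(r => Option(r.get())).toList

  // Drops entries whose referents the GC has already cleared.
  private def expunge(): Unit = {
    var ref = queue.poll()
    while (ref ne null) {
      refs -= ref.asInstanceOf[WeakReference[A]]
      ref = queue.poll()
    }
  }
}
```

The trade-off described above is visible even in this sketch: a lost value never needs explicit removal, but every entry costs an extra WeakReference allocation which the garbage collector must later process.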
User-Facing Pull Requests
- #2699 – Removed interruptible dependency on an explicit ExecutionContext (@djspiewak)
- #2687 – Blocking mechanism with cached threads (@vasilmkd)
- #2673 – Cross platform weak bag implementation (@vasilmkd)
Thank you so much!
v3.3.1
This is the sixteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.
This release contains bug fixes and performance enhancements to tracing and some other areas. Both enhanced exceptions and fiber dumps should now behave better on the latest versions of the JVM (which use a different top-level package identifier). Tracing on Scala.js can now be fully disabled if its overhead is too high, whereas previously some bookkeeping was retained even when tracing was configured to off.
Most significantly, Resource.uncancelable previously contained a significant bug in which error states were rewritten within the block. This ultimately stemmed from a limitation in the original Resource API with respect to full representation of outcomes, and it indirectly impacted all use of Async[Resource]. Thanks to the efforts of @TimWSpence, these bugs have now been completely squashed with the addition of a new Resource interpreter: allocatedCase (originally proposed by @kubukoz). In the interest of maintaining forward-compatibility, this function is currently marked as package-private, but will be marked as public in Cats Effect 3.4.0.
User-Facing Pull Requests
- #2617 – Add Resource#allocatedCase (@TimWSpence)
- #2665 – Fixes cancelling CompletableFuture on fiber cancellation (@LightSystem)
- #2662 – micro-optimization: use Map.empty instead of Map.apply (@yanns)
- #2642 – Only use FiberAwareExecutionContext if tracing is enabled (@armanbilge)
- #2623 – Add jdk. to trace filter (@armanbilge)
- #2609 – Fix trace filtering (@alexandrustana)
- #2607 – Duplicated fibers in fiber dump (@TimWSpence)
- #2600 – Handle interruption gracefully and halt the runtime on fatal errors (@vasilmkd)
- #2654, #2635, #2647, #2629, #2591 – Documentation fixes and improvements (@danicheg, @zarthross, @danicheg, @xuwei-k, @djspiewak)
Heartfelt thanks; you are all amazing!
v3.3.0
This is the fifteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas (detailed below). Scalafixes are available and should be automatically applied by Scala Steward if relevant.
The theme of this release has been improving observability, testability, and debuggability of all applications using Cats Effect 3. This has resulted in a massive set of new functionality, tweaks, and improvements which make 3.3.0 the most significant Cats Effect release ever apart from 3.0.0 itself. The developer experience has been significantly improved, particularly in tricky areas such as diagnosing deadlocks and deterministically testing functionality involving timers and clocks. Additionally, new functionality has been brought to Scala.js, including full support for tracing!
Finally, we took the opportunity to continue to build on IO's best-in-class performance with significant improvements to the fiber runtime (including dynamic workload fairness self-tuning) and continued dramatic slimming of the fiber memory footprint. Embarrassingly, we even discovered that the build was using an unnecessary Scala compiler flag which inhibited performance in some scenarios (particularly involving large numbers of fibers) by up to 15%! All of these improvements should add up into a very noticeable leap forward for application metrics in nearly all real-world scenarios.
Notable Changes
Fiber Dumps
One of the most annoying and difficult problems to resolve in any asynchronous application is the asynchronous deadlock. This scenario happens when you have a classic deadlock of some variety, where one fiber is waiting for a second fiber which is in turn waiting for the first (in the simplest case). Due to the asynchronous nature of the runtime, whenever a fiber blocks, all references to it are removed from the internal runtime, meaning that a deadlock generally leaves absolutely no residue whatsoever, and the only recourse as a developer is to just start sprinkling IO.println expressions around the code to see if you can figure out where it's getting stuck.
This is very much in contrast to a conventional deadlock in a synchronous runtime, where we have JVM-level tools such as thread dumps to suss out where things are stuck. In particular, thread dumps are a commonly-applied low level tool offered by the JVM which can serve to inform users of what threads are active at that point in time and what each of their call stacks are. This tool is generally quite useful, but it becomes even more useful when the threads are deadlocked: the call stacks show exactly where each thread is blocked, making it relatively simple to reconstruct what they are blocked on and thus how to untie the knot.
Fiber dumps are a similar construct for Cats Effect applications. Even better, you don't need to change anything in order to take advantage of this functionality. As a simple example, here is an application which trivially deadlocks:
import cats.effect.{IO, IOApp}
object Deadlock extends IOApp.Simple {
  val run =
    for {
      latch <- IO.deferred[Unit]
      body = latch.get
      fiber <- body.start
      _ <- fiber.join
      _ <- latch.complete(())
    } yield ()
}
The main fiber is waiting on fiber.join, which will only be completed once latch is released, which in turn will only happen on the main fiber after the child fiber completes. Thus, both fibers are deadlocked on each other. Prior to fiber dumps, this situation would be entirely invisible. Manually traversing the IO internals via a heap dump would be the only mechanism for gathering clues as to the problem, which is far from user-friendly and also generally fruitless.
As of Cats Effect 3.3.0, users can now simply trigger a fiber dump to get the following diagnostic output printed to standard error:
cats.effect.IOFiber@56824a14 WAITING
├ flatMap @ Deadlock$.$anonfun$run$2(Deadlock.scala:26)
├ flatMap @ Deadlock$.$anonfun$run$1(Deadlock.scala:25)
├ deferred @ Deadlock$.<clinit>(Deadlock.scala:22)
├ flatMap @ Deadlock$.<clinit>(Deadlock.scala:22)
╰ run$ @ Deadlock$.run(Deadlock.scala:19)
cats.effect.IOFiber@6194c61c WAITING
├ get @ Deadlock$.$anonfun$run$1(Deadlock.scala:24)
╰ get @ Deadlock$.$anonfun$run$1(Deadlock.scala:24)
Thread[io-compute-14,5,run-main-group-6] (#14): 0 enqueued
Thread[io-compute-12,5,run-main-group-6] (#12): 0 enqueued
Thread[io-compute-6,5,run-main-group-6] (#6): 0 enqueued
Thread[io-compute-5,5,run-main-group-6] (#5): 0 enqueued
Thread[io-compute-8,5,run-main-group-6] (#8): 0 enqueued
Thread[io-compute-9,5,run-main-group-6] (#9): 0 enqueued
Thread[io-compute-11,5,run-main-group-6] (#11): 0 enqueued
Thread[io-compute-7,5,run-main-group-6] (#7): 0 enqueued
Thread[io-compute-10,5,run-main-group-6] (#10): 0 enqueued
Thread[io-compute-4,5,run-main-group-6] (#4): 0 enqueued
Thread[io-compute-13,5,run-main-group-6] (#13): 0 enqueued
Thread[io-compute-0,5,run-main-group-6] (#0): 0 enqueued
Thread[io-compute-2,5,run-main-group-6] (#2): 0 enqueued
Thread[io-compute-3,5,run-main-group-6] (#3): 0 enqueued
Thread[io-compute-1,5,run-main-group-6] (#1): 0 enqueued
Thread[io-compute-15,5,run-main-group-6] (#15): 0 enqueued
Global: enqueued 0, foreign 0, waiting 2
A fiber dump prints every fiber known to the runtime, regardless of whether they are suspended, blocked, yielding, active on some foreign runtime (via evalOn), or actively running on a worker thread. You can see an example of a larger dump in this gist. Each fiber is given a stable unique hexadecimal ID and paired with its status as well as its current trace, making it extremely easy to identify problems such as our earlier deadlock: the first fiber is suspended at line 26 (fiber.join) while the second fiber is suspended at line 24 (latch.get). This gives us a very good idea of what's happening and how to fix it.
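For this particular example, one possible fix (a sketch, not the only option) is to release the latch before joining the child fiber, so that neither fiber is left waiting on the other. The object name NoDeadlock is made up for illustration.

```scala
import cats.effect.{IO, IOApp}

object NoDeadlock extends IOApp.Simple {
  val run: IO[Unit] =
    for {
      latch <- IO.deferred[Unit]
      // The child fiber waits on the latch, exactly as before.
      fiber <- latch.get.start
      // Release the latch first, so that the child can complete...
      _ <- latch.complete(())
      // ...and only then wait for the child to finish.
      _ <- fiber.join
    } yield ()
}
```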
Note that most production applications have a lot of fibers at any point in time (millions and even tens of millions are possible even on consumer hardware), so the dump may be quite large. It's also worth noting that this is a statistical snapshot mechanism. The data it is aggregating is spread across multiple threads which may or may not have all published into main memory at a given point in time. Thus, it isn't necessarily an instantaneously consistent view of the runtime. Under some circumstances, trace information for a given fiber may be behind its actual position, or a fiber may be reported as being in one state (e.g. YIELDING) when in fact it is in a different one (e.g. WAITING). Under rare circumstances, newly-spawned fibers may be missed. These circumstances are considerably more common on ARM architectures than they are under x86 due to store order semantics.
Summary statistics for the global fiber runtime are printed following the fiber traces. In the above example, these statistics are relatively trivial, but in a real-world application this can give you an idea of where your fibers are being scheduled.
Triggering the above fiber dump is a matter of sending a POSIX signal to the process using the kill command. The exact signal is dependent on the JVM (and version thereof) and operating system under which your application is running. Rather than attempting to hard-code all possible compatible signal configurations, Cats Effect simply attempts to register both INFO and USR1 (for JVM applications) or USR2 (for Node.js applications). In practice, INFO will most commonly be used on macOS and BSD, while USR1 is more common on Linux. Thus, you would run kill -INFO <pid> on macOS and kill -USR1 <pid> on Linux (or send USR2 instead for Node.js applications). POSIX signals do not exist on Windows (except under WSL, which behaves exactly like a normal Linux), and thus the mechanism is disabled there.
Since INFO is the signal used on macOS and BSD, this combined with a quirk of Apple's TTY implementation means that anyone running a Cats Effect application on macOS can simply hit Ctrl-T within the active application to trigger a fiber dump, similar to how you can use Ctrl-\ to trigger a thread dump. Note that this trick only works on macOS, since that is the only platform which maps a particular keybind to either the INFO or USR1 signals.
In the event that you're either running on a platform which doesn't support POSIX signals, or the signal registration failed for whatever reason, Cats Effect on the JVM will also automatically register an MBean under cats.effect.unsafe.metrics.LiveFiberSnapshotTriggerMBean which can produce a string representation of the fiber dump when its only method is invoked.
This entire mechanism has no performance impact (well, it probably would if you kept printing the dump in a loop, but don't do that). It is controlled by the same configuration as tracing.
And in case you were wondering, yes, it does work on Node.js applications!
cats.effect.IOFiber@d WAITING
cats.effect.IOFiber@9 WAITING
╰ deferred @ <jscode>.null.$c_LDeadlock$(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:22)
cats.effect.IOFiber@a WAITING
├ flatMap @ <jscode>.null.<anonymous>(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:26)
├ flatMap @ <jscode>.null.<anonymous>(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:25)
├ deferred @ <jscode>.null.$c_LDeadlock$(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala...
v3.2.9
This is the fourteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.2.x release.
The only change in this release (from 3.2.8) is a minor bugfix which affects Semaphore. In particular, under certain circumstances, fibers awaiting permits could end up indefinitely stuck awaiting wake-up. This could happen whenever multiple fibers were awaiting and the releasing fiber was canceled while notifying the awaiters.
User-Facing Pull Requests
- #2350 – Fix Cancelation Point in Semaphore (@ChristopherDavenport)
Thank you so much!
v2.5.4
This is the seventeenth major release in the Cats Effect 2.x lineage. It is fully binary compatible with all 2.x.y releases.
The primary change in this release is a new feature: tracing support for delay and defer! This is something that Cats Effect 3 has supported for some time now, but it was never supported in CE2 for various reasons. In implementing this feature, we also backported the thunk acquisition fix from Cats Effect 3, which works around some of the changes in Scala 3's encoding of by-name parameters.
User-Facing Pull Requests
- #2230 – Add cached tracing support for IO.delay and IO.defer (@vasilmkd)
- #2228 – Backport thunk (@armanbilge)
Very special thanks to all of you!
v3.2.8
This is the thirteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.2.x release.
This release reverts the changes to the priority queue implemented in cats.effect.std.PQueue. Namely, it follows other standard libraries in the choice that FIFO semantics are not necessarily respected when elements are tied in terms of priority.
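As a small sketch of what this means in practice, the example below offers two elements which tie on priority. The Job type and its Order instance are hypothetical; the point is only that code should not rely on which of the tied elements is taken first.

```scala
import cats.Order
import cats.effect.{IO, IOApp}
import cats.effect.std.PQueue

object PQueueTies extends IOApp.Simple {
  // Hypothetical element type: ordered by priority only, so jobs may tie.
  final case class Job(priority: Int, name: String)

  implicit val jobOrder: Order[Job] = Order.by[Job, Int](_.priority)

  val drained: IO[List[Job]] =
    for {
      q <- PQueue.unbounded[IO, Job]
      _ <- q.offer(Job(2, "low"))
      _ <- q.offer(Job(1, "a"))
      _ <- q.offer(Job(1, "b"))
      // take dequeues the least element, so both priority-1 jobs come out
      // before "low"; their relative order is deliberately unspecified.
      first <- q.take
      second <- q.take
      third <- q.take
    } yield List(first, second, third)

  val run: IO[Unit] = drained.flatMap(jobs => IO.println(jobs))
}
```

If insertion order matters among ties, one option is to fold a sequence number into the ordering itself rather than depending on queue internals.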
Furthermore, this release brings several bug fixes for corner cases and performance improvements to the Cats Effect runtime support for detecting and guarding against scala.concurrent.blocking actions (calling Await.result on scala.concurrent.Future or calling unsafeRunSync() on the compute runtime).
User-Facing Pull Requests
- #2309 - Revert PQueue FIFO priority ties (@SystemFw)
- #2312 - Address issues with the blocking mechanism of the thread pool (@vasilmkd)
We hope you enjoy this release. Thank you.
v3.2.7
v3.2.6
v3.2.5
This is the eleventh major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.2.x release.
This release fixes a regression in the work-stealing fiber scheduler. In addition, it reverts a change to the Async#fromPromise and IO.fromPromise signatures which was source-incompatible in numerous common scenarios due to limitations in Scala's type inference.
User-Facing Pull Requests
- #2272 – Revert Async#fromPromise generalization (@djspiewak)
- #2270 – Fix overly eager drainBatch (@vasilmkd)
Thank you!