
Releases: dotnet/orleans

v3.6.0

20 Jan 16:59
f66bff5

Breaking changes for Azure providers

Authentication to Azure services has evolved, and connection strings are typically deemphasized in favor of managed identities. In this release, we migrated the Azure providers to newer libraries which support managed identities and other authentication methods, and we updated the options to expose this functionality. Since there is a large number of potential authentication methods, each requiring a different combination of options, we moved away from exposing each property individually and instead added configuration methods to the provider options.

For example, AzureStorageOperationOptions has a ConfigureTableServiceClient method with the following overloads:

  • void ConfigureTableServiceClient(string connectionString)
  • void ConfigureTableServiceClient(Uri serviceUri)
  • void ConfigureTableServiceClient(Func<Task<TableServiceClient>> createClientCallback)
  • void ConfigureTableServiceClient(Uri serviceUri, TokenCredential tokenCredential)
  • void ConfigureTableServiceClient(Uri serviceUri, AzureSasCredential azureSasCredential)
  • void ConfigureTableServiceClient(Uri serviceUri, TableSharedKeyCredential sharedKeyCredential)

Configuration methods help to ensure that the right combination of parameters is provided.
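For instance, the Uri/TokenCredential overload allows grain storage backed by Azure Table Storage to authenticate with a managed identity. The following is a sketch only: the "profile" storage name and account URI are hypothetical, and DefaultAzureCredential is assumed to come from the Azure.Identity package.

```csharp
using System;
using Azure.Identity;
using Microsoft.Extensions.Hosting;
using Orleans.Hosting;

var host = new HostBuilder()
    .UseOrleans(siloBuilder =>
    {
        // Sketch: configure Azure Table grain storage with a managed identity
        // instead of a connection string. "profile" and the account URI are
        // hypothetical names for illustration.
        siloBuilder.AddAzureTableGrainStorage("profile", options =>
        {
            options.ConfigureTableServiceClient(
                new Uri("https://myaccount.table.core.windows.net"),
                new DefaultAzureCredential());
        });
    })
    .Build();
```

The same pattern applies to the other overloads: pass a connection string, a SAS credential, a shared key credential, or a callback that constructs the TableServiceClient yourself.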

Breaking issue with ASP.NET Core 2.1

If you are using ASP.NET Core 2.1, please note that we have had reports of users running into an incompatibility issue in Microsoft.Extensions which prevents upgrading. Please see dotnet/extensions#3800, dotnet/aspnetcore#40255, and #7608.

Build system changes

  • Builds are now reproducible
  • SourceLink has been enabled and sources are no longer shipped in the NuGet packages
  • Debug symbols are now embedded in the shipped dlls instead of alongside them

Improvements and bug fixes since 3.5.1

  • Breaking and potentially breaking changes

    • Migrate to Azure.Data.Tables SDK for Azure Table Storage (#7300) (#7363)
    • Upgrade to .NET 6.0, update dependencies, and fix packaging (#7431)
  • Non-breaking improvements

    • Rewrite batch build script in powershell (#7379) (#7382)
    • Support Npgsql 6.0 for clustering and reminders (#7402)
    • Initial commit for distributed tests using crank (#7323) (#7440)
    • Support configuring ILBasedSerializer and BinaryFormatterISerializableSerializer as pre-fallback serializers (#7384)
    • Support opting in to IKeyedSerializer implementations on a per-type basis using an attribute (#7438)
    • Create PostgreSQL migrations for version 3.6.0 (#7490) (#7492)
  • Non-breaking bug fixes

    • Fix bug in validating generic constraints for method invocation (#7400)
    • Do not register IGrainStorage in DI unless there is a provider named "Default" (#7409) (#7444)
    • AdoNet - be more explicit when extracting DateTime from IDataRecord (#6968) (#7463)
    • DynamoDB Reminders load using GSI (#7437)

Thank you to the following community members for your PR contributions and to everyone else who contributed: @mjameson-se @dave-b-code @michaeltdaniels @EdeMeijer @shmurchi

Full Changelog: v3.5.1...v3.6.0

v3.5.1

08 Nov 17:18
928c259

Improvements and bug fixes since 3.5.0

  • Potentially-breaking changes

    • Support ILBasedSerializer as a fallback only after trying the configured fallback (#7355)
      • This change may expose cases where types in an application are no longer able to be serialized because they were being serialized by ILBasedSerializer. In 3.6.0, we are introducing an option to revert to the previous behavior (attempting to use ILBasedSerializer before the configured fallback serializer), so if this affects you, please wait for that release. If you encounter any issues, please open an issue.
  • Non-breaking improvements

    • Support Minimal API host builder (#7316)
    • CodeGenerator: support implicit using directives (#7310)
    • Improve graceful connection shutdown (#7345)
    • Upgrade Kubernetes version used by Microsoft.Orleans.Kubernetes.Hosting package (Fix: Orleans.Hosting.KubernetesHosting: System.MissingMethodException (#7364))
    • Implement GetGrainId() for SystemTargets & GrainService (#7259)
    • Make Silo.WaitForMessageToBeQueuedForOutbound configurable. (#7354) (#7355)
    • Clarify comment on disabling workload analysis and log unknown status updates at the debug level (#7358)
  • Non-breaking bug fixes

    • Fix: wrong parameter used when throwing QueueCacheMissException (#7295) (#7306)
    • Add support for generic constraint during resolution of generic grain methods (#7325)
    • Adjust ETag options on Clear/Write (#6908) (#7187) (#7326)
    • Record result of deserialization for keyed serializers (#7336)
    • Record deserialized array before deserializing array contents for arrays of non-primitive types (#7335)
    • Fix in Redis GrainDirectory: when deleting, check only the ActivationId (#7362)
    • [Streaming] Fix token comparison in SetCursor (#7308)

v3.4.4

04 Oct 16:01
f2bae2e

This release is based on 3.4.3, but without some changes made in 3.5.0, notably the Event Hubs library upgrade.

If possible, users should upgrade to 3.5.0 instead.

Improvements and bug fixes since 3.4.3

  • Non-breaking bug fixes
    • Fix token comparison in SetCursor (#7314)
    • Fix: wrong parameter used when throwing QueueCacheMissException (#7295)
    • Fix shutdown race condition in GrainTimer (#7234)
    • More fixes to avoid a stopped pulling agent from sending messages (#7222)
    • In PooledQueueCache, avoid cache miss if we know that we didn't miss any event (#7060)

v3.5.0

03 Sep 13:42
7608c19

Improvements and bug fixes since 3.4.3

  • Non-breaking improvements

    • Add C# 9.0 record support to serializer (#7119)
    • Fallback to ILBasedSerializer when BinaryFormatter is disabled (#7198)
    • AAD TokenCredential support on Azure Event Hubs Stream Provider (#7166)
    • Add TLS certificate selector for the client (#7144)
    • Add MembershipVersion in GrainAddress (#7133)
    • Add Silo RoleName based placement (#7157)
    • Permit ValueTask and ValueTask<T> to be used in generic grain methods (#7190)
    • Do not add application parts from dependency context by default (#7197)
    • Updated AzureBlobStorage to use state type during JSON deserialization (#7147) (#7212)
    • Upgrade Azure EventHubs (#7255)
  • Non-breaking bug fixes

    • In PooledQueueCache, avoid cache miss if we know that we didn't miss any event (#7060)
    • GenericMethodInvoker now correctly handles generic methods with overloads (#6844)
    • Fix unhandled exception on Kubernetes watcher background task (#7168)
    • EventHubReceiverProxy: ignore if offset isn't valid (#7192)
    • Fix to avoid stopped stream pulling agents from sending messages (#7222)
    • Fix shutdown race condition in GrainTimer (#7234)

v3.4.3

24 Aug 14:28
8255e5a

Improvements and bug fixes since 3.4.2

  • Non-breaking bug fixes
    • Prevent diagnostic notification messages from being dropped prematurely (#7046) (#7049)

v3.4.2

05 Apr 17:54
97062ac

Improvements and bug fixes since 3.4.1

  • Non-breaking improvements

    • Close connections asynchronously and improve gracefulness (#7006)
  • Non-breaking bug fixes

    • Only send a rejection if the message is a request (#6946) (#6958)
    • Avoid Message.Id collision in the message callback dictionary (#6945) (#6959)
    • Check if offset is a long in EventHubCheckpointer (#6960)
    • Return zero elapsed time for unstarted Messages (#6969) (#6973)
    • Call ScheduleCollection before starting processing requests (#6974)
    • Fix breaking of outstanding messages when a silo shuts down (#6977) (#6978)
    • Fix retry logic in PersistentStreamPullingAgent.RunConsumerCursor (#6983)

v3.4.1

03 Feb 20:43
5a87bbb

Kubernetes hosting package marked as stable

The Microsoft.Orleans.Kubernetes.Hosting package is now marked as stable. This package is intended to help users who are deploying to Kubernetes by automating configuration of silos, monitoring Kubernetes for changes in the active pods, and terminating pods which are marked as defunct by the Orleans cluster. Please try it and give us your feedback. Documentation is available here and a sample project is available here.
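Enabling the integration is a single call on the silo builder. This is a minimal sketch assuming the Microsoft.Orleans.Kubernetes.Hosting package is referenced and the silo runs inside a pod with the expected environment variables set.

```csharp
using Microsoft.Extensions.Hosting;
using Orleans.Hosting;

var host = new HostBuilder()
    .UseOrleans(siloBuilder =>
    {
        // Sketch: UseKubernetesHosting configures the silo from the pod's
        // environment and keeps Orleans membership in sync with pod changes.
        siloBuilder.UseKubernetesHosting();
    })
    .Build();

await host.RunAsync();
```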

Improvements and bug fixes since 3.4.0

  • Non-breaking improvements

    • Improve performance of LRU.Add() (#6872)
    • Added a base class to IStorage that is not generic (#6928) (#6931)
    • Cleanup Kubernetes hosting package for stable release (#6902) (#6911)
    • Mark Kubernetes hosting package as stable (#6804) (#6903)
  • Non-breaking bug fixes

    • Fix leak in RunningRequestSenders (#6903)
    • Avoid disposing uncompleted task in LinuxEnvironmentStatistics (#6842) (#6887)
    • Only log that a message is being forwarded if it is being forwarded (#6892) (#6910)
    • In GrainDirectoryPartition, throw an exception instead of returning null if trying to register an activation on a non-valid silo (#6896) (#6901)
    • Do not retry to send streaming events if the pulling agent has been stopped (#6897) (#6900)
    • Try to limit forwarding when a grain activation throws an exception in OnActivateAsync() (#6891) (#6893)

v3.4.0

06 Jan 20:27
399dd4e

Improved resiliency during severe performance degradation

This release includes improvements to the cluster membership algorithm which are opt-in in this initial release. These changes are aimed at improving the accuracy of cluster membership when some or all nodes are in a degraded state. Details follow.

Perform self-health checks before suspecting other nodes (#6745)

This PR implements some of the ideas from Lifeguard (paper, talk, blog), which can help during times of catastrophe, when a large portion of a cluster is in a state of partial failure. One cause of these kinds of partial failures is large-scale thread pool starvation, which can cause a node to run slowly enough that it does not process messages in a timely manner. Slow nodes can therefore suspect healthy nodes simply because the slow node is not able to process the healthy node's timely response. If a sufficiently large proportion of nodes in a cluster are slow (e.g., due to an application bug), then healthy nodes may have trouble joining and remaining in the cluster, since the slow nodes can evict them. In this scenario, slow nodes will also be evicting each other. The intention of this change is to improve cluster stability in these scenarios.

This PR introduces LocalSiloHealthMonitor which uses heuristics to score the local silo's health. A low score (0) represents a healthy node and a high score (1 to 8) represents an unhealthy node.

LocalSiloHealthMonitor implements the following heuristics:

  • Check that this silo is marked as Active in membership
  • Check that no other silo suspects this silo
  • Check for recently received successful ping responses
  • Check for recently received ping requests
  • Check that the .NET Thread Pool is able to execute work items within 1 second from enqueue time
  • Check that local async timers have been firing on-time (within 3 seconds of their due time)

Failing heuristics contribute to increased probe timeouts, which has two effects:

  • Improves the chance of a successful probe to a healthy node
  • Increases the time taken for an unhealthy node to vote a healthy node dead, giving the cluster a larger chance of voting the unhealthy node dead first (Nodes marked as dead are pacified and cannot vote others)

The effects of this feature are disabled by default in this release, with only passive background monitoring being enabled. The extended probe timeouts feature can be enabled by setting ClusterMembershipOptions.ExtendProbeTimeoutDuringDegradation to true. The passive background monitoring period can be configured by changing ClusterMembershipOptions.LocalHealthDegradationMonitoringPeriod from its default value of 10 seconds.
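Opting in uses the standard options pattern on the silo builder. A minimal sketch, assuming a siloBuilder from the usual hosting setup:

```csharp
using System;
using Orleans.Configuration;

// Sketch: opt in to extended probe timeouts during local health degradation.
siloBuilder.Configure<ClusterMembershipOptions>(options =>
{
    options.ExtendProbeTimeoutDuringDegradation = true;

    // Optional: adjust how often the local silo's health is scored
    // (10 seconds is the stated default).
    options.LocalHealthDegradationMonitoringPeriod = TimeSpan.FromSeconds(10);
});
```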

Probe silos indirectly before submitting a vote (#6800)

This PR adds support for indirectly pinging silos before suspecting/declaring them dead.
When a silo is one missed probe away from being voted dead, the monitoring silo switches to indirect pings. In this mode, the silo picks another silo at random and sends it a request to probe the target silo. If that silo promptly responds with a negative acknowledgement (its own probe of the target having failed or timed out), then the target silo will be suspected/declared dead.

Additionally, when the vote limit to declare a silo dead is 2 silos, a negative acknowledgement counts for both required votes and the silo is unilaterally declared dead.

The feature is disabled by default in this release (only direct probes are used by default), but it may be enabled by default in a later release; users can opt in now by setting ClusterMembershipOptions.EnableIndirectProbes to true.
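Opting in follows the same options pattern; a sketch assuming the usual siloBuilder:

```csharp
using Orleans.Configuration;

// Sketch: opt in to indirect probes before voting a silo dead.
siloBuilder.Configure<ClusterMembershipOptions>(options =>
{
    options.EnableIndirectProbes = true;
});
```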

Improvements and bug fixes since 3.3.0

  • Non-breaking improvements
    • Probe silos indirectly before submitting a vote (#6800) (#6839)
    • Perform self-health checks before suspecting other nodes (#6745) (#6836)
    • Add IManagementGrain.GetActivationAddress() (#6816) (#6827)
    • In GrainId.ToString(), display the grain type name and format the key properly (#6774)
    • Add ADO.NET Provider support MySqlConnector 0.x and 1.x. (#6831)
  • Non-breaking bug fixes
    • Avoid race for stateless worker grains with activation limit #6795 (#6796) (#6803)
    • Fix bad merge of GrainInterfaceMap (#6767)
    • Make Activation Data AsyncDisposable (#6761)

v3.4.0 RC1

10 Dec 00:22
3f344a7
Pre-release

Improved resiliency during severe performance degradation

This release includes improvements to the cluster membership algorithm which are opt-in in this initial release. These changes are aimed at improving the accuracy of cluster membership when some or all nodes are in a degraded state. Details follow.

Perform self-health checks before suspecting other nodes (#6745)

This PR implements some of the ideas from Lifeguard (paper, talk, blog), which can help during times of catastrophe, when a large portion of a cluster is in a state of partial failure. One cause of these kinds of partial failures is large-scale thread pool starvation, which can cause a node to run slowly enough that it does not process messages in a timely manner. Slow nodes can therefore suspect healthy nodes simply because the slow node is not able to process the healthy node's timely response. If a sufficiently large proportion of nodes in a cluster are slow (e.g., due to an application bug), then healthy nodes may have trouble joining and remaining in the cluster, since the slow nodes can evict them. In this scenario, slow nodes will also be evicting each other. The intention of this change is to improve cluster stability in these scenarios.

This PR introduces LocalSiloHealthMonitor which uses heuristics to score the local silo's health. A low score (0) represents a healthy node and a high score (1 to 8) represents an unhealthy node.

LocalSiloHealthMonitor implements the following heuristics:

  • Check that this silo is marked as Active in membership
  • Check that no other silo suspects this silo
  • Check for recently received successful ping responses
  • Check for recently received ping requests
  • Check that the .NET Thread Pool is able to execute work items within 1 second from enqueue time
  • Check that local async timers have been firing on-time (within 3 seconds of their due time)

Failing heuristics contribute to increased probe timeouts, which has two effects:

  • Improves the chance of a successful probe to a healthy node
  • Increases the time taken for an unhealthy node to vote a healthy node dead, giving the cluster a larger chance of voting the unhealthy node dead first (Nodes marked as dead are pacified and cannot vote others)

The effects of this feature are disabled by default in this release, with only passive background monitoring being enabled. The extended probe timeouts feature can be enabled by setting ClusterMembershipOptions.ExtendProbeTimeoutDuringDegradation to true. The passive background monitoring period can be configured by changing ClusterMembershipOptions.LocalHealthDegradationMonitoringPeriod from its default value of 10 seconds.

Probe silos indirectly before submitting a vote (#6800)

This PR adds support for indirectly pinging silos before suspecting/declaring them dead.
When a silo is one missed probe away from being voted dead, the monitoring silo switches to indirect pings. In this mode, the silo picks another silo at random and sends it a request to probe the target silo. If that silo promptly responds with a negative acknowledgement (its own probe of the target having failed or timed out), then the target silo will be suspected/declared dead.

Additionally, when the vote limit to declare a silo dead is 2 silos, a negative acknowledgement counts for both required votes and the silo is unilaterally declared dead.

The feature is disabled by default in this release (only direct probes are used by default), but it may be enabled by default in a later release; users can opt in now by setting ClusterMembershipOptions.EnableIndirectProbes to true.

Improvements and bug fixes since 3.3.0

  • Non-breaking improvements
    • Probe silos indirectly before submitting a vote (#6800) (#6839)
    • Perform self-health checks before suspecting other nodes (#6745) (#6836)
    • Add IManagementGrain.GetActivationAddress() (#6816) (#6827)
    • In GrainId.ToString(), display the grain type name and format the key properly (#6774)
  • Non-breaking bug fixes
    • Avoid race for stateless worker grains with activation limit #6795 (#6796) (#6803)
    • Fix bad merge of GrainInterfaceMap (#6767)
    • Make Activation Data AsyncDisposable (#6761)

v3.3.0

09 Sep 20:34
baa1dc8

Improved diagnostics for long running, delayed, and blocked requests

This release includes improvements that give developers additional context when a request does not return promptly. PR #6672 added these improvements. Orleans will periodically probe active grains to inspect their message queues and send status updates for certain requests which have been enqueued or executing for too long. These status messages appear as warnings in the logs and are also included in exceptions when a request timeout occurs. The information included can help a developer identify what the grain is doing at the time of the request: for example, which messages are enqueued ahead of this message, which messages are executing and how long they have been executing, how long this message has been enqueued, and the status of the grain's TaskScheduler.

Microsoft.Orleans.Hosting.Kubernetes NuGet package (3.3.0-beta1) for tighter integration with Kubernetes

This release adds a new pre-release package, Microsoft.Orleans.Hosting.Kubernetes, which adds richer integration for users hosting on Kubernetes. The package assists users by monitoring Kubernetes for silo pods and reflecting changes in cluster membership. For example, when a Pod is deleted, it is immediately removed from Orleans' membership. In addition, the package configures EndpointOptions and ClusterOptions to match the Pod's environment. Documentation and a sample project are expected in the coming weeks; in the meantime, please see the original PR for more information: #6707.

Improvements and bug fixes since 3.2.0

  • Potentially breaking change

    • Added 'RecordExists' flag to persistent store so that grains can det… (#6580)
      (Implementations of IStorage<TState> and IGrainState need to be updated to add a RecordExists property.)
  • Non-breaking improvements

    • Use "static" client observer to notify from the gateway when the silo is shutting down (#6613)
    • More graceful termination of network connections (#6557) (#6625)
    • Use TaskCompletionSource.RunContinuationsAsynchronously (#6573)
    • Observe discarded ping task results (#6577)
    • Constrain work done under a lock in BatchWorker (#6586)
    • Support deterministic builds with CodeGenerator (#6592)
    • Fix some xUnit test discovery issues (#6584)
    • Delete old Joining records as part of cleanup of defunct entries (#6601, #6624)
    • Propagate transaction exceptions in more cases (#6615)
    • SocketConnectionListener: allow address reuse (#6653)
    • Improve ClusterClient disposal (#6583)
    • AAD authentication for Azure providers (blob, queue & table) (#6648)
    • Make delay after gw shutdown notification configurable (#6679)
    • Tweak shutdown completion signalling (#6685) (#6696)
    • Close some kinds of misbehaving connections during shutdown (#6684) (#6695)
    • Send status messages for long-running and blocked requests (#6672) (#6694)
    • Kubernetes hosting integration (#6707) (#6721)
    • Reduce log noise (#6705)
    • Upgrade AWS dependencies to their latest versions. (#6723)
  • Non-breaking bug fixes

    • Fix SequenceNumber for MemoryStream (#6622) (#6623)
    • When activation is stuck, make sure to unregister from the directory before forwarding messages (#6593)
    • Fix call pattern that throws. (#6626)
    • Avoid NullReferenceException in Message.TargetAddress (#6635)
    • Fix unobserved ArgumentOutOfRangeException from Task.Delay (#6640)
    • Fix bad merge (#6656)
    • Avoid race in GatewaySender.Send (#6655)
    • Ensure that only one instance of IncomingRequestMonitor is created (#6714)