Reactive Streams Signal Processor #5
I don't fully understand the use cases here, so can you please elaborate? What does the Reactive-IPC core need to do with this that the transport implementations wouldn't take care of? For example, if Netty is being used as the transport impl then it has its own event loops, schedules work as needed, and will perform callbacks that we would then model as Reactive Streams.
If we don't have a way to dispatch signals to subscribers, then we can't create any reusable abstractions in core.
Under what circumstances is this the concern of this layer of code? Why is this not just the transport layer worrying about this? The only purpose of what we're building is reactive APIs over IO, and IO implementations will always have their own approaches to scheduling (Netty uses eventloops, Servlet 3.1 uses native threads), and they will all do their callbacks as they wish. Where and why do we introduce further dispatching or scheduling?
Why would this ever be happening? If I perform IO on Netty it will always come back synchronously. Nothing at this level would merge streams.
What does this have to do with IO?
Why would it be in the RingBuffer thread? And even then I still see it working. For example, Netty would correctly schedule it onto the right event loop if you weren't already on that event loop. Please walk me through the use case you are considering, as I'm not understanding it.
In core.
Because I can't reuse almost anything we have in Reactor without the concept of dispatching. But as @smaldini has already suggested to me, we might just do this in Reactor and not rely on the ripc-core for any foundational abstractions.
Why not?
Correct me if I'm wrong @jbrisbin. I think what is being suggested is some sort of base abstraction around dispatching signals: the base machinery to get RS working (which is trivial for publishers, but not for processors, subscribers, and subscriptions). The bonus would be to deal with a rather important constraint: a reactive stream off any of our IO impls (like Netty) will often need to jump to another thread as fast as it can, to avoid blocking things like, say, the event loop in Netty.
In theory the difficulty will come with the Output abstraction (server reply or client request). We need to listen to a publisher and so create our own RIPC Subscriber for this purpose. @jbrisbin mentioned merging because in our HTTP impl, for instance, we apply backpressure across multiple matching URL endpoints and use Streams.merge() to make sure we manage the write in a single fan-in endpoint before flushing. I think we can live without that for now.
I think we are missing a point here (I think @smaldini was referring to this, but I will still put it across): none of the applications would be using a reactive-ipc-nettyX module directly, as dealing with RS interfaces directly is like dealing with callbacks. They would be using a stream abstraction like RxJava, Reactor Streams, etc. So, IMHO, we should not reinvent what either a networking library (Netty) or a stream implementation (RxJava) already provides us. In this context, dispatching semantics overlap with event loops, and jumping threads overlaps with operators like RxJava's observeOn. I do not see a reason why I would use a custom dispatcher directly with Netty's event loops. The use case about switching threads for any blocking operations should idiomatically leverage RxJava's schedulers.
I actually am not able to understand the non-trivial nature of subscriber/subscription implementation. I see a few points mentioned here:
RS processors are Subjects.
I'm not suggesting we reinvent anything; this is a codification of existing functionality and not the introduction of a new concept. Any generic abstraction that lives in ripc-core has no foreknowledge of the tasks being run inside a delegate Subscriber. At the user level this is for the purpose of intentionally switching threads based on the developer's foreknowledge of the processing pipeline being created. Generic components have no such foreknowledge but still have a need to send signals to a Subscriber. In practical terms, my local experiments show this has meant only two abstractions are required: a Processor for dispatching and the Signal itself.
As an example of the importance of this concept of dispatching signals, consider that even this most basic of sketches violates the spec because it provides no protection against concurrent onNext signals.
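A minimal sketch of the kind of protection being referred to, assuming the org.reactivestreams interfaces: a wrapper that serializes signals so a downstream Subscriber never observes concurrent onNext calls (rule 1.3 of the RS spec). The class name and the coarse locking are illustrative assumptions, not Reactor's actual approach; real implementations typically use lock-free queue-and-drain loops instead.

```java
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

// Illustrative sketch: serialize signals from multiple producer threads so the
// downstream Subscriber never sees concurrent invocations (RS spec rule 1.3).
final class SerializingSubscriber<T> implements Subscriber<T> {
    private final Subscriber<? super T> delegate;

    SerializingSubscriber(Subscriber<? super T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void onSubscribe(Subscription s) {
        delegate.onSubscribe(s);
    }

    // Coarse locking is the simplest possible approach; a real dispatcher
    // would use a lock-free queue-and-drain loop to avoid blocking producers.
    @Override
    public synchronized void onNext(T t) {
        delegate.onNext(t);
    }

    @Override
    public synchronized void onError(Throwable err) {
        delegate.onError(err);
    }

    @Override
    public synchronized void onComplete() {
        delegate.onComplete();
    }
}
```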
@NiteshKant just a quick amendment: Processors are not entirely == Subjects. As soon as you subscribe one to your publisher it has to fulfill the demand contract; I'm fighting with that currently on one of our schedulers to make sure it is respected. But anyway, yes, it's not a target for here; we will leave the IO thread ownership alone unless an additional layer makes that easy to change. There are at least 44 tests to pass, plus 22 untested rules, and around 30% of them are sometimes really hard to implement (talking about the IdentityProcessorTest from the RS TCK).
Just to clarify: this isn't about providing a way to intentionally switch threads. This is entirely about publishing RS signals to subscribers from generic components in an orderly fashion, in a way that doesn't violate the spec. I point to my example above.
This point about ZeroMQ that was just brought up on reactive-streams-io issue 1 IMO serves to illustrate that abstractions other than those directly dependent on Netty will need to do async message passing. ZeroMQ has no facility like Netty's to schedule tasks asynchronously. It makes sense that the core interactions of Reactive IPC be abstracted into a module not dependent on Netty, which could also be used from a ZeroMQ implementation.
Why? If computation is being done like 'map' then switching threads is unnecessary and actually worse. Computation moves only as fast as the available CPUs. If there are 8 cores and 8 event loops, moving work to another thread is not speeding anything up. I agree with @smaldini on the following:
The IO layer should do the simplest possible thing: call back via a Subscriber.
Can you please provide more information on this? I don't understand why streams need to be merged. I have only ever seen the need to merge streams as a user defined semantic for application behavior.
I agree with @NiteshKant on this point.
Jon, I don't understand the use case you're trying to solve. Netty will not give me concurrent onNext signals on a single channel.
By the way, this was said earlier:
That is not a reason for us to make or retain a design decision.
@benjchristensen this isn't about a single channel, but about funneling events from multiple channels into a single Subscriber.
Fair enough. That's why I put forward other reasons for suggesting this functionality live in core. I remain unconvinced that a kernel that calls itself Reactive can do any meaningful work without providing for asynchronous message passing.
I wanted to say that it's not always the case, rather than often, maybe. E.g., we have customers using an ingesting box with more than 16 CPUs and 4 long-living connections (corresponding to 4 feeding clients). These clients must avoid blocking as much as they can, and processing in this case is non-trivial enough to be more expensive than placing an event in a queue (actually a ring buffer). The other usual use case I've seen is just acknowledgement, where data is moved as fast as possible to a queue, possibly durable/HA, so we can keep reading (and not fill the event loop). Norman gives that advice too, in general. About the HTTP merge: in Reactor you can register multiple HTTP endpoints matching a single request using wildcards or group captures. You can also group/route requests by IP, HTTP method, etc. These will coordinate to write back to a single connection. The model we use is that a connection is bound to a Stream, which is a single thread. Each of these HTTP routes may run on different threads at some point (async callbacks), and we ensure they will write back through the thread assigned to the Stream that maps to the user's Netty channel. We ensure ordered processing at least (order of declaration) for each route, so you can build interceptors if you want.
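A minimal sketch of the acknowledgement/hand-off pattern described above, assuming a plain BlockingQueue stands in for the ring buffer (Reactor actually uses a Disruptor RingBuffer); the class and method names are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Illustrative hand-off: the IO callback only enqueues the event so the event
// loop is freed immediately; a dedicated worker drains the queue off-loop.
final class EventHandOff<T> {
    private final BlockingQueue<T> queue = new ArrayBlockingQueue<>(1024);

    // Called on the event loop: cheap and non-blocking.
    boolean offer(T event) {
        return queue.offer(event); // false = queue full, apply backpressure upstream
    }

    // Runs on a separate worker thread, keeping processing off the event loop.
    void drain(Consumer<T> handler) throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            handler.accept(queue.take());
        }
    }
}
```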
@smaldini @jbrisbin @benjchristensen I have added a comment in the goals & motivation issue to discuss what should belong in reactive-ipc-core and how much of an overarching abstraction we should create across network libraries. I would like to discuss that first before we start discussing the need for a buffer or a dispatcher.
I actually disagree that this would be the case in many of the circumstances of concern at the network layer. As @benjchristensen pointed out, it is mostly in the user application layer that someone would merge streams from various channels.
I think this is where the clarification around what that kernel contains is important, which is good to discuss under goals & motivations.
@NiteshKant I can't see any other safe yet efficient way to send RS signals to the composition library than by providing a facility for asynchronous signal (message) passing.

UPDATE: Maybe having two conversations isn't a good idea. I have no idea on which issue this paragraph belongs:

The entire premise of a Reactive system is that actions that happen in one component are isolated from the others via asynchronous message passing. How can we accomplish this isolation without dispatching signals? If we must dispatch signals at the transport implementation layer, then how is that accomplished? If the only way to dispatch asynchronous RS signals is by using an abstraction that has a hard reference to Netty's EventLoops, then what happens when we want to do the same thing for ZeroMQ or XNIO?
This seems at the crux of this debate. Being "reactive" does not mean everything must be asynchronous.

Being reactive means something is pushed to us so we react to it. Period. This is the dual of being interactive, where we pull.

If a user wishes for something to be async, then the composition layer offers them tools to make something async. If they want to move across threads, they can do so. If they want to join or merge across multiple threads, they can. Here are examples ...

Most basic form of an IO event being pushed:

```java
incomingTcpStream.subscribe(t -> {
    // receive `t` on whatever thread the transport IO invoked `onNext`
});
```

In this case there is no further need for dispatch or scheduling.

This next case merges 2 streams:

```java
incomingTcpStreamA = ...
incomingTcpStreamB = ...
combinedStream = merge(incomingTcpStreamA, incomingTcpStreamB).subscribe(t -> {
    // receive `t` on either thread A or B (of course sequentially, as per the RS spec)
});
```

In this case it is a combinatorial operator that will merge the events being pushed at it. Since merging can involve concurrency, it is the merge operator's responsibility to honor the Reactive Streams spec and ensure sequential emission.

Now if a user wants to move from one thread to another they can again choose to compose as follows:

```java
incomingTcpStream
    .observeOn(someOtherThread)
    .subscribe(t -> {
        // receive `t` on someOtherThread as defined in observeOn rather than the IO thread
    });
```

Behaviors can be combined like this:

```java
incomingTcpStreamA = ...
incomingTcpStreamB = ...
combinedStream = merge(incomingTcpStreamA, incomingTcpStreamB)
    .observeOn(someOtherThread)
    .subscribe(t -> {
        // receive `t` on someOtherThread as defined in observeOn rather than the IO thread
    });
```

Now we are merging IO events from potentially 2 threads and moving the output to a 3rd thread. This is all reactive, with functional composition for a user to take control of asynchrony and scheduling to meet their needs. The IO layer just calls onNext.

If an application wants an event-bus then they should use an event-bus, such as Reactor. The IO layer should have as few opinions as possible, be as simple as possible, and not make scheduling decisions on behalf of the user.

Note how in the examples above the IO responsibility did not include any of the dispatch or combinatorial use cases. That all lives in the layer above, which composes the Reactive Streams that IO exposes.
I agree with that definition from @benjchristensen, just noting that yes, Reactive is not required to be async (reactive manifesto aside, onNext/onComplete still abstract the location/resource away), but it at least offers error isolation (onError). We will find other mandatory abstractions anyway, but I propose we close this issue for now and defer the async decision to the end user or end implementation.
This is the heart of the matter, and where we'll have to agree to disagree. This necessarily limits the kinds of abstractions we can provide as building blocks, but as long as that's intentional and we recognize it, then I don't have a problem coming down on the side of simplicity.
What use cases are we preventing? And how would making everything async not cause performance and complexity impact for the majority of cases? |
I probably shouldn't have banged the async hammer so hard, because it does seem I'm advocating for "enforced" asynchrony, and that's certainly not the case. The key word, I think, was that only having synchronous invocation was limiting. I apologize for focusing too much on that aspect. We want to keep things fast and dispatch only when we have to. But we've found with the RingBuffer you can do both efficiently.

As to what we're preventing: some things I can think of, and some whose needs I don't even know yet. In effect, we seem to be bringing the scope of what an IO kernel actually is down to such a simplistic level that if you were to, say, choose a transport module but not a composition library, you'd have nothing useful that you didn't bring along with you to enrich the functionality of the kernel.

Going back to vert.x, this model has proven to be quite useful, and while there are some design decisions there I'm not too excited about, it's a system that does just what is needed: it allows non-blocking components to interact with the core (they do it over the event bus rather than through Reactive Streams) and provides the core needed functionality without forcing a verticle to work in isolation from all other verticles running in the same app.

So far it seems what we're actually building here is just a very inconsequential API conversion from Netty to Reactive Streams. The usefulness of Reactive Streams is not in the fact that it provides three standard interfaces that don't exist in the JDK (otherwise we'd just use the ones from there). Its usefulness is in the spec, which lays out the Reactive pattern in specifics and provides the TCK to check it. It provides error and (optional) resource isolation and a whole host of other things that intermediate components could benefit from.

My idea of this IO kernel when we originally talked about it was much closer to what vert.x has, but made fully Reactive and reusable: a core kernel that has unlimited potential for functionality because it has extension modules plugged into it that are exposed exclusively via Reactive Streams.

So far we have proposed a very simplistic API conversion from Netty -> RIPC and then from RIPC -> RxJava. There's a lot that could happen in between those two, but I don't see anything currently that would facilitate that.
Looking at the Reactive Streams spec, I see nothing to suggest that automated dispatching is expected. On the contrary, there is plenty to suggest the opposite: that publishers and subscribers make deliberate choices about dispatching rather than have that choice made for them. For example, in rule 2.2 a subscriber dispatches asynchronously if it suspects its processing would impact the producer, not by default. The example code further confirms this by showing both synchronous and asynchronous subscribers. If there is any doubt left, the section on Asynchronous vs Synchronous Processing clearly shows that async boundaries can and are expected to vary, and that implies choice. So I don't see anything inherently wrong with intentionally scheduling work based on the expected situation. In fact, intuitively this makes sense.

For example, for the IO kernel we can fully expect runtimes to be non-blocking, so we won't be exposed to synchronous open recursion through Subscription.request. I think we can agree on that much, and I see no such issues when looking at ChannelInboundHandlerSubscription. For the open recursion, I think Jon was referring to ChannelInitializerSubscription, which is for subscribers of NettyTcpServerConnection. Here Jon does have a point. However, I don't actually agree with this part of the current code. I see no reason for connections to be provided through a Reactive Streams Publisher, and this is further supported by another of Jon's concerns about a singleton subscriber receiving illegal concurrent onNext signals for multiple incoming connections. Again, why do we need this to be a Publisher? Instead, I think it would be sufficient to model it as some sort of ProtocolHandler along the lines of the sketch below.
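The snippet originally attached here isn't preserved, so this is a hedged reconstruction modeled on RxNetty's ConnectionHandler; the TcpConnection type is a hypothetical stand-in for whatever connection abstraction RIPC settles on:

```java
import org.reactivestreams.Publisher;
import org.reactivestreams.Subscriber;

// Hypothetical RIPC connection abstraction, shown only to make the sketch
// self-contained: a stream of inbound messages and a sink for outbound ones.
interface TcpConnection<I, O> {
    Publisher<I> input();
    Subscriber<O> output();
}

// A singleton per protocol: invoked once for each accepted connection, it sets
// up protocol-specific handling and is non-blocking by definition. The
// returned Publisher completes when handling of that connection is finished.
interface ProtocolHandler<I, O> {
    Publisher<Void> handle(TcpConnection<I, O> connection);
}
```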
Such a ProtocolHandler would indeed be a singleton (we only need one per protocol), and it would be mainly responsible for setting up protocol-specific handling, so it's non-blocking by definition. In RxNetty terms this is the ConnectionHandler.
@jbrisbin and I spoke on Friday about use cases. I have my notes, and once I have some free time I will document our conversation and the requirements. The key point of our conversation was that we need the core to be extensible, and I completely agree, but as we discussed, it should not require the core to predefine how that happens (such as via an event bus); it should permit extension and injection of behavior. More to come later...
Reactor 2.0 will introduce a new implementation of Reactive Streams signal dispatching that uses the Reactive Streams Processor to schedule work to be done in the form of sending an RS signal to a Subscriber asynchronously.

It's easy enough to simply invoke one of the Subscriber methods directly in the calling thread, but to do "proper" (according to the spec) asynchronous signal publication efficiently, we should provide a Processor<Signal> (a Signal consisting of a type, a payload, and the Subscriber) which allows publishers to send signals to subscribers asynchronously, safely, and in an efficient (and ordered) fashion.

We call these components "dispatchers" in Reactor, and they are key to the efficient publication of signals. I think we want the same capability here. Whether that publication involves jumping from one thread to another would be determined by the implementation (e.g. with a Disruptor RingBuffer).

Note that this is, IMO, not the same thing as intentionally "scheduling" work to be done on another thread based on foreknowledge of the work being done (like submitting blocking IO to a work queue). We could add a delay to the publication of a signal, which we would need at the minimum to provide abstract timeout capabilities (i.e. not tied to a particular action, like a write). Being able to schedule signals in the future is simply an extension of being able to schedule signals at all. This would be similar to the event loop functionality of "next tick" or "trampolining" signals, which is important when dispatching signals recursively.
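As an illustration of the dispatcher shape described above, here is a minimal sketch: a Signal carries the type, the payload, and the target Subscriber so that a dispatcher thread can replay it asynchronously. The names and structure are assumptions for illustration, not Reactor's actual API:

```java
import org.reactivestreams.Subscriber;

// Illustrative Signal: type + payload + target Subscriber, so a Processor<Signal>
// (or a RingBuffer consumer) can deliver it on a dispatcher thread.
final class Signal<T> {
    enum Type { NEXT, ERROR, COMPLETE }

    private final Type type;
    private final T payload;                   // only set for NEXT
    private final Throwable error;             // only set for ERROR
    private final Subscriber<? super T> target;

    private Signal(Type type, T payload, Throwable error, Subscriber<? super T> target) {
        this.type = type;
        this.payload = payload;
        this.error = error;
        this.target = target;
    }

    static <T> Signal<T> next(T value, Subscriber<? super T> s) {
        return new Signal<>(Type.NEXT, value, null, s);
    }

    static <T> Signal<T> error(Throwable e, Subscriber<? super T> s) {
        return new Signal<>(Type.ERROR, null, e, s);
    }

    static <T> Signal<T> complete(Subscriber<? super T> s) {
        return new Signal<>(Type.COMPLETE, null, null, s);
    }

    // Invoked by the dispatcher thread: replays the signal on the target,
    // decoupling signal production from signal delivery.
    void dispatch() {
        switch (type) {
            case NEXT:     target.onNext(payload); break;
            case ERROR:    target.onError(error);  break;
            case COMPLETE: target.onComplete();    break;
        }
    }
}
```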