CommunicationInstance: System.IO.IOException: Remote message handler throws an exception. #326
Comments
More data and information for researching the System.IO.IOException.
I'm uploading the trinity.log file.
I am able to reproduce the problem on both my local SF Cluster and an Azure SF Cluster. It looks like we aren't getting a complete connection to the remote IClientRegistry (memory cloud): the connection is established and we can send data, but we are unable to receive data.
Okay - I've traveled way down the GE rabbit hole now and have an open ticket with Microsoft Service Fabric Support. We are making progress, folks. Connecting a GE Client to a GE Server running in an Azure Service Fabric cluster is a matter of configuring the Azure LB at Layer 4 to let TCP traffic pass through; once that is done properly, the GE GraphEngineClient API set is able to partially connect to the GE Service instance running in the SF Cluster. I have come to understand and appreciate the brilliance of the Graph Engine networking stack: connecting to the server is a multi-step process, and the GE is dogged about keeping that connection in place. Here's all you need to do to point the GE Client to your SF Cluster:
FYI: I've been documenting the ins and outs of developing with the GE in SF and will publish them at my GitHub GraphEngine repository soon.
This is what I found most recently. Processing on the GE Client side will make this call into the GE Server:
The GE Client sets up Client Response registration with the Server and gets ready to start polling the server before each RPC call, as well as before lower-level Graph Engine network infrastructure RPCs into the server; the GE is truly type-safe and distributed across memory cloud instances, even in the SF Cluster. The call, however, fails on the GE Server side when running in an SF Cluster; otherwise, the stuff just works. This is bad: as a result, a true or complete connection is never made. In the meantime the GE Client keeps retrying the polling, and of course that fails too.
I've got another remote debugging session scheduled with Mike Wong from MS SF Support; this guy is great! I think we are getting down to the root cause of this thing, and then a fix can be applied.
Okay, so when the GE Client connects (TrinityClient.Start()) to my GE Server outside of the SF Cluster, RegisterClientHandler is called first, before CheckInstanceCookie, on the server side. This call sequence is very important because RegisterClientHandler is the only method that adds the client cookie to m_client_storages:
When the GE Server is running in the SF Cluster, RegisterClientHandler is not called first; instead, CheckInstanceCookie is called first:
and m_client_storages is empty. So even though the GE Client has made the initial connection to the server, the method calls on the server side seem to be out of order; this suggests something might be off in the lower-level code, such as the MessageDispatcher and DispatcherProc code.
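To make the ordering dependency concrete, here is a minimal toy model in Python. It is not Graph Engine code: `ToyServer`, `register_client`, `check_instance_cookie`, and `run_sequence` are hypothetical names that only mirror the roles of RegisterClientHandler, CheckInstanceCookie, and m_client_storages described above, and raising `KeyError` for an unknown cookie is an assumption about how the check fails.

```python
class ToyServer:
    """Toy stand-in for the server-side client registry (not Graph Engine code)."""

    def __init__(self):
        # Plays the role of m_client_storages: cookie -> per-client state.
        self.client_storages = {}

    def register_client(self, cookie):
        # Analogue of RegisterClientHandler: the only place a cookie is added.
        self.client_storages[cookie] = {"registered": True}

    def check_instance_cookie(self, cookie):
        # Analogue of CheckInstanceCookie: assumed to fail for unknown cookies.
        if cookie not in self.client_storages:
            raise KeyError(f"unknown client cookie: {cookie}")
        return True


def run_sequence(calls, cookie=42):
    """Apply handler names in order; return 'ok' or a failure description."""
    server = ToyServer()
    try:
        for name in calls:
            getattr(server, name)(cookie)
    except KeyError as exc:
        return f"failed: {exc.args[0]}"
    return "ok"


if __name__ == "__main__":
    # Order observed outside the SF Cluster.
    print(run_sequence(["register_client", "check_instance_cookie"]))
    # Order observed inside the SF Cluster.
    print(run_sequence(["check_instance_cookie", "register_client"]))
```

In this model the first sequence succeeds and the second fails on the very first call, which matches the symptom: the check runs against an empty registry because registration has not happened yet.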
After working with the SF team for a few weeks, it appears there is a deficiency in the Graph Engine TCP/IP stack; I've been able to narrow it down to code in the TCP layer. I'll come back here with updates on our continued development and testing as we perfect the GE/SF integration, specifically external GE Client connections to an SF/GE Cluster.
GraphEngine Client crashes when attempting to connect to a Graph Engine Server running inside a Service Fabric Cluster
Here is the offending source code:
At line 28 in the source, we are just trying to connect to a TCP endpoint in the cluster; we can reach the GE service listening on the exposed SF Listener. It looks like it can connect, but the custom IMessagePassingEndpoint seems to fall down when trying to send/receive data.
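One way to separate "cannot reach the endpoint" from "connected but nothing comes back" is a plain TCP probe run from the client machine. The sketch below uses only the Python standard library; `probe_endpoint` is a hypothetical helper, and the payload is arbitrary bytes, not a valid Graph Engine message, so a "no-reply" result only shows that the transport accepted the data and returned nothing within the timeout.

```python
import socket


def probe_endpoint(host, port, payload=b"", timeout=3.0):
    """Return 'unreachable', 'connected', 'no-reply', or 'replied'."""
    try:
        # The timeout applies to the connect and to later send/recv calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, timed out, or name resolution failed.
        return "unreachable"
    with sock:
        if not payload:
            return "connected"
        try:
            sock.sendall(payload)
            data = sock.recv(4096)
        except OSError:
            # Covers recv timeouts and connection resets.
            return "no-reply"
        # An empty read means the peer closed without answering.
        return "replied" if data else "no-reply"
```

For example, `probe_endpoint("my-cluster.example.net", 5304, b"ping")` (host and port are placeholders) returning "no-reply" would be consistent with the behavior described here: the load balancer passes the connection through, but no data comes back.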