Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prov/tcp: introduce TCP_NO_CONNECT flag #10534

Draft
wants to merge 1 commit into
base: v1.22.x
Choose a base branch
from

Conversation

ooststep
Copy link
Contributor

@ooststep ooststep commented Nov 13, 2024

There are some specific use cases where we may not want one side of communication to initiate connections, namely when we know that one side of our configuration is being heavily restricted by a firewall. To prevent indefinite hangs with certain operations, such as RMA reads and writes, introduce a provider specific flag to trigger an error if there is not already an established connection. In this case, the application can force the connection from the other direction.

prov/tcp/src/xnet.h Outdated Show resolved Hide resolved
Copy link
Contributor

@aingerson aingerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add to the man page as well

prov/tcp/src/xnet_rdm_cm.c Outdated Show resolved Hide resolved
prov/tcp/src/xnet_attr.c Outdated Show resolved Hide resolved
prov/tcp/src/xnet_attr.c Outdated Show resolved Hide resolved
@ooststep ooststep force-pushed the tcp-no-connect branch 2 times, most recently from 1ed132e to d447e69 Compare November 14, 2024 14:09
include/rdma/fi_ext.h Outdated Show resolved Hide resolved
man/fi_tcp.7.md Outdated Show resolved Hide resolved
man/fi_tcp.7.md Outdated Show resolved Hide resolved
man/fi_tcp.7.md Outdated Show resolved Hide resolved
@@ -98,6 +98,9 @@ typedef void xnet_profile_t;
#define XNET_MIN_MULTI_RECV 16384
#define XNET_PORT_MAX_RANGE (USHRT_MAX)

/* provider specific op flags */
#define TCP_NO_CONNECT FI_TCP_NO_CONNECT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use FI_TCP_NO_CONNECT directly?

Copy link
Contributor Author

@ooststep ooststep Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provider specific flags are typically defined within the provider header files without any FI prefixing and used internally. I wanted to have a definition here as a placeholder so that it's obvious there are existing provider op flags that need to be accounted for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ooststep This is true when the API flag differs from the provider flag, or the scope of the flags are different. E.g. the API flags cover a 64-bit range, but the provider flag maps to some 16-bit range in the protocol.

There are some specific use cases where we may not want one side
of communication to initiate connections, namely when we know that
one side of our configuration is being heavily restricted by a
firewall.  To prevent indefinite hangs with certain operations,
such as RMA reads and writes, introduce a provider specific
flag to trigger an error if there is not already an established
connection.  In this case, the application can force the connection
from the other direction.

Signed-off-by: Stephen Oost <[email protected]>
*FI_TCP_NO_CONNECT*
: This flag indicates that operations should fail if there is no
existing connection to the remote peer. In such case, an FI_ENOTCONN
error should be expected.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not make this a flag that's checked on every operation (i.e. it's okay to connect if it's a send, but not a write?). It makes more sense either applied to the entire rdm endpoint or to a specific peer.

Copy link
Contributor Author

@ooststep ooststep Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the target problematic scenario, the rdm is used for some peers where this flag is needed and some peers where it would not be, so applying to the entire rdm was undesirable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense. An RDM endpoint is unconnected. Exposing low-level connection implementation details is not desirable. There could be multiple connections to the same peer. The connection might be in another process (e.g. Pony Express or SNAP or whatever it's called). It could be in the kernel (RDS).

This still isn't a per operation flag. At best it's per peer, but even that use case seems questionable. It's like putting half of an RDM endpoint behind a firewall, but the other half ignores it. Apply firewall semantics to the entire RDM endpoint. If some peers are outside the firewall, but some are inside, require 2 endpoints with some sort of per EP configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I also agree with the first part (ie. exposing low-level connection is a bad idea) though providing 2 types of endpoints is also difficult for the user. To provide some context as to what problem we were trying to solve, in a client-server configuration where the client is behind a firewall, we have cases where for example client A would send a message to server A, and server A would then attempt to do an emulated tcp RMA to client B through an fi_write() call (without prior connection established from client B to server A). In that case, it seems that with the tcp provider server A remains stuck attempting to establish a connection and doing an fi_cancel on the RMA does not seem to be able to complete, as it's not supported currently by the tcp provider. So what we wanted was a way of having some completion and error being returned when server A is not able to reach client B. I'm opened to other means but having to manage 2 endpoints seems also cumbersome, I would also be happy though if we don't have to expose any connection logic.

Copy link

@jolivier23 jolivier23 Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shefty Just to add a bit of context beyond what Jerome said, this use case came from Parallelstore (Google's DAOS service). We control only the server side of the equation and rely on the user to do client configuration (as we don't control their VMs, network configuration, and processes). Opening the firewall to server to client connections is not common for services and requires them to do it explicitly or their writes will simply hang. We want to remove this requirement as it is becoming a major scale issue for onboarding new customers because users don't read documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the alternative, I would only configure the EPs at the server, as that's where the actual problem occurs. The client behavior is unchanged.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is really on the client. The client endpoint is the one behind the firewall and can't be reached (unless the server has already connected to it). Can we encode something into the client side URI that would indicate it can't be reached? The advantage to that approach from our end is that then the client could have complete control over telling the server whether or not it can handle the error (e.g. can support getting back an error indicating that the server couldn't connect and handle it appropriately)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the server's behavior that should change.

From the viewpoint of the client, everything works. It wants to talk to server A and can do so. That server A wants to pass off the response to some other system that the client doesn't know about is related to the storage architecture, not the client SW.

I don't think pushing this detail into the apps is the best option. But you can work-around this in the client by having the client send some sort of 'hello' message to every storage server during initialization -- to poke holes through the firewall. That pushes the burden onto every client app that might want to use DAOS.

Alternatively, you can configure the server SW to be firewall aware, so that it avoids forwarding requests to servers not already communicating with the client.

Or, change the protocol around handling firewalls. Have server A tell the client to retry its request with server B, rather than forwarding it internally.

There are likely other options for this. But I would avoid picking one which encoded these details in the SW API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be better if this thread were copied into an issue for continued discussion, rather than attaching it to this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -98,6 +98,9 @@ typedef void xnet_profile_t;
#define XNET_MIN_MULTI_RECV 16384
#define XNET_PORT_MAX_RANGE (USHRT_MAX)

/* provider specific op flags */
#define TCP_NO_CONNECT FI_TCP_NO_CONNECT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ooststep This is true when the API flag differs from the provider flag, or the scope of the flags are different. E.g. the API flags cover a 64-bit range, but the provider flag maps to some 16-bit range in the protocol.

nikhilnanal added a commit to nikhilnanal/libfabric that referenced this pull request Dec 18, 2024
This commit has been created to incorportate changes in the
draft PR-10534 in order to write a fabtest to test the changes.
ref: ofiwg#10534

Original Author : Stephen Oost <[email protected]>

Signed-off-by: Nikhil Nanal <nikhil.nanal>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants