Finding the right Batching Mechanism #25
Comments
As you mentioned, only the request batching approach is widely supported*. Opting for any alternative method forces the gateway to be aware of which GraphQL servers support that method, and that may become challenging. IMO we need to make sure it's compliant with the GraphQL spec (not the graphql-over-http spec).

*many frameworks do support it
Variable batching gets my vote; it definitely feels the most GraphQL-y, and for people who use DataLoaders on …
Variable batching also seems like the best option if we know that we can always execute the same operation and we are only ever selecting one entity while changing the id. Wouldn't we also want to support selecting different entities from one subgraph fetch, though? If we have to resolve these 3:

```json
[
  {
    "query": "query getFoo($a: Int!) { foo(a: $a) }",
    "variables": { "a": 1 }
  },
  {
    "query": "query getFoo($a: Int!) { foo(a: $a) }",
    "variables": { "a": 2 }
  },
  {
    "query": "query getBar($b: Int!) { bar(b: $b) }",
    "variables": { "b": 1 }
  }
]
```

VS

```json
{
  "query": "query getFooAndBar($a: Int, $b: Int) { foo(a: $a) bar(b: $b) }",
  "variables": [
    { "a": 1 }, { "a": 2 }, { "b": 1 }
  ]
}
```

We could also add the option of request AND variable batching:

```json
[
  {
    "query": "query getFoo($a: Int!) { foo(a: $a) }",
    "variables": [
      { "a": 1 }, { "a": 2 }
    ]
  },
  {
    "query": "query getBar($b: Int!) { bar(b: $b) }",
    "variables": { "b": 1 }
  }
]
```
My take on batching non-unique operation bodies: when multiple operations are batched together, there is a risk that one slow operation delays the whole batch. Say fetching 'Bar' consumes 2s, while 'Foo' is optimized and only takes 90ms. While this is true for all types of batching, enforcing a single operation body helps minimize the impact, as the execution flow is likely to be consistent for all variable sets. We could stream the response of each execution, but I don't think it will improve the performance of a gateway, as every batched operation most likely has to resolve first before the query planner can resolve the next step.
I understand that this has much wider implications, but just to put it out there: Source Schema 1 could also provide a root field which can be queried like this:

```graphql
query ($ids: [ID!]!, $requirements: [ProductDimensionInput!]) {
  ordersDimensions(ids: $ids, dimensions: $requirements)
}
```

This would not require any special batching mechanism at all.
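A minimal SDL sketch of what such a root field could look like; the return type is not given in the comment and is assumed here purely for illustration:

```graphql
type Query {
  # Hypothetical batch field: one result entry per id, paired
  # positionally with its required dimension input.
  # The Order return type is an assumption, not from the comment.
  ordersDimensions(ids: [ID!]!, dimensions: [ProductDimensionInput!]): [Order]
}
```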
…tes (#5097)

Two main things that we're doing in this PR:

1. We've added a variable to FetchNode called `context_rewrites`. This is a vector of `DataRewrite::KeyRenamer` that specifically takes data from its path (which will be relative and can traverse up the data path) and writes the data into an argument that is passed to the selection set.
2. There are two cases. In the most straightforward, the data that is passed to the selection set is the same for every entity. This case is pretty easy and doesn't require any special handling. In the second case, the value of the variable may be different per entity. If that is true, we need to use aliasing and duplication in our query in order to send it to subgraphs. Once graphql/composite-schemas-spec#25 is decided and has subgraph support, this query cloning will be able to go away.

Co-authored-by: o0Ignition0o <[email protected]>
Co-authored-by: Gary Pennington <[email protected]>
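A rough sketch of the aliasing-and-duplication case described above; the field and argument names are hypothetical, and this only illustrates why the same selection must be cloned per entity when the injected value differs:

```graphql
query {
  # The same field is duplicated under aliases because each entity
  # needs a different value for the argument derived from context.
  _0: productById(id: "p1") { deliveryEstimate(size: 10, weight: 20) }
  _1: productById(id: "p2") { deliveryEstimate(size: 5, weight: 8) }
}
```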
Was @andimarek's suggestion discussed? Introducing other modes of entity resolution seems to diminish the static and explicit benefits of … Tooling and validation can help identify missing resolvers and generate them if needed (I haven't thought it through entirely; is it possible?).
Batching Mechanisms for Distributed Executors
To implement efficient distributed executors for composite schemas, we need robust batching mechanisms. While introducing explicit batching fields for fetching entities by keys is a straightforward approach, it becomes challenging when entities have data dependencies on other schemas.
Consider the following example. The issue arises with directives like @require on lower-level fields, where simple key-based batching is insufficient to satisfy data dependencies.

Example Scenario:
Source Schema 1:
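A minimal illustrative sketch, assuming hypothetical Product types and a simplified @require syntax; this schema owns the delivery estimate but depends on dimension data from Source Schema 2:

```graphql
type Query {
  productById(id: ID!): Product
}

type Product {
  id: ID!
  # size and weight are not resolvable in this schema; they must be
  # supplied from Source Schema 2 before this field can execute.
  deliveryEstimate(
    zip: String!
    size: Int! @require(field: "dimension.size")
    weight: Int! @require(field: "dimension.weight")
  ): Int
}
```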
Source Schema 2:
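Again a sketch with assumed names; this schema owns the dimension data that Source Schema 1 requires:

```graphql
type Query {
  productById(id: ID!): Product
}

type Product {
  id: ID!
  dimension: ProductDimension!
}

type ProductDimension {
  size: Int!
  weight: Int!
}
```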
In distributed executor queries, batching individual requirements for each key becomes problematic:
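For illustration, a hypothetical fetch against Source Schema 1: because the required size and weight differ per product, every entity needs its own aliased selection, so the document grows with the number of keys and each batch produces a unique query:

```graphql
query {
  _0: productById(id: "p1") {
    deliveryEstimate(zip: "94105", size: 10, weight: 20)
  }
  _1: productById(id: "p2") {
    deliveryEstimate(zip: "94105", size: 5, weight: 8)
  }
}
```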
Apollo Federation's `_entities` field allows passing in data that represents partial data of an object. This works around how GraphQL works and introduces untyped inputs. Ideally, we want to find a way of batching requests that does not require a subgraph to introduce a special field like `_entities`.

Batching Approaches
The GraphQL ecosystem has devised various batching approaches, each with its own set of advantages and drawbacks.
Request Batching
Request Batching is the most straightforward approach: multiple GraphQL requests are sent in a single HTTP request. This method is widely adopted due to its simplicity and compatibility with many GraphQL servers. However, the lack of a semantic relationship between the batched requests limits optimization opportunities, as each request is executed in isolation. This can result in inefficiencies, especially when the data required by the requests overlaps. A payload sketch follows the list.

Pros:
- Simple to implement and already supported by many GraphQL servers.

Cons:
- No semantic relationship between the batched requests; each one is validated, planned, and executed in isolation, so overlapping data cannot be deduplicated.
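A minimal sketch of such a payload, assuming a server that accepts a JSON array of standard GraphQL requests:

```json
[
  { "query": "query getFoo($a: Int!) { foo(a: $a) }", "variables": { "a": 1 } },
  { "query": "query getBar($b: Int!) { bar(b: $b) }", "variables": { "b": 1 } }
]
```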
Operation Batching
Operation Batching, as shown by Lee Byron in 2016, leverages the `@export` directive to flow data between operations within a single HTTP request. This approach introduces the ability to use the result of one operation as input for another, enhancing flexibility and enabling more complex data-fetching strategies. The downside is the complexity of implementation and the fact that it's not widely adopted, which may limit its practicality for some projects. Additionally, it does not really target our problem space. A sketch follows the list.

Pros:
- Results can flow between operations, enabling complex fetching strategies in a single round trip.

Cons:
- Complex to implement, not widely adopted, and it does not really target the batching problem described here.
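A rough sketch of the idea, assuming an `@export` directive in the style of Lee Byron's 2016 demo; the directive name and its argument are not standardized:

```json
[
  { "query": "query A { me { bestFriendId @export(as: \"friendId\") } }" },
  { "query": "query B($friendId: ID!) { userById(id: $friendId) { name } }" }
]
```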
Variable Batching
Variable Batching addresses a specific batching use case by allowing a single request to carry multiple sets of variables for one operation, potentially enabling more optimized execution paths through the executor. In experiments we could reduce the batching overhead to roughly the impact a DataLoader has on a request, which is promising. A payload sketch follows the list.

Pros:
- A single operation body: validation, parsing, and planning happen once, and execution can be optimized across all variable sets.

Cons:
- Not covered by the GraphQL-over-HTTP spec today, so the gateway must know which subgraphs support it.
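A sketch of such a request, following the shape proposed in the comments above:

```json
{
  "query": "query getFoo($a: Int!) { foo(a: $a) }",
  "variables": [
    { "a": 1 },
    { "a": 2 },
    { "a": 3 }
  ]
}
```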
Alias Batching
Alias Batching uses field aliases to request multiple resources within a single GraphQL document, making it possible with every spec-compliant GraphQL server. This method's strength lies in its compatibility and ease of use. However, it significantly hinders optimization, because each batched document is essentially unique, preventing effective caching strategies (validation, parsing, query planning). While it might solve the immediate problem of batching requests, its impact on performance and scalability makes it not ideal. A sketch follows the list.

Pros:
- Works with every spec-compliant GraphQL server; nothing new to implement.

Cons:
- Every batch produces a unique document, defeating validation, parsing, and query-planning caches, which hurts performance and scalability.
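A minimal sketch of alias batching, reusing the hypothetical foo/bar fields from the comments above:

```graphql
{
  _0: foo(a: 1)
  _1: foo(a: 2)
  _2: bar(b: 1)
}
```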