
RFC: Query batching support #800

Closed
jakubriedl opened this issue May 10, 2020 · 11 comments
Labels
future 🔮 An enhancement or feature proposal that will be addressed after the next release

Comments

@jakubriedl
Contributor

Summary

Apollo Client supports batching queries, which is useful when many small components fetch at once. It reduces network overhead and also allows the server to avoid the N+1 issue if implemented correctly.

Proposed Solution

It could be a new exchange that batches queries together, possibly using DataLoader; a rough sketch of the batching core such an exchange could delegate to follows.
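A minimal sketch of that core, assuming the `dataloader` package, a `/graphql` endpoint, and an array-based wire format (a JSON array of operations in, an array of results in the same order out) — none of which are settled here:

```ts
import DataLoader from 'dataloader';

// Hypothetical shapes for this sketch.
interface GraphQLRequest {
  query: string;
  variables?: Record<string, unknown>;
}
interface GraphQLResult {
  data?: unknown;
  errors?: unknown[];
}

// DataLoader collects every .load() call made on the same tick and hands
// them to this batch function as a single array, so all queries that start
// together go out as one HTTP request.
const batchLoader = new DataLoader<GraphQLRequest, GraphQLResult>(
  async requests => {
    const response = await fetch('/graphql', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(requests),
    });
    // The server must reply with results in the same order as the requests.
    return (await response.json()) as GraphQLResult[];
  },
  // urql already deduplicates and caches operations; the loader only batches.
  { cache: false }
);
```

An exchange could then resolve each query operation through `batchLoader.load(...)` instead of issuing its own fetch.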

Requirements

The behaviour should be similar to Apollo's: allow batching to be enabled at the query level, and it should work correctly with persisted queries (i.e. avoid using persisted queries when batching).

@jakubriedl jakubriedl added the future 🔮 An enhancement or feature proposal that will be addressed after the next release label May 10, 2020
@kitten
Member

kitten commented May 10, 2020

I personally believe that batching via Apollo's supported method is not a good option. Fundamentally, Apollo keeps writing additional extension specs, and this one in particular, I believe, goes against the spirit of GraphQL and doesn't actually add much benefit.

  • With HTTP/2 there's little overhead in sending the additional requests
  • It's common practice to hoist queries per page; given that, it's likely that only 2-3 queries are active on mount, those are often unrelated, and any Dataloader batching on the backend isn't going to weigh in much in terms of performance
  • On a CDN level, e.g. Cloudflare Workers or other edge caching, persisted queries are already sufficient, and again the overhead of multiple requests is low with HTTP/2

True batching would mean that we'd combine queries into a single GraphQL document and then separate the combined result from the API back into separate requests. In theory this could even be achieved on a Graphcache level or alternatively on a separate exchange.
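To illustrate what that combining could look like — the queries, the `q1_`/`q2_` alias scheme, and the field names are all made up for this sketch:

```ts
import { gql } from '@urql/core';

// Two independent queries, as two components might mount them:
const moviesQuery = gql`
  query Movies {
    movies { id title }
  }
`;
const viewerQuery = gql`
  query Viewer {
    me { name }
  }
`;

// "True batching" would merge them into one document, aliasing each
// root field with a per-query prefix so the combined result can later
// be split back apart and delivered to the original operations:
const combinedQuery = gql`
  query Combined {
    q1_movies: movies { id title }
    q2_me: me { name }
  }
`;
```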

This approach makes it clear that batching isn't that beneficial on the front end either. Either we batch queries on a single tick into one, which captures mounting queries that can be hoisted manually or don't need to be combined (see the prior server-side arguments), or we batch on a timer and combine unrelated queries, where batching has no noticeable speed-up given a performant server setup and can even decrease the responsiveness of the client-side load, e.g. when a fast auto-suggest search query is held back by a slow page query.

@jakubriedl
Contributor Author

jakubriedl commented May 11, 2020

I agree with you that Apollo batching is quirky, and like you I would advise everyone against using it. Your points are completely true for 99% of cases. Our case is a bit unusual, though, and on a bigger scale than most.

Imagine you have a large page of movies (>200) which is publicly cached on a CDN. Once it has loaded, you want to check for every single movie whether the user can play it (e.g. has bought it) so you can show an indicator. That information is not cacheable, as it is private.

This number of separate requests on a single page load puts pressure on the device, the network, and the server. Device performance is a very sensitive topic for us, as we operate in emerging markets. I'm not so worried about the network because of HTTP/2, but the biggest cost is the server. Every query comes with fixed performance costs (e.g. auth, geolocation) but also monetary ones, as every request through the CDN costs something, and multiplied by tens of millions of MAUs that is a significant amount. Also, when we use multiple queries they land on multiple server instances, so the server can't batch its own calls to downstream micro-services and databases.

We could solve this by manually hoisting the movie IDs up to the page and letting it fetch the indicators. But that would go against the component composition principle, as the page doesn't know whether the movie posters have indicators or not. In the past we batched those "CanPlay" queries into one array and let the server do the optimization.

We can and will solve it no matter how this RFC ends, but I just wanted to open a discussion on this topic and maybe get some ideas that are better than mine.

@kitten
Member

kitten commented May 11, 2020

That’s interesting! I agree that this amount of requests would most definitely be unacceptable.
Can you describe how you're getting into this situation of essentially N+1 separate queries on the client side? Sorry, I'm not doubting it 😅 I'm just curious how a list of movies (in this case) would go on to have separate queries for their status rather than an embedded field for it.

Either way, I'm sure we can solve this, but ultimately we may run into interactions between all modes of fetching. Currently we have five (?) different modes of requests.

So I can see how in some cases there are definitely going to be arguments for some kind of batching, but there are too many variables right now, since we have so many different modes of fetching, so it's looking a little fuzzy 🙃

@jakubriedl
Contributor Author

The reason this N+1 appears on the client is that the initial query is public and cacheable but the N queries are not. So if we want a fast initial load, we have to load them separately.

@jakubriedl
Contributor Author

jakubriedl commented May 11, 2020

I've played with this and managed to get a batchFetchExchange working. It works by replacing fetch with one wrapped in a dataloader. It has to be opted into in useQuery and is enabled only for queries. One caveat is that it works only if persisted queries are enabled; without them it doesn't make sense for us, because the request body would explode in size, and we use PQs. Here is the PoC.
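For reference, the core of that approach looks roughly like this — a simplified sketch that leaves out the per-query opt-in and the persisted-query handling of the actual PoC, and assumes a `/graphql` endpoint plus an array-based batch format on the server:

```ts
import DataLoader from 'dataloader';

// Every fetch issued on the same tick gets folded into one array POST.
// Each key is the JSON body urql would have sent for a single operation.
const loader = new DataLoader<string, unknown>(
  async bodies => {
    const response = await fetch('/graphql', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // The bodies are already JSON strings, so join them into an array.
      body: `[${bodies.join(',')}]`,
    });
    // The server is expected to answer with results in matching order.
    return (await response.json()) as unknown[];
  },
  { cache: false }
);

// A drop-in fetch replacement: funnel the single operation through the
// loader, then re-wrap its slice of the batch as a normal Response.
const batchedFetch: typeof fetch = async (_input, init) => {
  const result = await loader.load(init!.body as string);
  return new Response(JSON.stringify(result), {
    headers: { 'Content-Type': 'application/json' },
  });
};
```

This can then be handed to the client, e.g. `createClient({ url: '/graphql', fetch: batchedFetch })`.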

In theory it should be possible to implement it without dataloader, just using wonka functions, but I'm not very familiar with them and didn't manage to get it working.

I noticed one bug though, and I've created a separate issue for it. If you pass context: {} to useQuery, it starts an infinite loading/cancel/loading... loop for the query. A quick fix is to use context: React.useMemo(() => ({}), []), but I don't think that should be required.
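For anyone running into the same loop, the workaround looks like this (the query itself is a placeholder):

```tsx
import * as React from 'react';
import { gql, useQuery } from 'urql';

const MoviesQuery = gql`
  query Movies {
    movies { id title }
  }
`;

const Movies = () => {
  // Memoize the context: a fresh `{}` object on every render makes
  // useQuery treat it as a changed input and restart the operation,
  // which causes the loading/cancel loop described above.
  const context = React.useMemo(() => ({}), []);
  const [result] = useQuery({ query: MoviesQuery, context });
  return <div>{result.fetching ? 'Loading…' : 'Done'}</div>;
};
```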

@kitten
Member

kitten commented May 15, 2020

Hiya 👋 Sorry to be coming back to you on this so late.
From what we now know, batching is actually quite problematic in some cases, in terms of performance and server-side compatibility. We thought we could batch queries by actually combining their documents, but in a lot of cases this may actually hamper the client-side experience. The batch spec that Apollo has put forward is interesting to some, but it isn't widely supported outside of Apollo Server and is often not a catch-all solution to performance issues.

I know that in your case it makes sense, and the only alternative would be for you to use cache-and-network and to return null on SSR when the server isn't supposed to add user information. To me that sounds like a strategy that'd still work for a lot of people.

Instead, we may want to write a recipe and basically merge this issue into #815. How does that sound?
I think if we document your approach of injecting a special batched fetch function that implements the Apollo Server batching spec, that'd be enough for those who are really keen on that spec.

@jakubriedl
Contributor Author

That's fair, and I fully understand your reasons. The recipe will be great for anyone who needs to solve this.

@jesselee34

@kitten Can you elaborate on what this means?

"It's common practice to hoist queries per page"

Does this mean something like having a higher-order component (at the page level) which uses the useQuery hook and then passes down the props? Or is there some other pattern for combining queries together?

I couldn't find much about "hoisting queries". Any help would be appreciated!

@kitten
Member

kitten commented May 4, 2021

@jesselee34 there's not really any magic to it, but a page doesn't really need separate queries per section. Instead, it often makes the most sense to define fragments per structural component. Each component can define its own fragment, and those can be composed together upwards until your page only has one or a few queries to run overall.

Relay champions this approach a lot and makes it its standard pattern. We're currently also working on some tooling to encourage this on top of GraphQL Code Generator.
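A small sketch of that composition, with made-up fragment names and Movie fields:

```ts
import { gql } from '@urql/core';

// Each structural component declares the data it needs as a fragment:
const MoviePosterFragment = gql`
  fragment MoviePoster_movie on Movie {
    id
    title
    posterUrl
  }
`;

const MovieRowFragment = gql`
  fragment MovieRow_movie on Movie {
    ...MoviePoster_movie
    releaseYear
  }
  ${MoviePosterFragment}
`;

// The page composes the fragments upwards into a single query, so one
// request serves the whole tree of components:
const MoviesPageQuery = gql`
  query MoviesPage {
    movies {
      ...MovieRow_movie
    }
  }
  ${MovieRowFragment}
`;
```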

Overall, given some level of caching, I wouldn't worry about batching. Sending multiple API requests is often even better than batching them into one, especially since it helps CDN-level caching. And if you run into problems where multiple queries depend on each other, then you may have a schema that doesn't cater to your app perfectly, and those queries couldn't be batched either way.

Hope that clarifies it ✌️

@reconbot

reconbot commented Jun 8, 2021

In the cases where you do desire batching, like mine (where we do not yet support HTTP/2, don't have enough control to hoist all the queries, don't want GraphQL HTTP caching, and have a backend and database that can parallelize queries more efficiently within a request than over multiple requests), I found I needed something a bit different from @jakubriedl's PoC and submitted my own. This version works on non-persisted queries.

@JoviDeCroock
Collaborator

We don't want to explicitly support query batching (however, a third party can implement this). We will implicitly support this in the UI with fragment boundaries, #1408.
