Use DataLoader in GraphQL resolvers #1370

ivasilov · 2022-01-24T19:42:20Z

ivasilov
Jan 24, 2022

Hi,

A couple of months ago, we ran into an issue with resolving fields from multiple product variants: a separate SQL query was issued for each variant and each field. This made requests for large pages of product variants slow, so we started using data loaders.

The data loader pattern enables the backend to resolve multiple fields for multiple items using a single SQL query. It batches all calls to a particular function. The data is cached for the duration of a single request, so you can reuse it from other parts of the code.
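
To make the batching and per-request caching concrete, here is a minimal sketch of the idea in plain TypeScript. `TinyLoader` is a hypothetical, heavily simplified stand-in for the `dataloader` package (it is not that library's implementation, and the real scheduling is more involved). It collects every key requested in the same tick, calls the batch function once, and caches the resulting promises.

```typescript
// Simplified illustration of the pattern only; use the real `dataloader` package in practice.
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;

class TinyLoader<K, V> {
  private cache = new Map<K, Promise<V>>();
  private queue: Array<{ key: K; resolve: (v: V) => void; reject: (e: unknown) => void }> = [];

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    const cached = this.cache.get(key);
    if (cached) return cached; // a repeated key is served from the cache

    const promise = new Promise<V>((resolve, reject) => {
      // The first queued key schedules one flush for everything collected in this tick.
      if (this.queue.length === 0) void Promise.resolve().then(() => this.flush());
      this.queue.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);
    return promise;
  }

  private async flush(): Promise<void> {
    const pending = this.queue;
    this.queue = [];
    try {
      // One call (think: one SQL query) for all keys; results must match the key order.
      const values = await this.batchFn(pending.map(p => p.key));
      pending.forEach((p, i) => p.resolve(values[i]));
    } catch (err) {
      pending.forEach(p => p.reject(err));
    }
  }
}

// Hypothetical usage: three loads, one of them a repeat, end up in a single batch call.
const batches: number[][] = [];
const loader = new TinyLoader<number, string>(async ids => {
  batches.push([...ids]);
  return ids.map(id => `facet-${id}`);
});

const demo = Promise.all([loader.load(1), loader.load(2), loader.load(1)]);
```

Here the batch function runs once with the keys `1` and `2`, and the repeated `load(1)` reuses the cached promise instead of adding a third key.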

For a real life example, I'll share one of our smaller loaders:

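import { Injectable, Scope } from '@nestjs/common';
import { ID } from '@vendure/common/lib/shared-types';
import { RequestContext, TransactionalConnection } from '@vendure/core';
// Default import assumes `esModuleInterop` is enabled in tsconfig.
import DataLoader from 'dataloader';

export type FacetValueAdditionalFields = {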
  id: ID;
  productCount: number;
  categoryCount: number;
};

@Injectable({ scope: Scope.REQUEST })
export class FacetValueAdditionalFieldsLoader {
  private data: DataLoader<ID, FacetValueAdditionalFields>;

  constructor(private connection: TransactionalConnection) {}

  private createLoader = (_ctx: RequestContext) => {
    return new DataLoader(async (ids: readonly ID[]) => {
      const results: FacetValueAdditionalFields[] = await this.connection.rawConnection
        .createQueryBuilder()
        .select('f.id', 'id')
        .addSelect('COUNT(DISTINCT p.productId)', 'productCount')
        .addSelect('COUNT(DISTINCT c.categoryId)', 'categoryCount')
        .from('facet_value', 'f')
        .leftJoin('product_facet_values_facet_value', 'p', 'p.facetValueId = f.id')
        .leftJoin('category_facet_value', 'c', 'c.facetValueId = f.id')
        .where('f.id IN (:...ids)', { ids })
        .groupBy('f.id')
        .getRawMany();

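      // DataLoader requires the results in the same order as the incoming keys.
      // Facet values with no matching rows fall back to zero counts.
      // Depending on the database driver, raw results may return ids and counts
      // as strings, so both are normalized here.
      const byId = new Map(results.map(r => [String(r.id), r] as const));
      return ids.map(id => {
        const result = byId.get(String(id));
        return {
          id,
          productCount: Number(result?.productCount ?? 0),
          categoryCount: Number(result?.categoryCount ?? 0),
        };
      });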
    });
  };

  load(ctx: RequestContext, key: ID) {
    if (!this.data) {
      this.data = this.createLoader(ctx);
    }
    return this.data.load(key);
  }
}

The resolver which uses this loader looks like this:

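import { Parent, ResolveField, Resolver } from '@nestjs/graphql';
import { Ctx, FacetValue, RequestContext } from '@vendure/core';
// FacetValueAdditionalFieldsLoader is imported from wherever the loader above is defined.

@Resolver('FacetValue')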
export class FacetValueEntityResolver {
  constructor(private loader: FacetValueAdditionalFieldsLoader) {}

  @ResolveField()
  async productCount(@Ctx() ctx: RequestContext, @Parent() facetValue: FacetValue) {
    const result = await this.loader.load(ctx, facetValue.id);
    return result.productCount || 0;
  }

  @ResolveField()
  async categoryCount(@Ctx() ctx: RequestContext, @Parent() facetValue: FacetValue) {
    const result = await this.loader.load(ctx, facetValue.id);
    return result.categoryCount || 0;
  }
}

Using this approach, the GQL query { product(id) { id, facetValues { categoryCount, productCount } } } will be resolved in 1-2 SQL queries. We've also added our own schema extension, which resolves faster than the built-in Vendure one:

type VariantPreview {
  id: ID!
  sku: String!
  price: Int!
}

extend type Product {
  """
  An array of variant preview objects. These are very efficiently added within the single data loader query
  """
  variantPreview: [VariantPreview!]!
}

I can provide a POC PR for any of the Vendure entities if you want. We're really hoping that you're going to start using this pattern as it speeds up the resolving tremendously.

michaelbromley · 2022-01-25T09:40:09Z

michaelbromley
Jan 25, 2022
Maintainer

Hi,

Thanks for bringing this up! I've wanted to eventually investigate the dataloader pattern, so I'm very happy to see that you have already been able to work with it in Vendure.

I've not done anything on it so far since we weren't quite there on the sequence of "make it work, make it right, make it fast". But now I think it's definitely worth looking into how we can use this to make the default Vendure perf faster.

I would propose an initial evaluation like this:

1. Identify common queries which would produce inefficient n+1 SQL operations.
2. Benchmarks on those.
3. POC application of dataloader to those.
4. Benchmarks after.

Also noteworthy is the potential cost of using request-scoped providers. From the NestJS documentation:

"Using request-scoped providers will have an impact on application performance. While Nest tries to cache as much metadata as possible, it will still have to create an instance of your class on each request. Hence, it will slow down your average response time and overall benchmarking result. Unless a provider must be request-scoped, it is strongly recommended that you use the default singleton scope."


ivasilov · 2022-01-25T10:40:44Z

ivasilov
Jan 25, 2022
Author

I would say the following are a good place to start in the Shop API:

- The array queries (collections, products, facets). The single-result queries can also use the loader.
- The complex fields of various types (Product->variants, OrderLine->items, OrderLine->productVariant).

To avoid request-scoped providers, you can use something like https://www.npmjs.com/package/nestjs-dataloader. I didn't like having to declare the loader type at each usage, but loaders set up that way will be faster because they don't need request-scoped providers.
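
As an illustration of how per-request loaders can work without request-scoped providers, here is a minimal sketch of one possible approach: a singleton registry that keeps one instance per request in a `WeakMap`. The names `PerRequestRegistry` and `RequestLike` are hypothetical, and the sketch assumes there is some object whose lifetime matches the request (for example the GraphQL context object). It is not a description of how nestjs-dataloader is implemented.

```typescript
// Hypothetical stand-in for any object that lives exactly as long as one request.
type RequestLike = object;

class PerRequestRegistry<T> {
  // WeakMap entries are garbage-collected together with the request object.
  private instances = new WeakMap<RequestLike, T>();

  constructor(private factory: (req: RequestLike) => T) {}

  get(req: RequestLike): T {
    let instance = this.instances.get(req);
    if (instance === undefined) {
      instance = this.factory(req);
      this.instances.set(req, instance);
    }
    return instance;
  }
}

// Singleton-scoped: created once at startup, yet it hands out one cache per request.
let created = 0;
const facetCountCaches = new PerRequestRegistry(() => {
  created += 1;
  return new Map<number, number>(); // in real code this would be `new DataLoader(...)`
});

const requestA = {};
const requestB = {};
const sameForA = facetCountCaches.get(requestA) === facetCountCaches.get(requestA);
const isolated = facetCountCaches.get(requestA) !== facetCountCaches.get(requestB);
```

The registry itself can stay in the default singleton scope, so Nest does not have to re-instantiate the provider on each request, while cached data still never leaks between requests.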


michaelbromley · 2022-04-14T14:41:07Z

michaelbromley
Apr 14, 2022
Maintainer

Did a POC dataloader implementation here: 858540c

Note that this was done after already putting in significant perf work as part of #1506, so the dataloader did not seem to significantly improve on the work already done.

Right now I'm going to leave this and possibly revisit in the next round of perf work.


ivasilov · 2022-04-15T00:33:43Z

ivasilov
Apr 15, 2022
Author

Hey Michael,

Just a few notes for the next time you revisit this:

You didn't get any improvements because you're still doing N+1 queries. For each product, you're getting a list of variants (N queries here) and then fetching all variants in bulk, which is no faster than the previous approach.
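
The difference in query counts can be sketched with a small in-memory example. Everything here is hypothetical (the `variantTable` data and the `find...` helpers only stand in for SQL queries). It is meant to show the shape of the problem, not the actual POC code.

```typescript
// Hypothetical in-memory "database" that counts how many queries are issued.
type Variant = { id: number; productId: number };
const variantTable: Variant[] = [
  { id: 10, productId: 1 },
  { id: 11, productId: 1 },
  { id: 20, productId: 2 },
];
let queryCount = 0;

function findVariantIdsByProduct(productId: number): number[] {
  queryCount += 1; // SELECT id FROM variant WHERE productId = ?
  return variantTable.filter(v => v.productId === productId).map(v => v.id);
}

function findVariantsByIds(ids: number[]): Variant[] {
  queryCount += 1; // SELECT * FROM variant WHERE id IN (...)
  return variantTable.filter(v => ids.includes(v.id));
}

function findVariantsByProducts(productIds: number[]): Variant[] {
  queryCount += 1; // SELECT * FROM variant WHERE productId IN (...)
  return variantTable.filter(v => productIds.includes(v.productId));
}

const productIds = [1, 2, 3];

// One lookup per product, then one bulk fetch: still N + 1 queries.
queryCount = 0;
const variantIds: number[] = [];
for (const id of productIds) variantIds.push(...findVariantIdsByProduct(id));
findVariantsByIds(variantIds);
const queriesWithPerProductLookup = queryCount;

// Everything inside the batch function: one query, grouped in memory afterwards.
queryCount = 0;
const grouped = new Map<number, Variant[]>();
for (const id of productIds) grouped.set(id, []);
for (const v of findVariantsByProducts(productIds)) grouped.get(v.productId)?.push(v);
const queriesWithSingleBatch = queryCount;
```

With three products, the first shape issues four queries and the second issues one, and the gap grows linearly with the number of products.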

All SQL queries should go into the loader:

// Assumes `connection`, `service` and `ctx` are in scope, and that `groupBy` is a lodash-style helper.
return new DataLoader<string, ProductVariant[]>(async ids => {
  const qb = connection
    .getRepository(ctx, ProductVariant)
    .createQueryBuilder('productvariant')
    .innerJoin('productvariant.channels', 'channel', 'channel.id = :channelId', {
      channelId: ctx.channelId,
    })
    .innerJoin('productvariant.product', 'product', 'product.id IN (:...productIds)', {
      productIds: ids,
    });

  if (ctx.apiType === 'shop') {
    qb.andWhere('productvariant.enabled = :enabled', { enabled: true });
  }
  const variants = await qb.getMany();
  const groupedVariants = groupBy(variants, v => v.productId);

  // One entry per requested id, in the same order; products without variants get an empty list.
  return Promise.all(
    ids.map(id => service.applyPricesAndTranslateVariants(ctx, groupedVariants[id] ?? [])),
  );
});

and the field should look like this:

@ResolveField()
async variants(
    @Ctx() ctx: RequestContext,
    @Parent() product: Product,
    @Loader(ProductVariantLoader.name) productVariantLoader: DataLoader<any, Array<Translated<ProductVariant>>>,
): Promise<Array<Translated<ProductVariant>>> {
    return productVariantLoader.load(product.id);
}

I haven't run this code, so you may need to tweak the query.

This can probably wait for the next perf work.


mschipperheyn · 2023-09-29T14:29:59Z

mschipperheyn
Sep 29, 2023
Collaborator

My experience from writing dataloaders for a newsfeed implementation is that they improve performance when you avoid joins, especially when the referenced data is retrieved as lists and is repetitive. Examples: fetching all the users and photos related to the posts in a newsfeed, or fetching all the products, including photos, for a collection. It also makes items more cacheable, because the query results become more atomic and less specific to a particular join scenario. You just fetch a root object and then fetch the related objects in a series of IN [...id] style queries.

The trade-off lies exactly here: sometimes it's faster to fetch everything in one go with a join than with several follow-up queries. So, it depends. Performance also has multiple facets: scalability, memory intensity, etc. Atomic queries execute faster on a per-query basis than complicated join-style queries, which might become relevant if you're hosting a large marketplace on Vendure.
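
The "root objects first, related objects via IN queries" approach can be sketched as follows. The `Post`/`User` data and the `findUsersByIds` helper are hypothetical and only stand in for real tables and a `WHERE id IN (...)` query. Note how repeated references collapse into a small set of unique ids.

```typescript
type Post = { id: number; authorId: number };
type User = { id: number; name: string };

// Hypothetical data: three posts written by only two distinct users.
const posts: Post[] = [
  { id: 1, authorId: 7 },
  { id: 2, authorId: 7 },
  { id: 3, authorId: 9 },
];
const userTable: User[] = [
  { id: 7, name: 'ana' },
  { id: 9, name: 'ben' },
];

// Stands in for: SELECT * FROM user WHERE id IN (...)
function findUsersByIds(ids: number[]): User[] {
  return userTable.filter(u => ids.includes(u.id));
}

// Step 1: the root objects are already loaded (`posts`).
// Step 2: collect the unique referenced ids and fetch them in one atomic query.
const uniqueAuthorIds = Array.from(new Set(posts.map(p => p.authorId)));
const authorsById = new Map(findUsersByIds(uniqueAuthorIds).map(u => [u.id, u] as const));

// Step 3: stitch the results together in memory instead of in a SQL join.
const feed = posts.map(p => ({ postId: p.id, author: authorsById.get(p.authorId)?.name }));
```

Each user row is fetched once no matter how many posts reference it, which is also what makes the individual results easier to cache than the rows of a join.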
