Optimized batch loading without blocking #518

atombender · 2019-01-29T21:40:38Z

I'm trying to understand how gqlgen can optimize batch loading lazily/concurrently, given that all resolvers are synchronous — i.e. they return the value itself, instead of a thunk (future).

I was reading through the code example for data loading to try to understand how dataloaden works. In the its code, I noticed that Load(key) simply calls LoadThunk(key)(). Then I noticed this line in the readme: "This method will block for a short amount of time, waiting for any other similar requests to come in". Is this how gqlgen assumes it will work?

If you look at the graphql-go project, all resolvers can return either values or thunks. This means the entire GraphQL tree can be resolved to thunks, which can then queue up all the keys they need so that when the thunks are called, all the objects can be fetched at once.

Can someone fill me in on how gqlgen is supposed to handle this? A resolver like this:

func (r *rootQueryResolver) Things(ctx context.Context, uids []*string, limit *int) ([]*models.Thing, error) {
	// ...
}

...cannot defer any work at all; it has to return. The only way to batch this is to execute all the resolvers concurrently in goroutines, then have some kind of queue that keeps track of when they've all resolved their keys — or just wait for an arbitrary interval. That seems wrong to me, and all the attendant synchronization sounds like it would be super inefficient.

The text was updated successfully, but these errors were encountered:

vektah · 2019-01-30T00:51:20Z

The only way to batch this is to execute all the resolvers concurrently in goroutines [... and] just wait for an arbitrary interval.

This is what gqlgen does, we eliminate a bunch of places where creating goroutines is unnessicary (binding to fields directly, or methods that dont take a context) and goroutines are cheap enough - benchmarks. There are still a few optimization targets here.

There are optimizations we can make on both sides:

a. pooling goroutines to avoid morestack (see Performance improvments #465)
b. notifying dataloaden when all goroutines have been dispatched and the scheduler has had a chance to run. This is enough to avoid the sleep.

Overarching goal for gqlgen is for clean, readable, safe code. I dont think trading clean stacktraces and return values for extra milliseconds (way less if b is actioned)

atombender · 2019-01-30T01:15:52Z

Thanks. Since you're the one who's done benchmarks, I'm not going to argue until I could possibly offer contrary evidence, but I followed that benchmark link, and it's not actually doing any concurrent batch loading. It's just measuring a single resolver that returns a static value.

I did some simulated benchmarking here. If a GraphQL returns 1000 objects, each with 3 joined objects, the gqlgen architecture will spawn 1000 * 3 goroutines, as I understand it. With a test that iterates over 3000 slice elements in goroutines (using waitgroups and mutexes), I'm getting roughly a 100x performance slowdown with goroutine. It's presumably not the goroutines, but the locking. Since a data loader will need locking for its cache, it might come down to the same numbers, of course. I'm not clear how much locking is needed inside the gqlgen resolver code itself.

But I'm also skeptical that you can achieve sane batch loading through arbitrary deadlines. That sounds like asking for unpredictable performance; as I understand the architecture, one little GC blip and your query could explode into N individual queries because the keys haven't been registered yet.

I'm not sure that thunks affect "cleanness", safety or stack traces? If anything, thunks seem simpler to me than spawning and waiting for goroutines.

vektah · 2019-01-30T02:00:01Z

I'm not sure that thunks affect "cleanness", safety or stack traces? If anything, thunks seem simpler to me than spawning and waiting for goroutines.

By safe I mostly mean having a single correct resolver signature that can be put into the interface. Go's type system isnt powerful enough to allow multiple different return types and maintain type safety.

When a thunk panics its stack is from the thing dispatching thunks, when a goroutine panics the stack is what you expect, each resolver that lead to this error.

I'm reluctant dive into any work here without looking more broadly at what should be included in gqlgen, id caching and query panning are both interesting, how much of the dataloader should be moved in.

That said if you are hitting performance issues and have some benchmarks to share, and have the time to come up with some clean solutions work in this area is appreciated. Do keep in mind there is a very large refactor coming in the next branch. Dont start anything on master.

atombender · 2019-01-30T03:53:04Z

By safe I mostly mean having a single correct resolver signature that can be put into the interface.

The simplest solution (which at this point would of course break existing users) would be to always return a function, which is not particularly annoying for non-thunk loaders:

func (r *orderResolver) Items(ctx context.Context, obj *Order) (func() ([]Item, error), error) {
	return func() ([]Item, error) {
		// ...
	}, nil
}

When a thunk panics its stack is from the thing dispatching thunks, when a goroutine panics the stack is what you expect, each resolver that lead to this error.

Yet both will have a stack trace pointing within gqlgen's resolving hierarchy. I honestly don't see the difference, but maybe I'm not seeing something.

That said if you are hitting performance issues and have some benchmarks to share

Well, I'm merely evaluating gqlgen at this point. I've started writing app that uses graphql-go/graphql, but I like the "schema first" approach and generating code, since it means both the GraphQL code and the Go types can be generated from the same time. I'm not overly concerned with performance for this app, but I don't like the added complexity. Using goroutines instead of thunks adds a lot of synchronization at the data loading level. I've already written my batch loading using the graph-gophers/dataloader library, which I'm quite happy with. It's simple and easy to understand.

Having looked at a few of these GraphQL libraries, I'm of the opinion that a server-side GraphQL implementation like gqlgen should absolutely have batch loading built in, because it is in the nature of GraphQL to express many fetches in a single query. An app that doesn't need batch loading can choose not to batch-load, but those will in the minority.

vektah · 2019-02-07T08:06:04Z

related #520

leonth · 2019-04-13T03:19:56Z

Hi, I'm evaluating gqlgen as well for our internal use.

notifying dataloaden when all goroutines have been dispatched and the scheduler has had a chance to run. This is enough to avoid the sleep.

I agree and this is what other implementations that do not use promises e.g. PHP are doing.

By safe I mostly mean having a single correct resolver signature that can be put into the interface. Go's type system isnt powerful enough to allow multiple different return types and maintain type safety.

Since gqlgen itself already requires code generation step, if the dataloading mechanism (dataloaden) can also be generated at the same time (possibly in the same configuration file etc) then this should be less of an issue.

Having looked at a few of these GraphQL libraries, I'm of the opinion that a server-side GraphQL implementation like gqlgen should absolutely have batch loading built in, because it is in the nature of GraphQL to express many fetches in a single query.

+1. This is also what the PHP implementation above does, as well as other solid implementations like Sangria do.

bugzpodder · 2019-07-29T20:15:44Z

@atombender @leonth @steebchen @a-h
I have prototyped something that worked for me, with less reliance on blocking. #790 If you have any feedback please let me know.

atombender · 2019-07-29T20:22:25Z

Thanks, @bugzpodder. Unfortunately, I decided to not use gqlgen because of this issue; I found graphql-go's resolver design to be cleaner and I'm using that. So I'm not in a position to try out your PR anymore.

stale · 2019-09-27T21:17:21Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

frederikhors · 2019-09-27T22:24:09Z

no stale

stale · 2019-11-26T23:18:24Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

atombender · 2019-11-26T23:18:50Z

Not stale.

stale · 2020-01-25T23:34:49Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

frederikhors · 2020-01-26T00:49:46Z

Please can we disable Stale?

macnibblet · 2020-04-02T07:53:50Z

@vektah Any ideas on how to handle this? It's becoming a growing concern for us.

happenslol · 2020-04-19T20:33:11Z

I'd be interested in tackling this issue if there is a way to implement this that we can agree on. It seems the main problem here is that it's not possible to delay fetching of fields when the resolver is called, and I can see 2 possible solutions for this:

Add a directive (like @batch) which will generate code that lets you return ids or hand you a function with a batcher as an argument, or
Add configuration options in gqlgen.yml which will change the return type of the function to something of your choosing, and then require you to implement a resolver which gets passed the collected returns of that "tick".

I would personally prefer 2., since it keeps implementation specifics out of the schema. Please tell me if you would consider one of these solutions, or if you have any thoughts.

macnibblet · 2020-04-21T04:43:19Z

I think the second solution sounds very clean, the only issue I remember discussing with someone is the stack traces if something goes wrong gets exceedingly messy but as far as I can think at the moment that's the only downside.

happenslol · 2020-04-21T13:03:36Z

the stack traces if something goes wrong gets exceedingly messy

How so? With the proposed solution, fetching a field that is marked as batched would simply be divided into two phases - fetching the IDs, and fetching the data using the IDs. If one of those steps goes wrong, you would get the exact same stack traces as if you had a single resolver.
Just to clarify what this would look like from the library users' perspective:

Mark a field as batched in gqlgen.yml

models:
  Foo:
    model: ...
    batch:
        # Defining both model and type allows
        # us to define custom ID types.
        - field: field1
          model: string
        - field: field2
          model: github.com/99designs/gqlgen/graphql.Int64

2 Resolvers are generated for you instead of one:

interface FooResolver {
  // The ID resolver lets us return only the ID type we defined, and defer
  // loading of the actual data.
  Field1ID(ctx context.Context, obj *model.Foo) (string, error)
  Field2ID(ctx context.Context, obj *model.Foo) (graphql.Int64, error)

  // The second part is the data resolver, which hands us IDs and expects
  // us to resolve them to actual entities.
  Field1Data(ctx context.Context, ids []string) (map[string]*model.Field1, error)
  Field2Data(ctx context.Context, ids []graphql.Int64), map[graphql.Int64]*model.Field2, error)
}

Now, for each entity where a batch resolver is present, all requests for 1 "level" will automatically be collected and only resolved once.

Am I missing something?

macnibblet · 2020-04-29T05:52:48Z

No that does make sense

Emyrk · 2020-12-23T13:47:11Z

It's been some time since this has been discussed. I would very much like this feature added.

danhlee329 · 2021-06-05T00:24:52Z

This ☝️

happenslol · 2021-10-02T20:17:03Z

Is anyone looking at working on this? I'm getting back into contributing and I'd be interested in helping with this if there's a good chance it will be included in the project.

vikstrous2 · 2021-10-23T20:08:46Z

Related to @atombender's proposal for returning a function, have you guys considered the possibility of changing the interface from a blocking/sequential one to a non-blocking one with promises? I was just looking at https://github.com/ReactiveX/RxGo and it might be a good fit for this use case. I haven't fully thought through how this would work, but just throwing it out there :)

pencil · 2021-11-17T19:13:04Z

I was a little surprised to learn how batching is currently implemented with gqlgen – especially since the feature comparison page makes it seem like its implementation is on par with that of graphql-go. Best case, the gqlgen approach introduces extra latency as well as server load and worst case it doesn't prevent the duplicate queries in case the batching window is missed.

I do agree that the interface is complicated by supporting a more sophisticated batch loading approach but that is somewhat remediated by keeping the feature optional. Are the gqlgen maintainers in agreement with that? What would a solution have to look like to be accepted?

frederikhors · 2021-11-17T19:42:38Z

I was a little surprised to learn how batching is currently implemented with gqlgen – especially since the feature comparison page makes it seem like its implementation is on par with that of graphql-go. Best case, the gqlgen approach introduces extra latency as well as server load and worst case it doesn't prevent the duplicate queries in case the batching window is missed.

I do agree that the interface is complicated by supporting a more sophisticated batch loading approach but that is somewhat remediated by keeping the feature optional. Are the gqlgen maintainers in agreement with that? What would a solution have to look like to be accepted?

Can you help here with a PR?

pencil · 2021-11-17T20:34:12Z

Can you help here with a PR?

Absolutely, but it isn't clear to me from the discussion above if there is an agreed approach that would be accepted in the form of a PR. Maybe the first step would be a clear RFC that then could be signed off by the maintainers? Or did I miss something and there is an accepted proposal?

n-e · 2022-09-08T16:31:38Z

Hi, I'm having similar problems :

batch loading with dataloaders that rely on blocking for a predetermined amount of time results in unpredictable performance, and gets very slow under high load, since the batches get really small
running queries with gqlgen + dataloaden appears slow (see below)

Regarding problem n°2, I'm running this query :

query q {
  root {
    children(num: 100) {
      children(num: 10) {
        children(num: 10) {
          id
        }
      }
    }
  }
}

on a toy project where the Children resolver calls a dataloaden dataloader, whose Fetch function returns an array of children of length num.

On my computer, the query above takes 10ms with a 100us dataloader delay, and 30ms with a 5ms dataloader delay, which I have found to be the minimum to have adequate batching on the real-life queries I'm making. This feels at least an order of magnitude too slow.

Having a look into the code, it seems that the above query causes 10 000 goroutines to be dispatched, which then need to be synchronized to access the dataloader cache. On real-life queries, even more goroutines would be dispatched (for example, if we replaced the id field with 3 fields with "explicit" resolvers, 40 000 goroutines would be dispatched).

Are you sharing the same conclusions?
Have you found a way to have predictable batch dataloading under load?

I'm looking to fix this by implementing built-in batch loading. In the example above, instead of having this resolver:

func (r *itemResolver) Children(ctx context.Context, obj *model.Item, ...) ([]*model.Item, error)

we would have:

func (r *itemResolver) Children(ctx context.Context, obj []*model.Item, ...) ([][]*model.Item, error)

The execution plan would then consist of the following 5 function calls :

Root(ctx) - returns 1 child
Children(ctx, [root]) - returns an array with one element : an array 100 children
Children(ctx, flattened result of the above resolver) - returns an array with 100 elements : each an array of 10 children
Children(ctx, flattened result of the above resolver) - returns an array with 1000 elements : each an array of 10 children
Id(flattened result of the above resolver) - returns an array of 10 000 ids

Instead of thousand of goroutines there would be five, and we won't need to do any synchronization when loading data. Also, this could be backwards-compatible by letting the user choose between the new resolver format or the old one, which could be called in a loop.

Is this something that would fit with gqlgen? What do you think of it? Any problems you see?

I've started implementing this, and though it seems simple in theory, it gets quite complicated in practice, for example:

this approach doesn't place nice at all with the field context (which, as far as I understand, has a one to one relationship with a field in the results, not the query)
the results tree isn't walked in the same order as in the current implementation, what are the consequences?
there is plenty of rework to do to handle every feature such as middlewares, errors, complexity, etc.

DylanRJohnston-FZ · 2023-07-28T01:01:10Z

Strong +1 with this, we've experience a lot of production problems around spawning on too many goroutines for trivial field resolvers and as a result have had to inline them everywhere they appear in the Graph instead. The spawning of separate goroutines only to have them all block on a single data loader also feels very silly. Yes gouroutines are cheap, but they're not free, and bookkeeping them becomes a significant fraction of the runtime when there are thousands of them.

vektah mentioned this issue Jan 30, 2019

Architecture using timeouts to call fetch? vektah/dataloaden#14

Open

vektah added the performance label Feb 7, 2019

vektah mentioned this issue Feb 10, 2019

Add an intermediary loader interface #543

Closed

edsrzf mentioned this issue Aug 19, 2019

Pool goroutines #841

Closed

2 tasks

stale bot added the stale label Sep 27, 2019

stale bot removed the stale label Sep 27, 2019

stale bot added the stale label Nov 26, 2019

stale bot removed the stale label Nov 26, 2019

stale bot added the stale label Jan 25, 2020

stale bot removed the stale label Jan 26, 2020

vektah added the accepted label Feb 18, 2020

vektah mentioned this issue Feb 18, 2020

[proposal] Implement batch resolver to handle N+1 problem #1033

Closed

vektah mentioned this issue Mar 5, 2020

[feature] Prepare each level before fetching #1052

Closed

lwc added the execution label Sep 17, 2021

frederikhors mentioned this issue Nov 19, 2021

Can you please activate the Github Discussions tab in this project? #1722

Closed

n-e mentioned this issue Sep 14, 2022

Proposal: deterministic batch loading #2373

Open

hypirion mentioned this issue Dec 26, 2023

Optimized batch loading without blocking graph-gophers/graphql-go#628

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimized batch loading without blocking #518

Optimized batch loading without blocking #518

atombender commented Jan 29, 2019 •

edited

Loading

vektah commented Jan 30, 2019 •

edited

Loading

atombender commented Jan 30, 2019

vektah commented Jan 30, 2019 •

edited

Loading

atombender commented Jan 30, 2019 •

edited

Loading

vektah commented Feb 7, 2019

leonth commented Apr 13, 2019

bugzpodder commented Jul 29, 2019

atombender commented Jul 29, 2019

stale bot commented Sep 27, 2019

frederikhors commented Sep 27, 2019

stale bot commented Nov 26, 2019

atombender commented Nov 26, 2019

stale bot commented Jan 25, 2020

frederikhors commented Jan 26, 2020

macnibblet commented Apr 2, 2020

happenslol commented Apr 19, 2020

macnibblet commented Apr 21, 2020

happenslol commented Apr 21, 2020

macnibblet commented Apr 29, 2020

Emyrk commented Dec 23, 2020 •

edited

Loading

danhlee329 commented Jun 5, 2021

happenslol commented Oct 2, 2021

vikstrous2 commented Oct 23, 2021

pencil commented Nov 17, 2021

frederikhors commented Nov 17, 2021

pencil commented Nov 17, 2021

n-e commented Sep 8, 2022

DylanRJohnston-FZ commented Jul 28, 2023

Optimized batch loading without blocking #518

Optimized batch loading without blocking #518

Comments

atombender commented Jan 29, 2019 • edited Loading

vektah commented Jan 30, 2019 • edited Loading

atombender commented Jan 30, 2019

vektah commented Jan 30, 2019 • edited Loading

atombender commented Jan 30, 2019 • edited Loading

vektah commented Feb 7, 2019

leonth commented Apr 13, 2019

bugzpodder commented Jul 29, 2019

atombender commented Jul 29, 2019

stale bot commented Sep 27, 2019

frederikhors commented Sep 27, 2019

stale bot commented Nov 26, 2019

atombender commented Nov 26, 2019

stale bot commented Jan 25, 2020

frederikhors commented Jan 26, 2020

macnibblet commented Apr 2, 2020

happenslol commented Apr 19, 2020

macnibblet commented Apr 21, 2020

happenslol commented Apr 21, 2020

macnibblet commented Apr 29, 2020

Emyrk commented Dec 23, 2020 • edited Loading

danhlee329 commented Jun 5, 2021

happenslol commented Oct 2, 2021

vikstrous2 commented Oct 23, 2021

pencil commented Nov 17, 2021

frederikhors commented Nov 17, 2021

pencil commented Nov 17, 2021

n-e commented Sep 8, 2022

DylanRJohnston-FZ commented Jul 28, 2023

atombender commented Jan 29, 2019 •

edited

Loading

vektah commented Jan 30, 2019 •

edited

Loading

vektah commented Jan 30, 2019 •

edited

Loading

atombender commented Jan 30, 2019 •

edited

Loading

Emyrk commented Dec 23, 2020 •

edited

Loading