Cosmo Router executes multiple requests to the same subgraph to resolve entities

Component(s)

router

Component version

0.130.0

wgc version

Just the router

controlplane version

Just the router

router version

0.130.0

What happened?

Description

While experimenting with the simple router example provided here, I noticed a missed opportunity for batching when resolving entities from a subgraph. This is a blocker for us in adopting Cosmo :/

Steps to Reproduce

After starting the router I executed the following request:

{
  employees {
    id
    products
  }
  employee(id: 1) {
    id
    products
  }
}

This query touches two subgraphs and created the following Request Trace

Expected Result

I was surprised by the two calls to the "product" subgraph and would have expected them to be combined into a single call. These objects are even from the same source; however, I believe this approach should also work for different types by simply using the following syntax.

query ($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Employee {
      __typename
      products
    }
    ... on OtherObjectFromProductsSubgraph {
      __typename
      someField
    }
  }
}

Actual Result

Group all your calls together and send a single request to each subgraph.

Oct 23 '24 13:10 alexus37

WunderGraph commits fully to Open Source and we want to make sure that we can help you as fast as possible. The roadmap is driven by our customers and we have to prioritize issues that are important to them. You can influence the priority by becoming a customer. Please contact us here.

Oct 23 '24 13:10 github-actions[bot]

Hi @alexus37

Not sure how this is blocking you, as other router implementations do the same.

It is not a bug; it is currently designed that way.

It is not trivial to implement: you have nested calls for the elements of a list of items (which will be batched), and then you have a nested call on another root query field, which is an object.

The tricky part is multiplexing representations from different places and then demultiplexing the results back. You have to take into account that, for each separate item in a list, the representation could already be null, and you still need to filter those out. In addition, you will need to merge queries, which could contain the same types with different field selections and aliases.
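The multiplex/demultiplex step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the router's actual implementation: the function names `multiplex` and `demultiplex` and the sample data are hypothetical, and it relies only on the federation rule that `_entities` returns results in the same order as the representations it receives. It does not cover merging field selections or aliases.

```python
from typing import Any, Dict, List, Optional, Tuple

Representation = Dict[str, Any]


def multiplex(
    groups: List[List[Optional[Representation]]],
) -> Tuple[List[Representation], List[Tuple[int, int]]]:
    """Flatten several representation lists into one batch.

    Null representations are skipped. For every batched item we remember
    the (group index, position) it came from.
    """
    batch: List[Representation] = []
    origins: List[Tuple[int, int]] = []
    for g, group in enumerate(groups):
        for i, rep in enumerate(group):
            if rep is None:
                continue  # filtered out, but its slot is kept on the way back
            batch.append(rep)
            origins.append((g, i))
    return batch, origins


def demultiplex(entities, origins, groups):
    """Route each entity result back to the slot its representation came from."""
    results = [[None] * len(group) for group in groups]
    for entity, (g, i) in zip(entities, origins):
        results[g][i] = entity
    return results


# Hypothetical input: a list field with one null item, plus a single root object.
groups = [
    [{"__typename": "Employee", "id": 1}, None, {"__typename": "Employee", "id": 3}],
    [{"__typename": "Employee", "id": 1}],
]
batch, origins = multiplex(groups)

# Stand-in for the subgraph response: one entity per representation, in order.
entities = [{"products": ["product-%d" % rep["id"]]} for rep in batch]
results = demultiplex(entities, origins, groups)
```

Note that the same entity (`id: 1`) appears twice in the batch here; deduplicating representations would be a further step that this sketch leaves out.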

We have some thoughts on how to implement it, but it is not a priority right now

And we are always open for PRs :)

Thanks, Wundergraph Team

Oct 23 '24 16:10 devsergiy

Hi @devsergiy, thanks for the quick reply.

Not sure, how it is blocking you, as other router implementations are doing the same

I’d like to provide a bit more context to clarify why this is a blocker. We are currently exploring various options to increase the capacity of our GraphQL API. Our current plan is to rewrite certain parts in Go and use federation to manage the components that are still on the old system during the transition period. This approach allows us to improve capacity gradually rather than making one large change all at once. However, if the call to the legacy API (which provides all the data that has not yet been moved to the new Go service) isn’t batched into a single request, we will end up using more capacity than before. This would make the approach unfeasible.

We have some thoughts on how to implement it, but it is not a priority right now And we are always open for PRs :)

I would be willing to help get this done, but I have no context on the codebase. Would you mind sharing some context, links to important files, and any thoughts you have had so far?

Oct 23 '24 16:10 alexus37

@alexus37 could you let us know which alternatives you think are using the approach you described?

Oct 23 '24 19:10 Aenimus

@alexus37 could let us know which alternatives you think are using the approach you described?

I haven't tested how the Apollo Federation router behaves in these situations yet. If it generates the same number of requests, we may need to come up with a new plan for the migration. :(

Oct 23 '24 21:10 alexus37

@alexus37 We didn't know the details of your current/future schema.

You have tested an example with multiple root fields, which may result in separate requests, but they will run in parallel anyway. We optimize fetches based on their dependencies and fetch data layer by layer.

Usually batching is not required in such cases.

Batching happens for list items.

Imagine something like this:

# subgraph 1
type User @key(fields: "id") {
  id: ID!
}

type Query {
  users: [User!]!
}

# subgraph 2
type User @key(fields: "id") {
  id: ID!
  age: Int!
  email: String!
}

And the query

query {
  users {
    id
    age
    email
  }
}

To get all the data about users, we need to call subgraph 2, but these requests will be batched into a single HTTP call.
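As a rough illustration of what that single batched call could contain, here is a minimal Python sketch that builds one `_entities` request body for all users returned by subgraph 1. The helper name `build_entities_request` and the sample IDs are hypothetical, and the exact query text the router generates may differ.

```python
import json

# One entity query covering the User fields that only subgraph 2 can resolve.
ENTITIES_QUERY = """
query ($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on User {
      __typename
      age
      email
    }
  }
}
"""


def build_entities_request(users):
    """Turn every user from subgraph 1 into a representation in ONE request body."""
    representations = [{"__typename": "User", "id": user["id"]} for user in users]
    return {"query": ENTITIES_QUERY, "variables": {"representations": representations}}


# Hypothetical result of the `users` field from subgraph 1.
users_from_subgraph_1 = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
request_body = build_entities_request(users_from_subgraph_1)
print(json.dumps(request_body["variables"], indent=2))
```

Three users produce three representations but still only one HTTP request to subgraph 2.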

There are also such things as InterfaceObjects, for when you need to add a field to many types at the same time.

So I would not worry too much about some of the requests running in parallel.

Oct 24 '24 10:10 devsergiy

@alexus37 We didn't know the details of your current/feature schema

@devsergiy Let me share an example schema showcasing the issue. Imagine you have the following schema.

# subgraph 1 - faster one
extend type Passagner @key(fields: "id") @shareable {
  id: ID! @external
}

extend type Flight @key(fields: "id") @shareable {
  id: ID! @external
  passangers: [Passagner!]!
}

extend type Airport @key(fields: "id") @shareable {
  id: ID!
  flights: [Flight!]!
  allPassagner: [Passagner!]!
}

type Query {
  airportV2(id: ID!): Airport
}

# subgraph 2 - slower one
type Passagner @key(fields: "id") @shareable {
  id: ID!
  name: String!
}

type Flight @key(fields: "id") @shareable {
  id: ID!
  name: String!
  passangers: [Passagner!]!
}

type Airport @key(fields: "id") @shareable {
  id: ID!
  name: String!
  flights: [Flight!]!
  allPassagner: [Passagner!]!
}

type Query {
  airport(id: ID!): Airport
}

and the following query:

query {
  airportV2(id: "1") {
    id
    name # not supported in subgraph 1
    flights {
      id
      name  # not supported in subgraph 1
      passangers {
        id
        name  # not supported in subgraph 1
      }
    }
    allPassagner {
      id
      name  # not supported in subgraph 1
    }
  }
}

This will create the following request execution graph:

graph TD
    A[Start] --> B[Subgraph 1]
    B -- Query 1 (Entity) --> C[Subgraph 2]
    B -- Query 2 (Batched Entity) --> D[Subgraph 2]
    B -- Query 3 (Batched Entity) --> E[Subgraph 2]
    B -- Query 4 (Batched Entity) --> F[Subgraph 2]

While all these queries (1-4) are executed in parallel, they require roughly 4 times more CPU than a single batched request would. (This is not fully accurate, since a batched request would do more work per request.)

Executed queries

query Query1($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Airport {
      __typename
      name
    }
  }
}

query Query2($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Flight {
      __typename
      name
    }
  }
}

query Query3($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Passagner {
      __typename
      name
    }
  }
}

query Query4($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Passagner {
      __typename
      name
    }
  }
}

We would like a single query to be made to subgraph 2 to fetch all this information.

graph TD
    A[Start] --> B[Subgraph 1]
    B -- Query 5 (Batched Entity) --> C[Subgraph 2]

The executed query could look like this:

query Query5($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Airport {
      __typename
      name
    }
    ... on Flight {
      __typename
      name
    }
    ... on Passagner {
      __typename
      name
    }
  }
}

We believe that making one heavier request instead of four might be a bit slower but will save a significant amount of CPU cycles.

As for the feature, it might be possible to have this as a configuration option on the router, allowing it to run as an optional optimization after the query plan has been computed.
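To make the idea of such an optional post-planning pass concrete, here is a minimal Python sketch. It assumes that all entity fetches in the list belong to the same layer, as queries 1-4 do in the example above, since they only depend on the response from subgraph 1. The `EntityFetch` class and the `merge_fetches` and `render_query` helpers are hypothetical and not part of the router. The sketch ignores aliases, field arguments, and conflicting selections, which is where the real difficulty lies.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class EntityFetch:
    subgraph: str
    type_name: str
    fields: List[str]
    representations: List[Dict[str, Any]]


def merge_fetches(fetches):
    """Group fetches by subgraph: one field selection per type, one representation list."""
    merged = {}
    for fetch in fetches:
        entry = merged.setdefault(fetch.subgraph, {"selections": {}, "representations": []})
        selection = entry["selections"].setdefault(fetch.type_name, [])
        for name in fetch.fields:
            if name not in selection:
                selection.append(name)
        entry["representations"].extend(fetch.representations)
    return merged


def render_query(selections):
    """Render one _entities query with an inline fragment per entity type."""
    fragments = []
    for type_name, fields in selections.items():
        body = "\n".join("      " + name for name in ["__typename", *fields])
        fragments.append("    ... on " + type_name + " {\n" + body + "\n    }")
    return (
        "query ($representations: [_Any!]!) {\n"
        "  _entities(representations: $representations) {\n"
        + "\n".join(fragments)
        + "\n  }\n}"
    )


# Hypothetical fetches mirroring queries 1-4 (type names spelled as in the schema above).
fetches = [
    EntityFetch("subgraph2", "Airport", ["name"], [{"__typename": "Airport", "id": "1"}]),
    EntityFetch("subgraph2", "Flight", ["name"], [{"__typename": "Flight", "id": "10"}]),
    EntityFetch("subgraph2", "Passagner", ["name"], [{"__typename": "Passagner", "id": "100"}]),
    EntityFetch("subgraph2", "Passagner", ["name"], [{"__typename": "Passagner", "id": "101"}]),
]
merged = merge_fetches(fetches)
query = render_query(merged["subgraph2"]["selections"])
print(query)
```

Four fetches collapse into one query with three inline fragments and a combined list of four representations. Routing the results back to the right place in the response is a separate step that this sketch does not cover.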

Oct 24 '24 12:10 alexus37

Hey @alexus37, this is a very interesting topic we've been thinking about for quite some time. Are you available for a conversation to see how we can help you? You can use this link to book: https://wundergraph.com/meet/jensneuse

Oct 25 '24 08:10 jensneuse