Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update include_subgraph_errors with extensions customization #6402

Draft
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

Samjin
Copy link

@Samjin Samjin commented Dec 5, 2024

WIP - test cases are not updated yet. Need apollo team's opinion for the proposed config below.

This is not a breaking change, the default behavior of include_subgraph_errors remains the same.

include_subgraph_errors: 
  all: true # Propagate all errors
  redact_extensions_keys: # This is ignored with `all: true`
    - "service"
    - "exception"

include_subgraph_errors: 
  all: false # Will redact error message and extensions keys from `redact_extensions_keys`.
  redact_extensions_keys:
    - "service"
    - "exception"

include_subgraph_errors: 
  all: true # Will NOT redact error message and extensions even tho the keys are specified below.
  redact_extensions_keys:
    - "service"
    - "exception"
  subgraphs:
    product: false # will redact error message and extensions keys based on default config. 
    reviews:
      - "service" # Never redact keys listed here, but error message and other extension keys from default config will be redacted.

include_subgraph_errors:
  all: false # Will redact error message and extensions.
  redact_extensions_keys:
    - "service"
    - "exception"
  subgraphs:
    product: true # Will NOT redact anything.
    reviews:
      - "service" # Never redact keys listed here, but error message and other extension keys from default config will be redacted.

Reasons we need this:

  1. Some subgraphs need to pass non-vulnerable data to client via error extensions.
  2. Some clients depend on certain data in error extensions. Although they shouldn't do it, migrating away from it would take time, especially from natives. While waiting for their migration, this PR would allow the original error message and selected keys to be redacted or propagated per subgraph.

There is no issue number yet.


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@Samjin Samjin requested review from a team as code owners December 5, 2024 00:44
@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented Dec 5, 2024

✅ Docs Preview Ready

No new or changed pages found.

@router-perf
Copy link

router-perf bot commented Dec 5, 2024

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

@Samjin Samjin marked this pull request as draft December 5, 2024 00:44
@Samjin Samjin mentioned this pull request Dec 5, 2024
6 tasks
Copy link
Contributor

@bnjjj bnjjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For configuration I think we could find a more explicit configuration and I think @BrynCooke will have better ideas than me about configuration

redact_extensions_keys: Vec<String>,

/// Override default configuration for specific subgraphs
subgraphs: HashMap<String, SubgraphConfig>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use this struct https://github.com/apollographql/router/blob/dev/apollo-router/src/configuration/subgraph.rs#L78:L78 to have the same pattern everywhere in our configuration. Do you think your configuration could match this ?


for error in response.response.body_mut().errors.iter_mut() {
let mut new_extensions = error.extensions.clone();
match subgraph_config.clone() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to clone it.

Suggested change
match subgraph_config.clone() {
match &subgraph_config {

Copy link
Member

@IvanGoncharov IvanGoncharov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be the opposite, e.g. include_extensions_keys, which exposes all extension keys by default.
However, if you specify some extension names, only those extensions are allowed.
Otherwise, we are in a situation where each new extension adds to the errors caused by the router, forcing everyone to update their config to prevent information from leaking.

The same goes for subgraphs.
If a subgraph adds a new extension (this can happen with a GraphQL library/framework update), it immediately leaks until someone updates the config.

Update: This discussion is a good example of why we need to be able to add new extensions in the router #6359 (comment)

@IvanGoncharov
Copy link
Member

IvanGoncharov commented Dec 11, 2024

I think it should be the opposite of include_extensions_keys, which exposes all extension keys by default. However, if you specify some extension names, only those extensions are allowed. Otherwise, we are in a situation where each new extension adds to the errors caused by the router, forcing everyone to update their config to prevent information from leaking.

The same goes for subgraphs. If a subgraph adds a new extension (this can happen with a GraphQL library/framework update), it immediately leaks until someone updates the config.

Update: This discussion is a good example of why we need to be able to add new extensions in the router #6359 (comment)

I think the ideal solution would be to have include_extensions_keys support both the array of names and boolean values, where true (the default is 1.x) allows all extensions and false (the default in 2.x) removes all extensions.
This would be the safest option, in my opinion. It would allow us to improve the router's debuggability (by adding extensions) without worrying that we are leaking private information.

@bnjjj What do you think?

@bnjjj
Copy link
Contributor

bnjjj commented Dec 11, 2024

I agree @IvanGoncharov

@abernix
Copy link
Member

abernix commented Dec 12, 2024

@Samjin I like where this is going but I do agree with the suggestions above from @IvanGoncharov and I think they would cater to important use cases from other users. Do you think you'd be able to introduce those changes being suggesting here?

In terms of release scheduling, if we can get this all fixed and to a state that we're all mutually happy with, and get that done by middle of January, then this should be able to make it into our v1.60.0 release which ships at the end of January. To call that done, we'd be looking at:

  • Getting the changes in place.
  • Getting tests written (the team can help advise here if needed.)
  • Changelog (we can help polish this, but a starting point is lovely) (see How to create a Changelog entry

Could that work?

@abernix abernix requested a review from BrynCooke December 12, 2024 13:38
@abernix
Copy link
Member

abernix commented Dec 12, 2024

@BrynCooke Tagging you on this too. Could you take a look?

@BrynCooke
Copy link
Contributor

BrynCooke commented Dec 12, 2024

I spent a bit of time thinking about this and think the config should look like this:


include_subgraph_errors:
  all: true # Propagate all errors and all extension keys (current)
  
include_subgraph_errors:
  all:
    allow_extensions_keys: # Propagate all errors and only the specified extension keys
      - "service"
      - "exception" 
    
include_subgraph_errors:
  subgraphs:
    products: true # Propagate all errors and all extension keys from products subgraph (current)
      
include_subgraph_errors:
  subgraphs:
    products:
      allow_extensions_keys: # Propagate all errors and only the specified extension keys
        - "service"
        - "exception"
        

We would use an enum to allow true or a nested allow_extension_keys.

#[serde(untagged)]
enum ErrorMode {
  Global(bool)  // true|false to maintain backward compatibility
  Allow {
    allow_extension_keys: Vec<String> // Allow keys but deny all others
  }
}

The advantage of this config over what is there in the PR is that it doesn't have rules that the user has to know. e.g. if something it set it ignores this other setting.

@Samjin Would this be config suit your needs? If not can you let us know why a deny list would be better for you?
The above config structure does not preclude us from adding deny support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants