Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ApplyToInternal::compute_output_shape and JSONSelection::shape returning shape::Shape #6458

Draft
wants to merge 19 commits into
base: benjamn/JSONSelection-variadic-ExprPath
Choose a base branch
from

Conversation

benjamn
Copy link
Member

@benjamn benjamn commented Dec 13, 2024

This draft PR uses our newly published shape crate to implement the ApplyToInternal::compute_output_shape trait method, which returns a shape::Shape, which is an immutable and reference-counted Rust data structure that describes a set of constraints on JSON data. The "shape" terminology is a close synonym for "type" but I'm using it to emphasize the immutable, reference-free, value-typed nature of the JSON domain.

As an important part of this ::compute_output_shape system, all -> methods now define a method_shape function that allows them to reject unexpected input shapes and compute their own output shape in terms of a given input shape and variable shapes. In other words, the type signatures of -> methods are expressed using Rust code which ultimately returns a shape::Shape, which then gets further processed by the compute_output_shape logic, allowing -> methods to blend seamlessly into the shape processing system.

This new ability to compute a static output Shape for a given JSONSelection string allows us to relax the constraint that inline PathSelection items must have a static {...} subselection at the end. The static subselection was previously our only means of statically determining the output fields of the PathSelection, but now we can statically/programmatically examine the output Shape, which doesn't distinguish between static subselections and -> method results (as long as they have the same computed object shape). We are also now requiring a ... token before inline PathSelection items without static subselections, and encouraging the ... token for any inline PathSelection, even if it has a subselection (though enforcing it in that case would be a breaking change).

Finally, this PR proposes a way of setting the __typename in situations where abstract GraphQL types (interfaces and unions) are involved: by writing a { <Type> ... } annotation at the beginning of any subselection, developers can indicate to both the Shape system and the runtime system that the __typename of this output object should have the literal shape "Type", and the object should include __typename: "Type" at runtime. The underlying Shape representation actually does use a __typename field to model this information, but that's a choice made by the JSONSelection code, not something embedded in the Shape library.

I wish I could have broken this PR up more, though I was able to split out #6455, #6456, and #6457.

I don't believe this PR should introduce any logical runtime differences of behavior in the connectors system, since it's basically adding the compute_output_shape method and then only using it in tests. Once I address a few TODOs, the output shapes will play a role in checking the validity of ...-inlined PathSelection items, but the ... syntax is new with this PR, so changing it is still easy.

@benjamn benjamn self-assigned this Dec 13, 2024
@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented Dec 13, 2024

✅ Docs Preview Ready

No new or changed pages found.

@router-perf
Copy link

router-perf bot commented Dec 13, 2024

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

Comment on lines 83 to 90
```ebnf
JSONSelection ::= PathSelection | NamedSelection*
SubSelection ::= "{" NamedSelection* "}"
SubSelection ::= "{" OutputTypeAnnotation? NamedSelection* "}"
OutputTypeAnnotation ::= "<" Identifier ">"
NamedSelection ::= NamedPathSelection | PathWithSubSelection | NamedFieldSelection | NamedGroupSelection
NamedPathSelection ::= Alias PathSelection
NamedPathSelection ::= (Alias | "...") PathSelection
NamedFieldSelection ::= Alias? Key SubSelection?
NamedGroupSelection ::= Alias SubSelection
Copy link
Member Author

@benjamn benjamn Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the only new JSONSelection syntax we need: { <Type> ... } annotations and ... as a way of denoting inline NamedPathSelection items.

} else if let Ok((remainder, _dots)) =
tuple((spaces_or_comments, ranged_span("...")))(input)
{
// TODO Enforce path has a static output shape with known object keys.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a remaining TODO.

Comment on lines +429 to +431
// TODO If the path is inlined with ... and has no explicit
// SubSelection, we should use the computed Shape of the
// path to determine the output fields.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another TODO I'm planning to get to before taking this PR out of draft.

Comment on lines +1065 to +1044
} else if *inline {
// If the PathSelection is inlined with ... but does not
// have a subselection, we can use the computed Shape of
// the path to synthesize the output selections.
// TODO Implement this.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another TODO for the list.

@benjamn benjamn changed the base branch from benjamn/JSONSelection-withError-arrow-method to benjamn/JSONSelection-variadic-ExprPath December 16, 2024 16:56
NamedSelection ::= NamedPathSelection | PathWithSubSelection | NamedFieldSelection | NamedGroupSelection
NamedPathSelection ::= Alias PathSelection
NamedPathSelection ::= (Alias | "...") PathSelection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this allow for future keywords like if or match directly after the ...?

Copy link
Member Author

@benjamn benjamn Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the PathSelection started with an if token, that would be a potential concern for making if a keyword in the future, but if we stick with the the idea of a parenthesized test (i.e. ... if (test) {}), the ( character should prevent parsing as a PathSelection, causing the parser to backtrack and consider other ways of parsing ... if (test) {} (which we can define).

Same goes for ... match (value), as long as we use parentheses, or as long as we're willing to make the match field illegal immediately after ... (a breaking change but not unthinkable).

@@ -69,6 +73,44 @@ impl JSONSelection {

(value, errors.into_iter().collect())
}

pub fn shape(&self) -> Shape {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason for having shape code in this file? it feels different enough to warrant a separate file and maybe a separate trait, even if it's structured basically the same

Copy link
Member Author

@benjamn benjamn Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It made the code a little easier to write to have apply_to_path nearby, but there's probably no harm in moving it to a new file now.


for pair in args {
if let LitExpr::Array(pair) = pair.as_ref() {
if pair.len() == 2 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we return an error if the length isn't exactly 2?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have an example of error handling? since this doesn't return a Result, what should a caller do with a Shape::error_with_range?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine we'll add a way to collect/traverse/visit any ShapeCase::Error variants within a given Shape, to handle them programmatically or surface them in the debugger, but for now the main point of errors is to fail validation against expected GraphQL schema types.

Since you can embed ShapeCase::Error within a larger shape, shape errors are similar to the GraphQL errors-as-data pattern, where you're able to get back some partial shape information along with contextual errors.

@benjamn benjamn force-pushed the benjamn/JSONSelection-variadic-ExprPath branch from 909e897 to e5eedde Compare December 16, 2024 18:24
@benjamn benjamn force-pushed the benjamn/JSONSelection-compute_output_shape branch from 15b6b8c to 1fde6d8 Compare December 16, 2024 18:24
@benjamn benjamn force-pushed the benjamn/JSONSelection-variadic-ExprPath branch from e5eedde to 16d0306 Compare December 16, 2024 18:48
@benjamn benjamn force-pushed the benjamn/JSONSelection-compute_output_shape branch from 1fde6d8 to ebb3e95 Compare December 16, 2024 19:03
@benjamn benjamn force-pushed the benjamn/JSONSelection-variadic-ExprPath branch from 16d0306 to dfeb558 Compare December 16, 2024 19:12
This syntax allows us to model the GraphQL behavior of automatically
mapping selection sets over lists of objects, even when we can't
determine (yet) whether the input has an array shape or not (e.g. any
undeclared named shape reference). Once you provide enough input shape
information to resolve the array ambiguity, you won't see as many `.*`
segments in named shape subpaths (and you also won't see as many named
shape references in general, because you've probably provided at least
some of the shape definitions for those names).
TODO Make sure we audit any validations preventing key collisions,
because we now have a more graceful way of handling the collisions.
The special `{ <Type> }` syntax for selection sets still works as a
shorthand for `{ __typename: "Type" }`, and may be more useful for
static analysis because it enforces `Type` is an identifier. We may even
want to forbid setting `__typename` to anything other than a (union of)
string literal shapes, or forbid setting `__typename` directly
altogether, in favor of the `<Type>` syntax, since it provides the
flexibility we're comfortable with.
This allows us to postpone PR #6456 for further consideration.
@benjamn benjamn force-pushed the benjamn/JSONSelection-compute_output_shape branch from ebb3e95 to 70486c1 Compare December 16, 2024 20:01
@apollo-bot2
Copy link

Detected SAST Vulnerabilities

🔴 Potential Security Issues Found

If you are seeing this message, it means that the security scanning tool that Apollo uses to test our source has identified a potential security issue in code introduced or updated via your branch. Information about what was detected and steps to move forward are below.

If the scanner detected a legitimate issue, please take action to correct it prior to merging this PR. The action required will vary based on the detection. If the detection is a false positive, please follow the steps below to resolve the issue.

Issues Detected

Message Caused CI Failure? Link Path Signature
Insecure WebSocket Detected. WebSocket Secure (wss) should be used for all WebSocket connections. Yes Link apollo- router/src/protocols/websocket.rs rules.providers.semgrep.security.ja vascript.lang.security.detect- insecure-websocket
Insecure WebSocket Detected. WebSocket Secure (wss) should be used for all WebSocket connections. Yes Link apollo- router/src/protocols/websocket.rs rules.providers.semgrep.security.ja vascript.lang.security.detect- insecure-websocket

False Positive Resolution Process

The easiest way to resolve a false positive is to add a comment containing nosemgrep to the code that triggered the detection. You can add the comment directly to the line triggering the detection or on the line immediately above it. For example:

#!/usr/bin/env python3

def my_function():
    my_code_generating_detection() # nosemgrep

What if I can't add a line comment?

If you can't add a comment because the detection is generated by a file that does not support comments (like JSON), you can use Semgrep's .semgrepignore file. For more information about how to create an exclusion via .semgrepignore, refer to the Semgrep documentation.


Getting Help

The Apollo Security team is available to assist in resolving this issue. Please tag us on this PR using @apollographql/security if you need assistance!


How do I know I fixed this correctly?

Once you have resolved the issue completely, this message will disappear! If you're still seeing this message, there is more to do prior to merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants