Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out how the "static access shorthand" feature (a.k.a. "enum shorthands") should handle selector chains and more complex expressions #4130

Open
stereotype441 opened this issue Oct 16, 2024 · 11 comments
Labels
enum-shorthands Issues related to the enum shorthands feature.

Comments

@stereotype441
Copy link
Member

The Dart static access shorthand proposal specifies a new grammar construct allowing a selector chain to begin with a . followed by an identifier or new (the . may be preceded by const). For example (from the spec draft):

Endian littleEndian = .little; // short for `Endian.little`

The currently specified semantics is that the context for the entire selector chain is used to resolve the identifier. This enables more complex examples like this (also from the spec draft):

int posNum = .parse(userInput).abs(); // short for `int.parse(userInput).abs()`

This implies some action at a distance, which will in turn require some extra bookkeeping in the implementations. To see why, consider that a selector chain can involve an arbitrary large number of method invocations, function invocations, null checks, index operations, and property accesses. For example, this would be permitted:

A x = .b()()![1].c; // short for `A.b()()![1].c`

The parse tree for this declaration would look something like this (making some guesses and simplifying assumptions about the AST representation):

  VariableDeclaration
 /     |      |      \
A      x      =       PropertyGet
                     /           \
                 Index            .c
                /     \
            NullCheck  [1]
           /         \
       Function       !
       Invocation
      /          \
  Method          ()
  Invocation
            \
             .b()

The existing downwards inference mechanism passes the context A down from the VariableDeclaration to the PropertyGet, but no further. To implement the feature as currently described, we would need some sort of mechanism for plumbing the context down to the MethodInvocation at the bottom of the tree.

The question at issue here is: what would be the cost of such a mechanism (both in terms of performance and implementation complexity)? Would there be enough user benefit to justify it?

Follow up question: what if we decided to expand the mechanism to cover other kinds of syntax that can appear to the right of an expression (binary operators, conditional expressions, or cascades)? For example:

int i = .parse(userInput).abs() / 2 + 1;
Endian opposite = .host == .little ? .big : .little;
List<int> x = .of(iterator1)..addAll(iterator2);
@kallentu kallentu added the enum-shorthands Issues related to the enum shorthands feature. label Oct 16, 2024
@munificent
Copy link
Member

For reference, I took a look at the Swift spec. The enum shorthand thing in Swift is called "implicit member access" (a pretty nice name, I think).

It does allow selectors ("postfix expressions") after the implicit member. The reference says:

Implicit member expressions can be followed by a postfix operator or other postfix syntax listed in Postfix Expressions. This is called a chained implicit member expression. Although it’s common for all of the chained postfix expressions to have the same type, the only requirement is that the whole chained implicit member expression needs to be convertible to the type implied by its context.

You can even have a member chain where the various return types in the chain wander away from the context type, as long the final operation in the chain returns a subtype of the context type.

They also do what Lasse proposed and allow implicit access to members on the underlying type when the context type is nullable:

If the inferred type is an optional, you can also use a member of the non-optional type in an implicit member expression.

Given that, trying to go ahead and support entire selector chains in Dart seems reasonable to me too. Given the way cascades and null-shorting work, I suspect that users will expect it to work. I think Paul's point in the language meeting about users expecting auto-complete to work after a shorthand was pretty compelling.


Follow up question: what if we decided to expand the mechanism to cover other kinds of syntax that can appear to the right of an expression (binary operators, conditional expressions, or cascades)?

I think we should follow the same rules that we use to determine which surrounding expressions are subject to null-aware shorting.

@lrhn
Copy link
Member

lrhn commented Oct 16, 2024

There are multiple possible levels here.

  • .id is the simplest.
  • plus .id(args) that also supports constructors/static functions, but stops there. (Issue: Has to reject if id is a static getter, because then context type of .id is a separate expression that is type-inferred by itself?)
  • plus more selectors, .id <selector>*.
  • plus postfix/prefix operators and/or assignments, .current++, .current = .current.increment(). Anything short of binary operators.
    • Those are supported by null-shortening.
  • plus left-hand of binary operators, .id<selector>* + 4, where the expression would not have a real context type. (So when the entire expression starts with ., no matter how it parses.

I personally think "selector chain" is a good trade-off. The operators makes things much more complicated because they introduce context types and treat left/right operands differently (not that we don't otherwise), and "constructor only" introduces more special-cases than it's worth.

Including assignment and increment/decrement (which are supported by null-shortening, feels a little wrong here. Assignment introduces a completely new value expression with its own context type, whose value becomes the value of the assignment expression.

int x = .parse(text).extensionSomething().counter = .fromEnvironment("wohoo");

I guess the x = y = e; chain is an existing pattern, and the assignments usually make sense all the way,
so it didn't look as foreign as I had expected when I first thought of it.

(For ==, a 42 == .parse(value).abs() would work, but if using my current proposal, it's still limited to expressions that are .id<selector>*, it's not a real context type so it can't do 42 == (value == null ? 0 : .parse(value).abs()), where int = value == null ? 0 : .parse(value).abs(); would work because real context types are pushed in.)

@lrhn
Copy link
Member

lrhn commented Oct 17, 2024

One underlying issue with the second option above (getters, function and visitor invocations only) is that the semantics, static and dynamic, for C.foo() with context type S is defined by cases depending on what C.foo denotes.
If C.foo is a constructor or static function, it's a constructor or static function invocation with context type S.
If it's a getter, inference continues on C.foo with empty context type, and it then does a function expression invocation of the result.

It's non-trivial to do that for .foo() in a compositional way that uses only the context type. Until we know the context type, we don't know what .foo denotes, until we know what .foo denotes, we don't know whether it has an empty context type.
(At least that's always an error, so it's possible to optimistically try the context type of .foo() as the receiver, and then, if .foo is a getter in that type, we can say that there is no solution to the implicit static receiver inference, and leave it at that.)

For the latter options, it's a matter of defining the contexts that introduce an implicit static receiver, and how it's propagated to the .id productions.
I think they're all doable, but matching the extent of null short-circuiting would avoid a similar-but-different scope.
It's probably better than just selectors.
(Even if it's not simpler.)

@leafpetersen
Copy link
Member

If C.foo is a constructor or static function, it's a constructor or static function invocation with context type S.
If it's a getter, inference continues on C.foo with empty context type, and it then does a function expression invocation of the result.

I don't understand what you mean here, and I don't think this is really correct. I mean, it's fine if you want to special case it this way, but in either case, the process you are following for C.foo(argS) is:

  1. Resolve C.foo with no context
  2. Perform downward inference if required based on the results of resolving C.foo using the context
  3. Perform inference on args using context derived from step 2)
  4. Perform upwards inference using the context + the results of 3).

@leafpetersen
Copy link
Member

Following up here as well as via email.

I'm open to shipping the full selector version, but... I really want to be sure folks are doing this with their eyes open. This is a very expensive feature, with a huge testing area, and with a very large step function in the ongoing complexity of the language. Think of all of the complexity that we run into with type inference contexts, implicit casts, etc. We're proposing, with this feature, to essentially double that surface area. Every single place in the language where a new typing context is produced, we now need to produce (and test for) a new static implicit access context being produced and propagated. Every place that is enclosed by a static implicit access context but should not have access to it needs to be tested to ensure that one is not accidentally propagated. Interactions between coercions need to be specified, implemented, and tested. Promotion and demotion need to be specified, implemented and tested. There is a huge amount of work implied in here.

All that's fine if we really think this is worth it, but I have to admit, I'm deeply unconvinced that the selector chain examples will ever see widespread use. It's just hard for me to see myself, in the middle of coding, to be starting to write something like int.parse....<long selector chain> and thinking "oh, wait, the context type is int so I can leave off the leading int. It's very finicky, works only in a limited set of places, and is very not robust to refactoring. I'm not seeing a big win here. I could be convinced otherwise, but I'd really like to see some evidence. For example, do Swift programmers use this widely for long selector chains? What would Flutter code look like with this feature? Would there be lots of build methods that got a lot shorter with this?

I'm somewhat sympathetic to the usefulness of this for constructor invocations, but again, I'd like some examples/evidence that this would actually be useful in practice. Someone suggested the example of the padding argument to the Padding class, which is usually passed as EdgeInsets.all(...) or somesuch, but I don't think that example actually works, right? Because the type of padding is EdgeInsetsGeometry. So do we have examples where this actually kicks in a common and repeatable way? I'd really like to see these before we commit to biting this off. If we do think that constructors are worth handling, I'd be interested in seeing whether we can get away with a simpler feature that's more like "implicit constructor invocations" where we just special case constructor invocations (or maybe just single invocations of any kind). I think this could be caught in the parser and handled in a way that wouldn't require threading through a new context like thing (we would obviously need to validate this).

Here are some quick examples of the kinds of things that we're going to have to specify, implement, and test if we ship the full version of this feature (just off the top of my head).

class A {
   static A getA() => A();
}

// This should work
void test(bool b) {
  A? a = (b) ? .getA() : ((b) ? .getA() : .getA());
}
class A {
   static A get getA => A();
   A call(int x) => A();
}

// This should work
void test(bool b) {
  A? a = .getA(3);
}
class A {}
class B extends A {
   static A getA() => A();
}

// This should work
void test(A a) {
  if (a is B) {
     a = .getA();
     // a = B.getA(); demotes a to A
  }
}
class A {
   static A getA() => A();
}

class B extends A {}

void test() {
  {
    A? a = A();
    a ?? .getA(); // I think this should work?
  }
  {
    A a = A();
    B  b = a ?? .getA(); // I think this shouldn't work, because the context for the RHS is B?
  }
}

@lrhn
Copy link
Member

lrhn commented Oct 17, 2024

@leafpetersen I'm not sure the complexity is as high as you're suggesting here.

If we go with just selector chains, the specification will basically be:

Grammar

<postfixExpression> ::= ...                                            ;; existing cases
    | <implicitStasticMemberAccesss> <selector>*

<implicitStasticMemberAccesss> ::= '.' (<identifier> | 'new')

plus a similar entry for constant patters.

Static semantics

Type inference on a <postfixExpression> of the form m selector* with context type scheme S, where m is an implicit-static-member-access, assigns S as the implicit static member context of the m clause (the same way static type inference assigns static types to expressions), then continues the same ways as for <primary> <selector>*.

For every place in the specification where we specify the meaning of C.id..., we have a parallel specification of .id... which behaves precisely the same as if it had been X.id... where X is the static namespace denoted by implicit static member context of the .id clause (and it's an error if there is none).

There is no propagation of contexts through recursive inference, the context type used is recognized immediately at the <postfixExpression><implicitStaticMemberAcceess> <selector>* grammar production, and stored where it belongs. (Can argue that that's cheating, and we are ignoring the structure of selector*, but that's not new.)

We are duplicating something, but it's really only the tail-inferences, the final static member invocation, where we allow both C.id and .id-with-context-C, but they mean the same thing, so it's likely that we can reuse the "what does .id with implicit static access context C denote" definition, specifying it once and for all, and then fall back on one rule for "m...wheremisC.idor.idwhich denotes a constructor *C*, and then just have one continuation of that. All the magic is in "wherem` denotes a static declaration D".

Complexity aside, we could stop at

<postfixExpression> ::= '.' <identifier> <argumentPart>? | '.' 'new' <arguments>

That grammatically reduces what we can accept, and if we're fine with .functionGetter(args) (I'm not happy about it, but I can live), then it's a start that won't get in the way of later allowing more.
(But I also don't see anything that looks like it will be easier than also allowing those two to be followed by <selector>*.)

@tatumizer
Copy link

FWIW, I think in swift, the expression .id is exactly equivalent to ContextType.id, even if it starts a chain.
We can make it more explicit by using the notation _.id instead.

I don't understand the additional complications related to nullable types like A? foo = .bar.baz. To me, .bar is unconditionally equivalent to A.bar

@leafpetersen
Copy link
Member

There is no propagation of contexts through recursive inference, the context type used is recognized immediately at the <postfixExpression><implicitStaticMemberAcceess> <selector>* grammar production, and stored where it belongs. (Can argue that that's cheating, and we are ignoring the structure of selector*, but that's not new.)

@lrhn I don't think this is correct. I think you are being misled by the syntactic nature of the grammar, which doesn't reflect the structure of the AST. In any case, you and @stereotype441 seem to be disagreeing here about whether propagation is required, so perhaps you and he should hash this out offline?

@stereotype441
Copy link
Member Author

There is no propagation of contexts through recursive inference, the context type used is recognized immediately at the <postfixExpression><implicitStaticMemberAcceess> <selector>* grammar production, and stored where it belongs. (Can argue that that's cheating, and we are ignoring the structure of selector*, but that's not new.)

@lrhn I don't think this is correct. I think you are being misled by the syntactic nature of the grammar, which doesn't reflect the structure of the AST. In any case, you and @stereotype441 seem to be disagreeing here about whether propagation is required, so perhaps you and he should hash this out offline?

I'm happy to meet with either you or @lrhn offline if it helps. But I guess I should point out that even though the grammar in the language spec currently says <postfixExpression> ::= <primary> <selector>*, neither of the existing implementations use a single AST node for this grammar production. They both use a left-branching tree with the <primary> far down to the left. Obviously we don't have to do the same thing for the new grammar rule <postfixExpression> ::= <implicitStaticMemberAcceess> <selector>*, but IMHO, we should. If we don't, then we'll wind up having two very different AST representations for selector chains, and a lot of duplicated logic for traversing those two representations.

FWIW, a similar situation arises within a cascade section (which, simplifying a bit, essentially has the form <selector>*), and we use a left-branching tree for that too, even though there's no <primary> to put at the leftmost leaf of the tree. It's slightly confusing at first, but it lets us use the same resolution logic for selector chains in cascade sections as we use for selector chains after <primary>, and that's such a huge win that it's worth it.

So yeah, I believe that somehow the context information needs to get delivered down the tree. But that doesn't necessarily mean it's going to be complex to do so. We have a lot of clever ways to move information from point A to point B, and I suspect this particular piece of information is going to be pretty simple to move around, because none of the intermediate AST nodes needs to be able to modify it.

I'm going to experiment with some possible implementation approaches and see whether I can find an approach that handles selector chains without too much complex machinery.

@leafpetersen
Copy link
Member

I did a little experimentation with an example that was listed in the other mega-thread (I think taken from the Flutter docs). The original code looks like this:

Container(
  decoration: const BoxDecoration(
    border: Border(
      top: BorderSide(color: Color(0xFFFFFFFF)),
      left: BorderSide(color: Color(0xFFFFFFFF)),
      right: BorderSide(),
      bottom: BorderSide(),
    ),
  ),
  child: Container(
    padding: const EdgeInsets.symmetric(horizontal: 20.0, vertical: 2.0),
    decoration: const BoxDecoration(
      border: Border(
        top: BorderSide(color: Color(0xFFDFDFDF)),
        left: BorderSide(color: Color(0xFFDFDFDF)),
        right: BorderSide(color: Color(0xFF7F7F7F)),
        bottom: BorderSide(color: Color(0xFF7F7F7F)),
      ),
      color: Color(0xFFBFBFBF),
    ),
    child: const Text(
      'OK',
      textAlign: TextAlign.center,
      style: TextStyle(color: Color(0xFF000000))
    ),
  ),
)

If I assume the following static extension declarations:

extension on Decoration {
  const factory Decoration box({
    Color? color,
    DecorationImage? image,
    BoxBorder? border,
    BorderRadiusGeometry? borderRadius,
    List<BoxShadow>? boxShadow,
    Gradient? gradient,
    BlendMode? backgroundBlendMode,
    BoxShape shape = BoxShape.rectangle,
  }) = BoxDecoration;
}

extension on BoxBorder {
  const factory BoxBorder border(
          {BorderSide top = BorderSide.none,
          BorderSide right = BorderSide.none,
          BorderSide bottom = BorderSide.none,
          BorderSide left = BorderSide.none}) = Border;
}

extension on EdgeInsetsGeometry {
  // What do we do here?  We need to cover 
  // {EdgeInsets, EdgeInsetsDirectional}.{all, only, symmetric}
  // with this extension if we want to get rid of the EdgeInsets prefix below.
}

then I believe, if I've done the resolution correctly in my head, that we get the following:

Container(
      decoration: const .box(
        border: .border(
          top: .new(color: .new(0xFFFFFFFF)),
          left: .new(color: .new(0xFFFFFFFF)),
          right: .new(),
          bottom: .new(),
        ),
      ),
      child: Container(
        padding: const EdgeInsets.symmetric(horizontal: 20.0, vertical: 2.0),
        decoration: const .box(
          border: .border(
            top: .new(color: .new(0xFFDFDFDF)),
            left: .new(color: .new(0xFFDFDFDF)),
            right: .new(color: .new(0xFF7F7F7F)),
            bottom: .new(color: .new(0xFF7F7F7F)),
          ),
          color: .new(0xFFBFBFBF),
        ),
        child: const Text('OK',
            textAlign: .center,
            style: .new(color: .new(0xFF000000))),
      ),
    );
  }

I didn't have any great ideas about how to name the EdgeInsetsGeometry constructors, so I left that unchanged for now. But if you pick a name (e.g. .insetsSymmetric), then the call to EdgeInsets.symmetric can use whatever name you chose.

This is definitely shorter code. Editing felt a little odd since it wasn't obvious to me when I could just type .new and when I had to specify the subclass, but maybe for someone more familiar with the classes in question this would be obvious.

@goderbauer @mit-mit thoughts on this? Is this level of brevity worth some extra cost? Is not supporting arbitrary selector chains reasonable?

@lrhn
Copy link
Member

lrhn commented Oct 18, 2024

We should maybe consider it a kind of tech-debt that our specification and implementation do not agree.
That makes it hard to reason about the actual complexity of a specification.

Let's assume we changed the grammar to:

<postfixExpression> ::= <assignableExpression> <postfixOperator>
   | <selectorExpression>

<selectorExpression> ::= <primary> | <selectorExpression> <selector>

<primary> ::= ...existing rules ...
    | <implicitStaticMemberAccess>

<implicitStaticMemberAccess> ::= '.' (<identifier> | 'new')

which I believe would better match the AST, and still only applied the implicit static member context to selector chains, not assignments/increments. (And found a way to specify null shortening recursively, if it isn't already.)

Then we would capture the context type at <postfixExpression><selectorExpression> and pass it along the recursion down <selectorExpression> until seeing the <primary>, and if that primary is an implicit static member access, use that captured context type.
It's one extra context type passed along, but only while recursing on <selectorExpression> (which currently has no context anyway, and if we get selector-chain inference, then it's exactly where that would also apply.)

(I'll try to see what it would look like if we also want to include assignments and increment operators, to match the null-shortening's extent.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enum-shorthands Issues related to the enum shorthands feature.
Projects
None yet
Development

No branches or pull requests

6 participants