Refactor / performance: Track scope information while traversing #609

Declspeck · 2018-02-24T22:38:54Z

This PR makes the resolution of variable types go forwards instead of backwards by keeping track of an active scope while traversing. In my opinion, this makes the code more understandable (although the number of lines increased). It also removes the code that was duplicated between variable completion and variable resolution in DefinitionResolver.

Additionally, the scope is used to cache results of getResolvedName, since it is quite slow, at least with the current Tolerant PHP Parser. This cache is invalidated when a namespace declaration is met.
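
The caching idea described above can be sketched roughly as follows. This is only an illustration under assumptions: `ResolvedNameCacheSketch`, its `$misses` counter, and the closure standing in for the parser's slow `getResolvedName` are hypothetical and are not the PR's actual Scope class.

```php
<?php

// Hypothetical sketch, not the PR's actual Scope class.
class ResolvedNameCacheSketch
{
    /** @var callable Slow resolver; stands in for the parser's getResolvedName */
    private $resolver;

    /** @var string[] Map from the name as written to its resolved name */
    private $cache = [];

    /** @var int How many times the slow resolver had to run */
    public $misses = 0;

    public function __construct(callable $resolver)
    {
        $this->resolver = $resolver;
    }

    public function getResolvedName(string $name): string
    {
        if (!isset($this->cache[$name])) {
            $this->misses++;
            $this->cache[$name] = ($this->resolver)($name);
        }
        return $this->cache[$name];
    }

    /** Call when a namespace declaration is met: cached results may no longer hold. */
    public function clearResolvedNameCache()
    {
        $this->cache = [];
    }
}

$namespace = 'App';
$cache = new ResolvedNameCacheSketch(function (string $name) use (&$namespace): string {
    return $namespace . '\\' . $name;
});

$first = $cache->getResolvedName('Foo');   // resolver runs
$second = $cache->getResolvedName('Foo');  // served from the cache

$namespace = 'Other';                      // simulate a new namespace declaration
$cache->clearResolvedNameCache();
$third = $cache->getResolvedName('Foo');   // resolver runs again
```

Clearing the whole cache on a namespace change is the simplest invalidation strategy; it trades a few repeated lookups for never returning a stale name.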

There are no tests for Scope yet, in case there are severe issues with the general approach.

Edit: This increases performance of Performance.php by ~10% on my machine.

codecov · 2018-02-24T23:02:41Z

Codecov Report

Merging #609 into master will increase coverage by 0.38%.
The diff coverage is 88.31%.

@@             Coverage Diff              @@
##             master     #609      +/-   ##
============================================
+ Coverage     81.63%   82.02%   +0.38%     
+ Complexity      910      882      -28     
============================================
  Files            61       65       +4     
  Lines          2075     2097      +22     
============================================
+ Hits           1694     1720      +26     
+ Misses          381      377       -4

| Impacted Files | Coverage Δ | Complexity Δ | |
| --- | --- | --- | --- |
| src/SignatureInformationFactory.php | `100% <100%> (ø)` | `10 <6> (ø)` | ⬇️ |
| src/SignatureHelpProvider.php | `98.66% <100%> (ø)` | `26 <0> (ø)` | ⬇️ |
| src/Scope/GetScopeAtNode.php | `100% <100%> (ø)` | `0 <0> (?)` | |
| src/Scope/Variable.php | `100% <100%> (ø)` | `1 <1> (?)` | |
| src/Scope/TreeTraverser.php | `100% <100%> (ø)` | `37 <37> (?)` | |
| src/Scope/Scope.php | `100% <100%> (ø)` | `4 <4> (?)` | |
| src/CompletionProvider.php | `98.4% <100%> (+3.88%)` | `75 <0> (-33)` | ⬇️ |
| src/Server/TextDocument.php | `75.37% <75%> (ø)` | `56 <0> (ø)` | ⬇️ |
| src/DefinitionResolver.php | `86.11% <76.97%> (-1.22%)` | `299 <275> (-31)` | |
| src/TreeAnalyzer.php | `92.85% <96.66%> (-1.43%)` | `47 <38> (-6)` | |

... and 5 more

jens1o · 2018-02-25T07:15:15Z

src/Scope/TreeTraverser.php

+        }
+
+        if ($node instanceof ClassLike
+            && (in_array($childName, ['classMembers', 'interfaceMembers','traitMembers'], true))


Do we need to strictly compare here?

Not really - I thought it would be good form, just like using === instead of == - do you think there's a reason to change it?

Somehow my benchmarks are showing different results:
https://gist.github.com/jens1o/621c1307ee0b9d618839211688d46dba

PS C:\nginx\html\projects\php7.2-playground> php .\in_array_strict_benchmark.php
Took 0.42s with strict compare
Took 0.35s without strict compare
Took 5.47s with strict compare
Took 0.43s without strict compare
Took 0.54s with strict compare
Took 0.47s without strict compare

Your benchmark searches for an empty string, which is not the case here, is it?

@nikic is it expected that strictly searching for an empty string via in_array is slower than doing it non-strictly?

That said, the in_array strict path is slightly less optimized than the non-strict one. Though if you want to micro-optimize this, the way to do it is to use \in_array (with strict), which will transform to an HT lookup. Or even better, just directly implement it as an HT lookup so you're not dependent on optimization behavior.
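
A direct hash-table lookup, as suggested above, could look something like this. It is a minimal sketch: the function name `isMemberListChild` is hypothetical, and only the three child names come from the diff under review.

```php
<?php

// Hypothetical helper; the child names are taken from the diff under review.
function isMemberListChild(string $childName): bool
{
    // Keys of a PHP array live in a hash table, so isset() on a key
    // is a single lookup instead of a linear scan through the values.
    static $memberChildNames = [
        'classMembers' => true,
        'interfaceMembers' => true,
        'traitMembers' => true,
    ];

    return isset($memberChildNames[$childName]);
}

$isMember = isMemberListChild('classMembers');
$isOther = isMemberListChild('statements');
```

With only three entries the difference is unlikely to be measurable; the point is that this form does not depend on how the engine optimizes `in_array`.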

Ah, I didn't actually mean performance-wise but style-wise.

I think it's better because the code does not perform convoluted type casts under the hood. It's not that the code would be more incorrect that way, just that it feels that way - when I'm searching for a bug, I don't have to wonder if this comparison here does something funny.

... although I would have expected the strict version also to be faster.

Well, that specific benchmark is rather useless, as the second array contains 0 as first element, which is (non-strict) equal to the empty string, so what you see there is the difference between matching on the first element in the array vs scanning through all 10000 elements and not finding anything.
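
To illustrate the kind of implicit conversion that non-strict `in_array` performs, here is a small sketch. It deliberately uses numeric strings rather than the `0` versus `''` case from the benchmark, because that particular comparison is true on PHP 7 (the version used in this thread) but, as far as I know, false from PHP 8 onwards.

```php
<?php

// '1e1' and '10' are both numeric strings, so == compares them as numbers.
$looseMatch = in_array('1e1', ['10']);          // loose comparison: matches
$strictMatch = in_array('1e1', ['10'], true);   // strict comparison: no match

// With strict comparison only an identical string matches.
$exactMatch = in_array('10', ['10'], true);
```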

Good point!

I agree that strict comparison is always easier to reason about.

jens1o · 2018-02-25T07:15:49Z

src/Scope/TreeTraverser.php

+
+
+        // TODO: Handle use (&$x) when $x is not defined in scope.
+        // TODO: Handle list(...) = $a;


and PHP7.1+ [...] = $a;

👍 I didn't even know about that construct

Ok, I added a TODO comment for that and foreach ($a as [...])
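
For reference, the three destructuring forms mentioned in this thread look like this. Each of them introduces new variables, which is why a scope tracker would eventually need to handle them. The variable names are arbitrary.

```php
<?php

$pair = [1, 2];

// Classic list() destructuring.
list($a, $b) = $pair;

// Short array destructuring syntax, available from PHP 7.1.
[$c, $d] = $pair;

// Destructuring directly in a foreach header (short syntax, PHP 7.1+).
$sums = [];
foreach ([[1, 2], [3, 4]] as [$x, $y]) {
    $sums[] = $x + $y;
}
```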

felixfbecker · 2018-02-28T19:24:37Z

src/DefinitionResolver.php

@@ -716,6 +603,7 @@ public function resolveExpressionNodeToType($expr)
            if ($token === PhpParser\TokenKind::NullReservedWord) {
                return new Types\Null_;
            }
+            return new Types\Mixed_;


Do you expect this case to ever get hit? Or is it for future-proofing?

These were added accidentally (the same goes for the Types\Mixed_ below). I tried various things in the same branch to see if they improved performance, including bailing out early here. It did not really affect performance, but I apparently forgot to revert some of the changes. I can revert this.

felixfbecker · 2018-02-28T19:35:55Z

src/Scope/GetScopeAtNode.php

+    $traverser = new TreeTraverser($definitionResolver);
+    $resultScope = null;
+    $traverser->traverse(
+        $sourceFile,


I don't really understand why you would want to traverse the whole source file from the root if you already have the node and every node is linked to its parent. This seems very inefficient, as you will visit a lot of nodes that you don't care about.

Intuitively, a scope is anything enclosed by the closest function node, so only the nodes inside that boundary should be visited for any needed operation.

Perhaps Scope is a misnomer, since it also contains information about other names in effect - in particular the resolved name cache depends on the current namespace. Maybe ParsingContext could be more appropriate.

However, after reading your comment, I now realize that scanning the whole file is not necessary, since the code is not interested in what the current namespace is, only when it changes.

Also, the scope contains $this variable and $currentSelf, these depend on the class. These could be handled separately and start the scanning from the function like you said. It might be more efficient on large files.
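
Starting from the enclosing function rather than the file root could be sketched as below. `SketchNode` and `findScopeBoundary` are hypothetical names for illustration only; the real parser's node classes and helpers differ.

```php
<?php

// Hypothetical minimal node type; the real parser's Node classes differ.
class SketchNode
{
    /** @var string */
    public $kind;

    /** @var SketchNode|null */
    public $parent;

    public function __construct(string $kind, ?SketchNode $parent = null)
    {
        $this->kind = $kind;
        $this->parent = $parent;
    }
}

/**
 * Walks up the parent links until a function-like node or the root is reached.
 */
function findScopeBoundary(SketchNode $node): SketchNode
{
    while ($node->parent !== null && $node->kind !== 'function') {
        $node = $node->parent;
    }
    return $node;
}

$file = new SketchNode('sourceFile');
$class = new SketchNode('class', $file);
$method = new SketchNode('function', $class);
$statement = new SketchNode('expressionStatement', $method);
$variable = new SketchNode('variable', $statement);

$boundary = findScopeBoundary($variable);
```

Forward scanning would then begin at `$boundary` instead of the source file, with `$this` and the current class looked up separately from the ancestors.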

felixfbecker · 2018-02-28T19:38:16Z

src/Scope/Scope.php

+class Scope
+{
+    /**
+     * @var Variable|null "Variable" representing this/self


...or null if the scope is not inside a class

felixfbecker · 2018-02-28T19:39:09Z

src/Scope/Scope.php

+use Microsoft\PhpParser\Node\QualifiedName;
+
+/**
+ * Contains information about variables at a point.


Could you elaborate what "point" means here? How is the scope boundary defined?

Scope when evaluating the expression at the node being traversed, if it were an expression.

... perhaps? I'm not sure how to express it clearly.

felixfbecker · 2018-02-28T19:40:01Z

src/Scope/Scope.php

+    public function getResolvedName(QualifiedName $name)
+    {
+        $nameStr = (string)$name;
+        if (array_key_exists($nameStr, $this->resolvedNameCache)) {


I am assuming there are no null values in the array, so isset would be better.

True, I'll fix it. I just copy-pasted code from the DocBlock thingie.
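
The difference between the two checks only shows up for null values, as this small sketch shows. The `$cache` contents are made up for the example.

```php
<?php

$cache = ['Foo' => 'App\\Foo', 'Bar' => null];

$issetFoo = isset($cache['Foo']);                 // key present, value not null
$issetBar = isset($cache['Bar']);                 // key present, but value is null
$keyExistsBar = array_key_exists('Bar', $cache);  // only checks that the key exists
$issetBaz = isset($cache['Baz']);                 // key missing
```

So as long as the resolved-name cache never stores null, `isset` gives the same answer as `array_key_exists`.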

felixfbecker · 2018-02-28T19:49:02Z

src/Scope/TreeTraverser.php

+/**
+ * Traversers AST with Scope information.
+ */
+class TreeTraverser


The parser already exposes utilities to traverse the tree (and uses the iterator pattern instead of the visitor pattern, which is superior imo). Given that this is a fair amount of code, could you explain why this is needed? What is the difference between this scope-tree traverser and a general traverser?

I need to look into the parser code and see if we could use it - my initial guess is that we can't though, since the traversing needs to happen in the following fashion:

Scope = [] // Scope 1 - Root
Scope = [$this] // Scope 2 - Class entered
Scope = [$this, $a] // Scope 3 - Function entered
Scope = [$this] // !! Scope 2 - Same scope again
Scope = [] // !! Scope 1 - Same scope again

The lines marked with !! - if we only get a callback per Node or Token, we cannot know when a scope is exited. Additionally, we'd need to manage a stack manually instead of relying on the call stack.

Further, we need information on the context of the element - if we have a BracedStatementList (or whatever it's called), we need to know if it is a functionBodyOrSemicolon to start a new Scope, so the Node alone is not sufficient.

If the parser does not provide a traverser that fits those two needs, I think that working around them might make the code more confusing.

I looked into this, and getDescendantNodesAndTokens or getDescendantNodes cannot really be used, since we won't know when a node is exited and the scope should be restored.
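
The enter/exit requirement described above can be illustrated with a recursive sketch that relies on the call stack to restore the scope. Everything here is an assumption made for the example: the array-based tree shape, the node kinds, and the `traverseSketch` function are hypothetical and much simpler than the PR's TreeTraverser.

```php
<?php

// Hypothetical tree shape: ['kind' => string, 'name' => string (assignments only), 'children' => array].
function traverseSketch(array $node, array &$scope, array &$log)
{
    $opensScope = in_array($node['kind'], ['class', 'function'], true);
    $saved = $scope; // arrays are copied by value, so this is a snapshot

    if ($node['kind'] === 'class') {
        $scope = ['$this'];
    } elseif ($node['kind'] === 'function') {
        // In this sketch only $this carries over into a function body.
        $scope = array_values(array_intersect($scope, ['$this']));
    } elseif ($node['kind'] === 'assignment') {
        $scope[] = $node['name'];
    }
    $log[] = $node['kind'] . ' entered: [' . implode(', ', $scope) . ']';

    foreach ($node['children'] as $child) {
        traverseSketch($child, $scope, $log);
    }

    if ($opensScope) {
        $scope = $saved; // restore the outer scope when the node is exited
        $log[] = $node['kind'] . ' exited: [' . implode(', ', $scope) . ']';
    }
}

$tree = ['kind' => 'sourceFile', 'children' => [
    ['kind' => 'class', 'children' => [
        ['kind' => 'function', 'children' => [
            ['kind' => 'assignment', 'name' => '$a', 'children' => []],
        ]],
    ]],
]];

$scope = [];
$log = [];
traverseSketch($tree, $scope, $log);
```

A per-node callback without an exit notification has no natural place for the restore step, so the snapshot would have to live on a manually managed stack instead.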

I think that TreeTraverser could be converted into a generator yielding something - e.g. rename Scope to TraversingContext which contains both $node and $variables etc. Do you think that's worth pursuing?

(I accidentally posted this with a wrong GitHub account first - sorry about the noise)

So first of all I still think parent traversal is the better way to go.
And then getDescendantNodes afaik takes a callback to decide whether a node should be entered or not.
For traversal that needs full control over the recursive aspect I did a PR a while ago to tolerant-php-parser with RecursiveIterator support: microsoft/tolerant-php-parser#139
I didn't need it at the time though and it turned out it was a tad slower than the generators. It would be interesting to see how it compares to this TreeTraverser and the previous implementation, especially on latest PHP 7.2.

The RecursiveIterator thing certainly looks interesting. It would require manual handling of a stack though, if I understand it correctly.

I think that traversing the code backwards to find types has a few problems, which I tried to solve with this:

1. Duplicated work:

Find references on line 2:

1 $a = new A; // 2. Ok, `$a is A`
2 $b = $a->foo(); // 1. Start traversing - need to find what $a is
3 $b->bar();

After that, find references on line 3:

1 $a = new A; // 3. Duplicated work - Ok, `$a is A`
2 $b = $a->foo(); // 2. Duplicated work - Need to find out what $a is
3 $b->bar(); // 1. Start Traversing - need to find what $b is to get reference to bar

This could of course be solved with some sort of memoization. It would probably end up building a scope, but in reverse order.

2. Slowness of parsing backwards in Tolerant PHP Parser

Getting a previous sibling gets its parent, enumerates its children, and when it finds the original node, it returns the previously met one.
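
The lookup described above amounts to a linear scan per call, roughly like the sketch below. `SiblingSketchNode` is a hypothetical stand-in written for this example and is not the parser's real node class.

```php
<?php

// Hypothetical node type for illustration; not the parser's real class.
class SiblingSketchNode
{
    /** @var SiblingSketchNode|null */
    public $parent = null;

    /** @var SiblingSketchNode[] */
    public $children = [];

    public function add(SiblingSketchNode $child): SiblingSketchNode
    {
        $child->parent = $this;
        $this->children[] = $child;
        return $child;
    }

    /**
     * Enumerates the parent's children until this node is found and
     * returns the one met just before it (or null for the first child).
     */
    public function getPreviousSibling()
    {
        if ($this->parent === null) {
            return null;
        }
        $previous = null;
        foreach ($this->parent->children as $candidate) {
            if ($candidate === $this) {
                return $previous;
            }
            $previous = $candidate;
        }
        return null;
    }
}

$root = new SiblingSketchNode();
$first = $root->add(new SiblingSketchNode());
$second = $root->add(new SiblingSketchNode());
$third = $root->add(new SiblingSketchNode());

$beforeThird = $third->getPreviousSibling();
```

Walking backwards through n siblings this way costs on the order of n² comparisons in total, which is the overhead a forward pass avoids.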

(3. Harder to understand - this is mostly a matter of preference maybe)

Since code is evaluated from top to bottom, an approach evaluating code from top to bottom is easier to understand, at least for me.

I'm going to sleep now, so I won't reply for a while. I might take a look at the other PRs and comments tomorrow.

> Duplicated work:

I don't think this problem has anything to do with how to traverse. The difference is that this PR introduces a Scope object, where before this was always computed JIT. I never had any performance problems with find-refs on variables though because their scope is naturally small in PHP, so it's a CPU vs memory usage trade off.

> Slowness of parsing backwards in Tolerant PHP Parser
> Getting a previous sibling gets its parent, enumerates its children, and when it finds the original node, it returns the previously met one.

This is a good point. With PHPParser I added the siblings as properties so that was fast. We could do the same for tolerant-parser.

Are you maintaining the same behaviour as before with the new traverser?

I believe before, I would always look for the closest assignment to the variable, in case it got overridden:

1 $a = 123;
2 $a = 'abc';
3 $a // should jump to L2

For that I find it very natural to search backwards and confusing to search top-down. If you intend to get to L1, then I would agree that top-down is more natural, but perf-wise (leaving aside 2.) the assumption is that variables are defined close to the reference.

> I never had any performance problems with find-refs on variables though because their scope is naturally small in PHP, so it's a CPU vs memory usage trade off.

It might be that the performance benefits here come from caching getResolvedName results in the Scope object. I started working on this because XDebug profiling showed finding variable types as a significant cost. XDebug has very high overhead though, so it might be completely inaccurate. It might be worth trying to cache the getResolvedName results alone, especially if you don't like the Scope stuff.

> Are you maintaining the same behaviour as before with the new traverser?

I believe so - the variables in Scope are overwritten when a new assignment is met, so on line 3 $a would be a string.
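
The overwrite behaviour can be shown with a toy forward pass over the three-line example. The `$assignments` list and the type names are assumptions made for this sketch; the PR derives types from the AST instead.

```php
<?php

// Hypothetical input: one [variable name, type] pair per assignment, in source order.
$assignments = [
    ['$a', 'int'],     // line 1: $a = 123;
    ['$a', 'string'],  // line 2: $a = 'abc';
];

$scope = [];
foreach ($assignments as [$name, $type]) {
    // A later assignment simply overwrites the earlier entry.
    $scope[$name] = $type;
}

// Type of $a as seen from line 3.
$typeOnLine3 = $scope['$a'];
```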

felixfbecker · 2018-02-28T19:49:56Z

src/Scope/TreeTraverser.php

+        }
+
+        if ($node instanceof ClassLike
+            && (in_array($childName, ['classMembers', 'interfaceMembers','traitMembers'], true))


I agree that strict comparison is always easier to reason about.

felixfbecker · 2018-02-28T19:52:35Z

src/DefinitionResolver.php

@@ -175,10 +177,15 @@ private function getDocBlock(Node $node)
     *
     * @param Node $node
     * @param string $fqn
+     * @param Scope|null $scope Scope at the point of Node. If not provided, will be computed from $node.


How do you ensure that the scope passed in is the scope of the node?

I have to think about this a bit more.

One thing that comes to mind is that the Scope could contain information about the Node or FunctionLike it is defined at? The added check would have a performance penalty (which I guess might be significant), since you'd need to find the closest FunctionLike, ClassLike, or SourceFileNode ancestor every time you wanted to check if the scope belongs to a node.

Declspeck · 2018-03-01T08:26:02Z

src/DefinitionResolver.php

@@ -726,6 +614,7 @@ public function resolveExpressionNodeToType($expr)
            if ($def !== null) {
                return $def->type;
            }
+            return new Types\Mixed_;


@felixfbecker This is not related to this PR, I left it here by accident. Should I revert it or let it stay?

Declspeck added 6 commits February 24, 2018 18:25

feat(parsing): Keep track of scope when parsing

d78b99b

refactor(completion): use scope to suggest local variables

7dd0f10

refactor(scope): remove special-case handling of $this

a8829a9

refactor(scope): rename currentClassLikeVariable to currentSelf

7734579

PHP 7.0 compatibility

29fd70a

fix(style): fix phpcs errors

3a2bba7

fix(scope): reset on namespace declaration

f4db997

jens1o reviewed Feb 25, 2018

documentation(scope): add more todo comments

f14478f

felixfbecker reviewed Feb 28, 2018

Declspeck commented Mar 1, 2018
