Per directory configs - preliminary changes #9550

Open. 0nf wants to merge 3 commits into main from per_directory_configs_preliminary.

0nf commented Apr 15, 2024

Type of Changes: 🔨 Refactoring

Description

Modifications to the existing codebase that are needed for per-directory configs. This PR does not introduce new functionality by itself; it contains part of the changes from #9395.

The only new behavior in this PR is slightly modified messages in verbose mode.

Refs #618

codecov bot commented Apr 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.79%. Comparing base (67bfab4) to head (9b3576f).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9550      +/-   ##
==========================================
- Coverage   95.81%   95.79%   -0.03%     
==========================================
  Files         173      173              
  Lines       18825    18851      +26     
==========================================
+ Hits        18038    18058      +20     
- Misses        787      793       +6     
Files Coverage Δ
pylint/config/arguments_manager.py 99.46% <100.00%> (+0.01%) ⬆️
pylint/config/config_file_parser.py 100.00% <100.00%> (ø)
pylint/config/config_initialization.py 98.91% <100.00%> (+0.02%) ⬆️
pylint/config/find_default_config_files.py 91.48% <100.00%> (+0.18%) ⬆️
pylint/lint/pylinter.py 96.56% <100.00%> (+0.12%) ⬆️

... and 1 file with indirect coverage changes

Collaborator

DanielNoord left a comment

Thanks! This makes it much easier to review. Left some comments!

@@ -9,3 +9,4 @@ six
# Type packages for mypy
types-pkg_resources==0.1.3
tox>=3
+pre-commit
Collaborator

Can we leave this out? pre-commit is not necessary to run the tests.

Author

OK, I'll move it out of this branch.
But do you have any advice on how to get rid of the "formatting: failed with pre-commit is not allowed, use allowlist_externals to allow it" error without adding pre-commit to the requirements? Is there some global tox config where I can add the pre-commit dependency or set allowlist_externals?

Collaborator

I don't really use tox; I didn't know we still recommend it.

You can probably just use the CI for this? That's what I always do 😄

Author

I moved the pre-commit dependency directly into the tox.ini config. Does that seem like a better solution?

pylint/config/config_initialization.py (resolved)
Comment on lines 148 to 149
if Path(".").resolve() not in linter._directory_namespaces:
    linter._directory_namespaces[Path(".").resolve()] = (linter.config, {})
Collaborator

I think this should not have the if? _config_initialization should be called only once, even for multi-dir, right?

Author

_config_initialization has some non-trivial logic: it parses several possible config variants into a namespace, merges it with the command-line arguments, configures plugins, and reports errors along the way. It was convenient to reuse all of this logic for parsing additional configs, so in subsequent changes _config_initialization will be called for each new config.

Collaborator

Hmm, but shouldn't we pass the path of the current config file to this function, then? On its own, this if statement doesn't make a lot of sense.

Author

Exactly: the path of the current config file is passed to _config_initialization in Run.__init__, and the paths of new config files will be passed to it in register_local_config.

The idea of this if is that the first time _config_initialization is called, it parses the config from the working directory, and this config should be saved to linter._directory_namespaces to avoid extra handling of special cases. On later calls, _config_initialization shouldn't overwrite the working-directory config with values from the new files.
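A minimal sketch of the "register once" guard described above, assuming (as in the diff) that linter._directory_namespaces maps a resolved directory to a (namespace, nested dict) pair. register_working_dir_config and directory_namespaces are hypothetical names used only for illustration; this is not pylint code.

```python
from __future__ import annotations

from argparse import Namespace
from pathlib import Path


def register_working_dir_config(
    namespaces: dict[Path, tuple[Namespace, dict]], config: Namespace
) -> None:
    """Store the working-directory config only on the first call.

    ``namespaces`` is a hypothetical stand-in for linter._directory_namespaces.
    """
    cwd = Path(".").resolve()
    # Later calls (one per additional config file) must not overwrite
    # the configuration that was parsed for the working directory.
    if cwd not in namespaces:
        namespaces[cwd] = (config, {})


directory_namespaces = {}
register_working_dir_config(directory_namespaces, Namespace(max_line_length=100))  # first config
register_working_dir_config(directory_namespaces, Namespace(max_line_length=79))  # a later config
```

After both calls, the working directory still maps to the first config (max_line_length=100).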

pylint/config/find_default_config_files.py (outdated, resolved)
pylint/config/find_default_config_files.py (outdated, resolved)
@@ -66,7 +66,7 @@
ModuleDescriptionDict,
Options,
)
-from pylint.utils import ASTWalker, FileState, LinterStats, utils
+from pylint.utils import ASTWalker, FileState, LinterStats, merge_stats, utils
Collaborator

Could you explain why we need the merging of stats for the multi-dir config option?

Author

0nf, Apr 19, 2024

Currently, some stat counters are reset in linter.open(), some are reset in linter.set_current_module -> init_single_module(), and some are not reset at all (error, warning, etc. in the first group; all counters in stats.by_module in the second; statement in the third). This leads to an incorrect score when the linter (i.e. the main checker) is opened per file, or when it is opened after getting the ASTs.
So I decided to reset all counters for each new module by creating a new LinterStats object in set_current_module.

If the stats reset is omitted entirely, another problem arises:
when jobs>1, the same linter object can be used to check several modules, and the stats after each module are copied and then merged.
This causes some stats to be counted several times in the final result (this is checked in test_subconfigs_score in my first PR).

The explicit stats reset and merge in a single process can be avoided, but that would require additional changes to the code for parallel checks. I'd suggest leaving it as a possible optimization for another PR.
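A toy model of the double counting described above, assuming a collections.Counter stands in for LinterStats and that, as with jobs>1, stats are copied after each module and then merged. It is not pylint's merge_stats; it only shows why a cumulative object plus merging over-counts, while a fresh object per module does not.

```python
from collections import Counter


def lint_module(stats, messages):
    """Pretend to lint one module, adding its messages to ``stats``."""
    stats.update(messages)


modules = [["error", "warning"], ["warning"], ["convention"]]

# Without a reset: one cumulative object is copied after every module
# and each copy is merged, so earlier modules are counted repeatedly.
cumulative = Counter()
copies_without_reset = []
for msgs in modules:
    lint_module(cumulative, msgs)
    copies_without_reset.append(cumulative.copy())
merged_without_reset = sum(copies_without_reset, Counter())

# With a fresh stats object per module, merging gives the true totals.
copies_with_reset = []
for msgs in modules:
    per_module = Counter()
    lint_module(per_module, msgs)
    copies_with_reset.append(per_module)
merged_with_reset = sum(copies_with_reset, Counter())
```

Here merged_without_reset counts the first module's "error" three times, while merged_with_reset matches the messages actually emitted.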

Collaborator

Sorry, you probably already explained it, but I don't fully understand. This explanation seems to point to a general issue with stats merging, not something specific to multi-directory configs. Or am I misunderstanding you? If it is just a general issue, we should tackle it in a separate PR.

art049 commented Apr 17, 2024

Hey @0nf and @DanielNoord, I've been running the benchmarks on this branch, and these changes seem to significantly impact the baseline benchmarks, which, judging from their definition, are important:

def test_baseline_benchmark_j1(self, benchmark: BenchmarkFixture) -> None:
    """Establish a baseline of pylint performance with no work.
    We will add extra Checkers in other benchmarks.
    Because this is so simple, if this regresses something very serious has happened
    """

However, there is a big regression on the runs:
[image: benchmark results showing the regression]

Curious to know whether you expected this performance change.
For context, I installed CodSpeed on a fork synced with this repo. You can look at the full report here.

DanielNoord
Collaborator

Thanks for that @art049.

It is probably related to moving ast_per_fileitem = self._get_asts(fileitems, data). Does the report also point to what is making it slower?

art049 commented Apr 18, 2024

@DanielNoord yes, from the differential profile it seems the regression is mainly located in PyLinter._get_namespace_for_file:
[image: differential profile of PyLinter._get_namespace_for_file]

A lot of new code (in blue) is executed here.

Author

0nf commented Apr 19, 2024

Path.resolve() is not connected to moving self._get_asts(fileitems, data); it was added to correctly identify parent directories, including when paths contain symlinks.
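A small standard-library illustration of why symlinks matter for parent-directory lookup: the lexical parents of a path reached through a symlink do not include the real project directory, while the resolved path's parents do. The directory layout (project/pkg, link_to_pkg) is made up for the example and is not taken from the PR.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()  # resolve tmp itself (e.g. /var -> /private/var on macOS)
    real_pkg = root / "project" / "pkg"
    real_pkg.mkdir(parents=True)
    (real_pkg / "mod.py").write_text("x = 1\n")

    # A symlinked directory pointing into the project.
    link = root / "link_to_pkg"
    os.symlink(real_pkg, link, target_is_directory=True)

    via_link = link / "mod.py"
    config_dir = root / "project"  # directory that would hold a local config

    # Unresolved: the lexical parents go through the symlink,
    # so the config directory is not among them.
    unresolved_match = config_dir in via_link.parents
    # Resolved: the symlink is followed and the real ancestry is visible.
    resolved_match = config_dir in via_link.resolve().parents
```

Creating symlinks may require extra privileges on Windows.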

There is also a report based on the branch with the full per-directory config changes. There, _get_namespace_for_file is behind the new feature flag, so in the end test_baseline_benchmark_j1 is affected to a much smaller extent.

Aleksey Petryankin added 2 commits April 19, 2024 13:48
- Add the ability to open checkers per file, so they can use values from the local config when opening
- Save command-line arguments to apply them on top of each new config
- More accurate verbose messages about config files
- Enable finding config files in arbitrary directories
- Add the ability to call linter._astroid_module_checker per file in single-process mode
- Collect stats from several calls of linter._astroid_module_checker in single-process mode
- Extend linter._get_namespace_for_file to return the path from which the namespace was created
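A standalone sketch of a recursive lookup that returns both the config directory and its namespace, assuming the nested layout implied by linter._directory_namespaces[...] = (linter.config, {}): each directory maps to (namespace, dict of nested directories). get_namespace_for_file here is a hypothetical free function, not pylint's method, and it expects the caller to pass an already-resolved path so that resolve() stays out of the recursion.

```python
from __future__ import annotations

from argparse import Namespace
from pathlib import Path


def get_namespace_for_file(
    filepath: Path, namespaces: dict[Path, tuple[Namespace, dict]]
) -> tuple[Path | None, Namespace | None]:
    """Return the deepest config directory containing ``filepath`` and its namespace.

    ``filepath`` is assumed to be resolved once by the caller.
    """
    for directory, (namespace, children) in namespaces.items():
        if filepath == directory or directory in filepath.parents:
            # Prefer a more specific config from a nested directory.
            child_path, child_ns = get_namespace_for_file(filepath, children)
            if child_ns is not None:
                return child_path, child_ns
            return directory, namespace
    return None, None


root = Path("/repo")
root_ns, sub_ns = Namespace(max_line_length=100), Namespace(max_line_length=79)
directory_namespaces = {root: (root_ns, {root / "sub": (sub_ns, {})})}
```

With this layout, a file under /repo/sub gets (/repo/sub, sub_ns), a file directly under /repo gets (/repo, root_ns), and a file outside /repo gets (None, None).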
0nf force-pushed the per_directory_configs_preliminary branch from 42bbb5a to 2f6c087 April 19, 2024 11:48

- Responses to review comments
- Add test for calling _astroid_module_checker on different levels
- Move Path.resolve() out of _get_namespace_for_file recursive calls
0nf force-pushed the per_directory_configs_preliminary branch from 2f6c087 to 9b3576f April 20, 2024 08:25
DanielNoord
Collaborator

@jacobtylerwalls I think you have done some regression testing in the past. Can you comment on whether you see a performance regression with these changes?

Contributor

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit 9b3576f

jacobtylerwalls self-requested a review April 28, 2024 23:51
Comment on lines +929 to +930
config_path, namespace = self._get_namespace_for_file(
    Path(filepath).resolve(), self._directory_namespaces
Member

The linked report traces the regression to the call to resolve().

I wonder if we can guard it with not path.is_absolute():

>>> from timeit import timeit
>>> timeit('p.is_absolute()', setup='from pathlib import Path; p=Path(".")')
0.1643185840221122
>>> timeit('p.resolve()', setup='from pathlib import Path; p=Path(".")')
10.929103292000946

Member

Edit: it seems is_absolute() will always be false as things stand, so we probably need to look higher up the stack for a place to do some sort of conversion.
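One possible shape of "converting higher up the stack", sketched under assumptions: paths are resolved once where files are discovered, and hot code paths only keep a cheap is_absolute() guard. Both helpers are hypothetical and not part of pylint; note that the guard trusts absolute paths to be already resolved, so an absolute path that still contains symlinks would pass through unchanged.

```python
from __future__ import annotations

from pathlib import Path


def to_absolute_once(paths: list[str]) -> list[Path]:
    """Resolve every discovered path a single time, e.g. at file discovery."""
    return [Path(p).resolve() for p in paths]


def ensure_resolved(path: Path) -> Path:
    """Cheap guard for hot code paths: skip resolve() for absolute paths.

    Assumes absolute paths were already produced by resolve(); an absolute
    path that still contains symlinks is returned unchanged.
    """
    return path if path.is_absolute() else path.resolve()
```

The trade-off is that correctness for symlinked paths now depends on every caller going through the discovery step first.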

Author

0nf commented May 6, 2024

Hi @DanielNoord! I haven't marked some of your review notes as resolved because I don't know whether my answers to them were sufficient. Could you check whether my comments actually answer your questions? 🙂

Author

0nf commented May 6, 2024

Also, I'm a bit unsure what to do about the performance drop in test_baseline_benchmark_j1.

  • Is it critical, given that the time difference is less than 5 ms in all cases, which is <1% in all test_baseline_lots_of_files* benchmarks?
  • If so, would it be OK to hide Path.resolve() behind an analog of the use-local-configs feature flag? I was thinking of a condition like len(self._directory_namespaces) > 1.
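A minimal sketch of the condition proposed in the second bullet, assuming a flat count of top-level entries is enough to tell whether per-directory configs are in use. path_for_lookup is a hypothetical helper, not pylint code: it only pays for Path.resolve() when more than one directory config is registered.

```python
from __future__ import annotations

from pathlib import Path


def path_for_lookup(filepath: str, namespaces: dict) -> Path:
    """Only pay for Path.resolve() when several directory configs exist."""
    if len(namespaces) > 1:
        # Per-directory configs are in use: resolve so that symlinked
        # paths map to the right parent directories.
        return Path(filepath).resolve()
    # Single (working-directory) config: every file maps to it anyway,
    # so a cheap, unresolved Path is enough.
    return Path(filepath)
```

If configs are stored as nested dicts, a single top-level entry can still hide nested directory configs, so the real check may need to account for that.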

DanielNoord
Collaborator

Just letting you know that this is on my TODO list; I just haven't found the time yet.

Collaborator

DanielNoord left a comment

Thanks again for continuing with this, @0nf!

If you don't mind, I could also split some of the changes that I think we can easily merge into separate PRs and get them reviewed by other maintainers, to make this PR a little more manageable.

Comment on lines +148 to +149
if len(linter._directory_namespaces) == 0:
    linter._directory_namespaces[Path(".").resolve()] = (linter.config, {})
Collaborator

For now I'd prefer to revert the changes in these two lines.

They are really tightly coupled to the final implementation of per-directory configs, and their performance impact is hard to judge on their own. As far as I can see, all other changes in this PR are somewhat sensible on their own. This one isn't.

"""Iterate over the default config file names and see if they exist."""
basedir = Path(basedir)
Collaborator

Suggested change (remove this line):
- basedir = Path(basedir)

That should not be needed.

with augmented_sys_path(extra_packages_paths):
    # 2) Get the AST for each FileItem
    ast_per_fileitem = self._get_asts(fileitems, data)
    # 3) Lint each ast
Collaborator

This line should be further down.

with augmented_sys_path(extra_packages_paths):
    # 2) Get the AST for each FileItem
    ast_per_fileitem = self._get_asts(fileitems, data)
Collaborator

I wonder if we have unintended effects from not getting these within the context manager.


def _lint_file(
    self,
    file: FileItem,
    module: nodes.Module,
-    check_astroid_module: Callable[[nodes.Module], bool | None],
+    check_astroid_module: Callable[[nodes.Module], bool | None] | None,
Collaborator

This PR really touches some of the core behaviour of this behemoth of a class, so it is a bit hard to review. Sorry in advance.

Why is this now optional? I don't really like that design as it further complicates this function body. Could you explain why this is needed? And could that perhaps be a separate PR?

4 participants