Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Initial RFC for sharding #112
feat: Initial RFC for sharding #112
Changes from 1 commit
fcb2ab0
87a4879
5e3c99b
2f6c897
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will have to be done at the CLI level. It's not possible to have it in the config file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier, @bradzacher mentioned:
To touch on this a bit more, there are various ways rules could be using shared / global state, as discussed in this thread. This behavior might not be encouraged or officially supported, but just be aware that sharding even a single rule could break things if the rule gathers up state while running across files. It would be good to call this possibility out, whether rules are allowed to do this generally or only through a supported API like
markVariableAsUsed
, and how sharding could impact such rules.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to provide some details about what other tools do that have sharding. How does Jest decide which files to use in a shard, for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original issue made it a point to separate sharding from parallel linting.
For this RFC, we should focus on sharding, which involves running ESLint on a subset of the found files rather than the entire set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overhead is definitely a concern. As @bradzacher calls out, there could be situations where sharding unexpectedly reduces performance. And given overhead costs, sharding might only be worthwhile in codebases of a large-enough size. How would the user reasonably know when it makes sense to enable sharding? Can/should users realistically be expected to perform "careful optimization and benchmarking" on their own to decide when to use the feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The significant overhead would be to figure out the sharding buckets. But for large enough projects, it would be negligible.
Sharding is an optimization feature, and it is essential for users to exercise diligence when deciding to enable it. While we can provide guidance and recommendations, it is ultimately up to the users to assess their specific use cases and evaluate whether sharding is necessary and beneficial for their codebase.
Users should consider a few factors like Codebase Size, Performance Issues and Resource Availability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guidelines and recommendations we provide should include example scenarios of when it makes sense to use sharding, when it doesn't make sense, and what kind of performance improvements can be expected. For example, we could include results from real-world codebases like "a codebase of 100k lines across 1k JS files saw a 20% speedup in lint running time from 5 min to 4 min when using sharding" (completely made-up example).
In fact, I think the RFC should test and report the real-world performance impacts (likely on some popular, large, open source repos), in order to justify implementing the sharding feature in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to be too prescriptive about this. In reality, people will likely try it out and determine if there's any benefit to their current process rather than going by any guidelines we publish. As with most of ESLint, I think we can provide the capability and let people try it, use it, or discard it depending on their individual results.
I disagree. Let's not move the goalposts here. We should be able to come up with scenarios where we reasonably expect sharding to improve performance and also scenarios where we believe it will decrease performance. Building out a full prototype and running it on random projects doesn't give us useful information as most tools would need to be tuned over time to make this work in individual projects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please see my earlier comment about sharding vs. parallel linting. It seems like we are combining those in this RFC, but that's not what the original issue was about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that this won't affect the current analysis. That said, it can limit future kinds of analysis like multi-file linting (which I've written about here, specifically this section), where sharding could make this impossible or at best become a hindrance that would require new solutions (like opting out some rules out of the sharding strategy).
You could end up with different results when turning on/off sharding, which would be pretty bad.
That said, I don't remember whether ESLint aims to implement this one day, but it's like eslint-plugin-import does in a round-about way.