Java: make all code-scanning queries diff-informed #17846
Draft
jbj wants to merge 16 commits into github:main from jbj:diff-informed-2
This extension allows queries to be diff-informed even when the elements they select are different from the sources and sinks found by data flow. Most of the changes are boilerplate to use the new predicates in the backwards-compatible data-flow wrappers for all languages.
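As a rough illustration of the case this extension targets, here is a minimal sketch of a Java configuration whose alert is reported on the enclosing call rather than on the sink node itself. `ExampleConfig`, `ExampleFlow`, and the choice of sources and sinks are hypothetical. The predicate names `observeDiffInformedIncrementalMode` and `getASelectedSinkLocation` are assumed to match the data-flow library as described in this PR, and may differ in the version you have checked out.

```ql
import java
import semmle.code.java.dataflow.DataFlow

// Hypothetical configuration, for illustration only.
module ExampleConfig implements DataFlow::ConfigSig {
  // Placeholder source: any string literal.
  predicate isSource(DataFlow::Node source) { source.asExpr() instanceof StringLiteral }

  // Placeholder sink: any argument of a call to a method named `write`.
  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall call |
      call.getMethod().hasName("write") and
      sink.asExpr() = call.getAnArgument()
    )
  }

  // Opt in to diff-informed mode (assumed name).
  predicate observeDiffInformedIncrementalMode() { any() }

  // The query selects the call, not the argument, so the location that
  // must be in the diff range is the location of the call (assumed name).
  Location getASelectedSinkLocation(DataFlow::Node sink) {
    exists(MethodCall call |
      sink.asExpr() = call.getAnArgument() and
      result = call.getLocation()
    )
  }
}

module ExampleFlow = DataFlow::Global<ExampleConfig>;

from DataFlow::Node source, DataFlow::Node sink, MethodCall call
where
  ExampleFlow::flow(source, sink) and
  sink.asExpr() = call.getAnArgument()
select call, "Value from $@ reaches this call.", source, "this literal"
```

The point of the override is that the element in the `select` clause and the location reported to the data-flow library stay in agreement. If they drift apart, a diff-informed run could drop results that belong in the diff range.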
This change makes the XSS query fully diff-informed, including the discovery of sinks. This involved making a helper data-flow analysis diff-informed, which required punching through some abstraction layers. An alternative would have been to use `DataFlow::SimpleGlobal` or other means to make the analysis faster, but I didn't want to risk removing good taint steps such as wrapping one `OutputStream` in another.
With this change, the slowest data-flow analysis in this query is made diff-informed with the same approach as for XSS.
This query shares implementation with several other queries about cleartext storage, but it's the only one of them that's in the code-scanning suite. The sharing mechanism remains the same as before, but now each query has to override `getASelectedLocation` to become diff-informed. Two other data-flow configurations are used in this query, but they can't easily be made diff-informed.
This and other queries would also benefit from making `RegexFlow` diff-informed. That will come later.
An alternative to dynamic dispatch would have been to use parameterised modules. I tried that but abandoned it because it led to cascading changes and noise in too many places. The parameterisation used here has to be propagated through three different library files, and that propagation becomes invisible (for better or worse) when using dynamic dispatch.
Labels
C#
C++
DataFlow Library
Go
Java
no-change-note-required
This PR does not need a change note
Python
Ruby
Swift
With these changes, all queries in the code-scanning suite will be diff-informed. This PR is a draft for now because I haven't found a good way to test all these changes. But before I go further, I'd like a review from @aschackmull on whether the approach taken here is what we want at all.
#17190 made all the straightforward data-flow queries diff-informed. To cover all the remaining ones, I introduced two new concepts into the data-flow library:
`has{Source,Sink}InDiffRange`, encapsulating the recurring pattern of checking whether a given data-flow configuration has sources or sinks in the diff range at all. This pattern came up a lot in queries where a secondary data-flow configuration was made diff-informed, such as a configuration for finding sinks for the main data-flow configuration. I put these new predicates on `DataFlow::Global`, but I don't feel too confident about that. Maybe they should be defaults on `DataFlow::Configuration` instead, intended to be called but not overridden. Or maybe there should be a helper module in `DataFlow` that deals only with this matter, like `DataFlow::DiffInformed<MyConfig>::hasSinkInRange`.
Another pattern that came up was where we have queries that exclude results found by another query, using the latter query as a library. I believe it's fine to make these queries diff-informed for the same reason that the QL optimiser can add context to a negation: only results in the diff range are interesting to subtract.
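The `has{Source,Sink}InDiffRange` idea described above might be sketched roughly as a helper module, along the lines of the third placement option. Everything here is hypothetical: `inDiffRange` is a stand-in for the real diff-range check, `DiffInformedSketch` is not the API added by this PR, and `getASelected{Source,Sink}Location` are assumed to be members of `DataFlow::ConfigSig`.

```ql
import java
import semmle.code.java.dataflow.DataFlow

// Hypothetical stand-in for the real diff-range check.
predicate inDiffRange(Location loc) { loc.getFile().getBaseName() = "Changed.java" }

// Hypothetical helper module, parameterised over a data-flow configuration.
module DiffInformedSketch<DataFlow::ConfigSig Config> {
  // Holds if at least one source of `Config` is selected in the diff range.
  predicate hasSourceInDiffRange() {
    exists(DataFlow::Node source |
      Config::isSource(source) and
      inDiffRange(Config::getASelectedSourceLocation(source))
    )
  }

  // Holds if at least one sink of `Config` is selected in the diff range.
  predicate hasSinkInDiffRange() {
    exists(DataFlow::Node sink |
      Config::isSink(sink) and
      inDiffRange(Config::getASelectedSinkLocation(sink))
    )
  }
}
```

A secondary configuration could then be guarded by such a predicate, so that its analysis is skipped entirely when the main configuration has nothing relevant in the diff range.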
I found it necessary to introduce action at a distance in several places, moving knowledge about callers into callees. In the future I think some of this coupling can be removed by collapsing `FooQuery.qll` and `Foo.ql` into a single file, but for now the action at a distance is supposed to be discoverable and maintainable in one of the following ways:
`getASelected{Source,Sink}Location` is a way to communicate that this is what the query does.
`XSS.qll`).
I recommend reviewing this PR one commit at a time.
Pull Request checklist
All query authors
`.qhelp`. See the documentation in this repository.
Internal query authors only
`.ql`, `.qll`, or `.qhelp` files. See the documentation (internal access required).