Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java: make all code-scanning queries diff-informed #17846

Draft
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

jbj
Copy link
Contributor

@jbj jbj commented Oct 25, 2024

With these changes, all queries in the code-scanning suite will be diff-informed. This PR is a draft for now because I haven't found a good way to test all these changes. But before I go further, I'd like a review from @aschackmull on whether the approach taken here is what we want at all.

#17190 made all the straightforward data-flow queries diff-informed. To cover all the remaining ones, I introduced two new concepts into the data-flow library:

  1. The idea that a source or sink has selected locations, which are locations that will eventually be associated with this source/sink in the query. This was necessary because many queries select not the exact source or sink but something closely related to it.
  2. has{Source,Sink}InDiffRange, encapsulating the recurring pattern of checking whether a given data-flow configuration has sources or sinks in the diff range at all. This pattern came up a lot in queries where a secondary data-flow configuration was made diff-informed, such as a configuration for finding sinks for the main data-flow configuration. I put these new predicates on DataFlow::Global, but I don't feel too confident about that. Maybe they should be defaults on DataFlow::Configuration instead, intended to be called but not overridden. Or maybe there should be a helper module in DataFlow that deals only with this matter, like DataFlow::DiffInformed<MyConfig>::hasSinkInRange.

Another pattern that came up was where we have queries that exclude results found by another query, using the latter query as a library. I believe it's fine to make these queries diff-informed for the same reason that the QL optimiser can add context to a negation: only results in the diff range are interesting to subtract.

I found it necessary to introduce action at a distance in several places, moving knowledge about callers into callees. In the future I think some of this coupling can be removed by collapsing FooQuery.qll and Foo.ql into a single file, but for now the action at a distance is supposed to be discoverable and maintainable in one of the following ways:

  1. Code comments explaining the reverse dependencies.
  2. Assuming a convention that implementing any getASelected{Source,Sink}Location is a way to communicate that this is what the query does.
  3. Parameterised modules (like in XSS.qll).
  4. Abstract classes (like in the regex queries).

I recommend reviewing this PR one commit at a time.

Pull Request checklist

All query authors

Internal query authors only

  • Autofixes generated based on these changes are valid, only needed if this PR makes significant changes to .ql, .qll, or .qhelp files. See the documentation (internal access required).
  • Changes are validated at scale (internal access required).
  • Adding a new query? Consider also adding the query to autofix.

jbj added 16 commits October 25, 2024 13:33
This extension allows queries to be diff-informed even when the elements
they select are different from the sources and sinks found by data flow.

Most of the changes are boilerplate to use the new predicates in the
backwards-compatible data-flow wrappers for all languages.
This change makes the XSS query fully diff-informed, including the
discovery of sinks. This involved making a helper data-flow analysis
diff-informed, which required punching through some abstraction layers.

An alternative would have been to use `DataFlow::SimpleGlobal` or other
means to make the analysis faster, but I didn't want to risk removing
good taint steps such as wrapping one `OutputStream` in another.
With this change, the slowest data-flow analysis in this query is made
diff-informed with the same approach as for XSS.
This query shares implementation with several other queries about
cleartext storage, but it's the only one of them that's in the
code-scanning suite. The sharing mechanism remains the same as before,
but now each query has to override `getASelectedLocation` to become
diff-informed.

Two other data-flow configurations are used in this query, but they
can't easily be made diff-informed.
This and other queries would also benefit from making `RegexFlow`
diff-informed. That will come later.
An alternative to dynamic dispatch would have been to use parameterised
modules. I tried that but abandoned it because it led to cascading
changes and noise in too many places. The parameterisation used here has
to be propagated through three different library files, and that
propagation becomes invisible (for better or worse) when using dynamic
dispatch.
@@ -113,6 +105,77 @@
XssVulnerableWriterSourceNode() { this.asExpr() instanceof XssVulnerableWriterSource }
}

/** A nullary predicate. */

Check warning

Code scanning / CodeQL

Predicate QLDoc style. Warning

The QLDoc for a predicate without a result should start with 'Holds'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant