Python: stdlib models qa #16843

Draft
wants to merge 17 commits into
base: main

yoff
Contributor

@yoff yoff commented Jun 26, 2024

Branch and draft PR to make QA experiments for #16840.

The only addition is that we do not extract the standard library by default.

yoff added 12 commits June 25, 2024 14:13
- empty models for now
- `summaryModel` of `codeql/python-all` will be added to shortly.
- `quote` together with `re.compile` recover regex injection alerts on haiwen/seahub
- `quote_plus` recovers the URL redirection alert on DemocracyClub/EveryElection
- `unquote` recovers path injection alerts on `cloudera/hue`
- finding justifications for the rest was tedious
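
Why these three functions are worth modeling as taint steps can be seen in plain Python: each returns a string derived from its argument, so attacker-controlled data survives the call. The following is a minimal sketch, not taken from the projects named above; the `user_input` value is a hypothetical example.

```python
from urllib.parse import quote, quote_plus, unquote

# Hypothetical attacker-controlled value (an assumption for illustration).
user_input = "../etc/passwd?x=a b"

# quote and quote_plus escape reserved characters but keep the payload
# recognisable, so the result still carries the attacker's data.
quoted = quote(user_input)
plus_quoted = quote_plus(user_input)

# unquote reverses percent-encoding, so an encoded traversal sequence
# turns back into a real one.
round_trip = unquote(quoted)
decoded_traversal = unquote("%2e%2e%2f")
```

Here `round_trip` equals the original input, which is the behaviour a `taint` summary from the argument to the return value is meant to capture.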
There is already a model there so we add to that one.

We did observe that this existing model was blocked by the external MaD model.
This is concerning and needs to be cleared up.
Two of the generated summaries have been excluded:
 - ["re", "Member[split]", "Argument[0,pattern:]", "ReturnValue", "taint"]
   From the documentation, it is not clear why the pattern should figure in the return value: it only denotes the split points, and all matches of it are filtered out.
   From the implementation
     split function: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L199
     _compile function called by split: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L280
   we see that if the pattern is already a compiled `Pattern`, it is returned directly from _compile and could thus be part of the return value from split. An attacker can probably not arrange for this, so the summary is an FP in practice.

 - ["urllib2", "Member[unquote]", "Argument[0,string:]", "ReturnValue", "taint"]
   urllib2 seems to exist only in Python 2 (e.g. https://docs.python.org/2.7/library/urllib2.html), and I cannot locate a function `unquote` in it.
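
The two exclusions above can be checked informally from a Python 3 interpreter. The sketch below uses only public standard-library calls; the input strings are made up for illustration, and the `urllib2` lookup assumes a stock Python 3 installation with no third-party package of that name.

```python
import importlib.util
import re
import urllib.parse

# re.split returns pieces of the input string; the pattern only marks
# where to cut.
parts = re.split(r",", "a,b,c")

# With a capture group, the matched separator text is kept, but it is
# still text from the input string, not from the pattern.
parts_with_sep = re.split(r"(,)", "a,b")

# An already compiled Pattern passed to re.compile comes back as the
# same object, which is the pass-through noted in the exclusion.
compiled = re.compile(r",")
same_object = re.compile(compiled) is compiled

# On a stock Python 3 there is no urllib2 module (assumption: nothing
# with that name is installed); unquote lives in urllib.parse.
urllib2_available = importlib.util.find_spec("urllib2") is not None
unquote_in_py3 = hasattr(urllib.parse, "unquote")
```

None of this shows the pattern text ending up in the list that `split` returns, which is consistent with treating the first summary as an FP in practice.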
@yoff yoff added the Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish label Jun 26, 2024
@yoff
Contributor Author

yoff commented Jun 26, 2024

Marking this as ready for review as this seems necessary for QA. No need to review at this point, though :-)
Edit: No, that was a misunderstanding.

@yoff yoff marked this pull request as ready for review June 26, 2024 11:34
@yoff yoff requested a review from a team as a code owner June 26, 2024 11:34
@yoff yoff marked this pull request as draft June 26, 2024 11:35
@yoff yoff added the no-change-note-required This PR does not need a change note label Jun 26, 2024
Labels
Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish no-change-note-required This PR does not need a change note Python