Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Modelling of the Standard Library #16840

Merged
Merged
Original file line number Diff line number Diff line change
Expand Up @@ -46,8 +46,6 @@ private module Cached {
or
containerStep(nodeFrom, nodeTo)
or
copyStep(nodeFrom, nodeTo)
or
DataFlowPrivate::forReadStep(nodeFrom, _, nodeTo)
or
DataFlowPrivate::iterableUnpackingReadStep(nodeFrom, _, nodeTo)
Expand Down Expand Up @@ -191,18 +189,6 @@ predicate containerStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
DataFlowPrivate::comprehensionStoreStep(nodeFrom, _, nodeTo)
}

/**
* Holds if taint can flow from `nodeFrom` to `nodeTo` with a step related to copying.
*/
predicate copyStep(DataFlow::CfgNode nodeFrom, DataFlow::CfgNode nodeTo) {
exists(DataFlow::CallCfgNode call | call = nodeTo |
call = API::moduleImport("copy").getMember(["copy", "deepcopy"]).getACall() and
call.getArg(0) = nodeFrom
)
or
nodeTo.(DataFlow::MethodCallNode).calls(nodeFrom, "copy")
}

/**
* Holds if taint can flow from `nodeFrom` to `nodeTo` with an `await`-step,
* such that the whole expression `await x` is tainted if `x` is tainted.
Expand Down
14 changes: 14 additions & 0 deletions python/ql/lib/semmle/python/frameworks/Stdlib.qll
Original file line number Diff line number Diff line change
Expand Up @@ -254,10 +254,14 @@ module Stdlib {
* See https://docs.python.org/3.9/library/logging.html#logging.Logger.
*/
module Logger {
private import semmle.python.dataflow.new.internal.DataFlowDispatch as DD

/** Gets a reference to the `logging.Logger` class or any subclass. */
API::Node subclassRef() {
result = API::moduleImport("logging").getMember("Logger").getASubclass*()
or
result = API::moduleImport("logging").getMember("getLoggerClass").getReturn().getASubclass*()
or
result = ModelOutput::getATypeNode("logging.Logger~Subclass").getASubclass*()
}

Expand All @@ -277,6 +281,13 @@ module Stdlib {
ClassInstantiation() {
this = subclassRef().getACall()
or
this =
DD::selfTracker(subclassRef()
.getAValueReachableFromSource()
.asExpr()
.(ClassExpr)
.getInnerScope())
or
Comment on lines +284 to +290
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a test indicating we can now do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point...this is one bit I am somewhat unsure of, do we want this? Also, it should trigger a performance check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you added this in 77a0087 -- I think a DCA performance check for this whole PR is in place yes 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have set off a standard nightly/nightly DCA run. I did not include any extra options, so it should still extract the standard library; thus we should see a slight slowdown from the new modelling and we can gauge that it is not huge. (If we did stop extraction, that speedup would drown out what we are looking for.)

this = API::moduleImport("logging").getMember("root").asSource()
or
this = API::moduleImport("logging").getMember("getLogger").getACall()
Expand Down Expand Up @@ -1492,6 +1503,9 @@ module StdlibPrivate {
or
// io.open is a special case, since it is an alias for the builtin `open`
result = API::moduleImport("io").getMember("open")
or
// similarly, coecs.open calls the builtin `open`: https://github.com/python/cpython/blob/3.12/Lib/codecs.py#L918
result = API::moduleImport("codecs").getMember("open")
}

/**
Expand Down
112 changes: 111 additions & 1 deletion python/ql/lib/semmle/python/frameworks/Stdlib/StdLib.model.yml
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,107 @@ extensions:
- addsTo:
pack: codeql/python-all
extensible: sinkModel
data: []
data:
- ["subprocess.Popen!","Subclass.Call.Argument[0,args:]", "log-injection"]
RasmusWL marked this conversation as resolved.
Show resolved Hide resolved
- ["zipfile.ZipFile","Member[extractall].Argument[0,path:]", "path-injection"]

- addsTo:
pack: codeql/python-all
extensible: summaryModel
data:
# See
# - https://docs.python.org/3/glossary.html#term-mapping
# - https://docs.python.org/3/library/stdtypes.html#dict.get
- ["_collections_abc.Mapping", "Member[get]", "Argument[1,default:]", "ReturnValue", "taint"]
yoff marked this conversation as resolved.
Show resolved Hide resolved
# See https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser
- ["argparse.ArgumentParser", "Member[_parse_known_args,_read_args_from_files]", "Argument[0,arg_strings:]", "ReturnValue", "taint"]
- ["argparse.ArgumentParser", "Member[parse_args,parse_known_args]", "Argument[0,args:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/cgi.html#higher-level-interface
- ["cgi.FieldStorage", "Member[getfirst,getlist,getvalue]", "Argument[self]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/contextlib.html#contextlib.ExitStack
- ["contextlib.ExitStack", "Member[enter_context]", "Argument[0,cm:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/copy.html#copy.deepcopy
- ["copy", "Member[copy,deepcopy]", "Argument[0,x:]", "ReturnValue", "value"]
# See
# - https://docs.python.org/3/library/ctypes.html#ctypes.create_string_buffer
# - https://docs.python.org/3/library/ctypes.html#ctypes.create_unicode_buffer
- ["ctypes", "Member[create_string_buffer,create_unicode_buffer]", "Argument[0,init:,init_or_size:]", "ReturnValue", "taint"]
# See https://docs.python.org/3.11/distutils/apiref.html#distutils.util.change_root
- ["distutils", "Member[util].Member[change_root]", "Argument[0,new_root:,1,pathname:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/email.header.html#email.header.Header
- ["email.header.Header!", "Subclass.Call", "Argument[0,s:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/email.utils.html#email.utils.parseaddr
- ["email", "Member[utils].Member[parseaddr]", "Argument[0,addr:]", "ReturnValue", "taint"]
- ["email", "Member[utils].Member[parseaddr]", "Argument[0,addr:]", "ReturnValue.TupleElement[0,1]", "taint"]
# See See https://docs.python.org/3/library/fnmatch.html#fnmatch.filter
- ["fnmatch", "Member[filter]", "Argument[0,names:].ListElement", "ReturnValue.ListElement", "value"]
- ["fnmatch", "Member[filter]", "Argument[0,names:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/getopt.html#getopt.getopt
- ["getopt", "Member[getopt]", "Argument[0,args:]", "ReturnValue.TupleElement[1]", "taint"]
- ["getopt", "Member[getopt]", "Argument[1,shortopts:,2,longopts:]", "ReturnValue.TupleElement[0].ListElement.TupleElement[0]", "taint"]
# See https://docs.python.org/3/library/gettext.html#gettext.gettext
- ["gettext", "Member[gettext]", "Argument[0,message:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/gzip.html#gzip.GzipFile
- ["gzip.GzipFile!", "Subclass.Call", "Argument[0,filename:]", "ReturnValue", "taint"]
# See
# - https://docs.python.org/3/library/html.html#html.escape
# - https://docs.python.org/3/library/html.html#html.unescape
- ["html", "Member[escape,unescape]", "Argument[0,s:]", "ReturnValue", "taint"]
RasmusWL marked this conversation as resolved.
Show resolved Hide resolved
# See https://docs.python.org/3/library/html.parser.html#html.parser.HTMLParser.feed
- ["html.parser.HTMLParser", "Member[feed]", "Argument[0,data:]", "Argument[self]", "taint"]
# See https://docs.python.org/3.11/library/imp.html#imp.find_module
- ["imp", "Member[find_module]", "Argument[0,name:,1,path:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/logging.html#logging.getLevelName
# specifically the no matching case
- ["logging", "Member[getLevelName]", "Argument[0,level:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/logging.html#logging.LogRecord.getMessage
- ["logging.LogRecord", "Member[getMessage]", "Argument[self]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/mimetypes.html#mimetypes.guess_type
- ["mimetypes", "Member[guess_type]", "Argument[0,url:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/multiprocessing.html#multiprocessing.connection.Listener
- ["multiprocessing.connection.Listener!", "Subclass.Call", "Argument[3,authkey:]", "ReturnValue", "taint"]
yoff marked this conversation as resolved.
Show resolved Hide resolved
# See https://github.com/python/cpython/blob/main/Lib/nturl2path.py
# No user-facing documentation, unfortunately.
- ["nturl2path", "Member[pathname2url]", "Argument[0,p:]", "ReturnValue", "taint"]
- ["nturl2path", "Member[url2pathname]", "Argument[0,url:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/optparse.html#optparse.OptionParser.parse_args
- ["optparse.OptionParser", "Member[parse_args]", "Argument[0,args:,1,values:]", "ReturnValue.TupleElement[0,1]", "taint"]
# See https://github.com/python/cpython/blob/3.10/Lib/pathlib.py#L972-L973
- ["pathlib.Path", ".Member[__enter__]", "Argument[self]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/os.html#os.PathLike.__fspath__
- ["pathlib.PurePath", "Member[__fspath__]", "Argument[self]", "ReturnValue", "taint"]
# See
# - https://docs.python.org/3/library/asyncio-queue.html#asyncio.Queue.put
# - https://docs.python.org/3/library/asyncio-queue.html#asyncio.Queue.put_nowait
- ["queue.Queue", "Member[put,put_nowait]", "Argument[0,item:]", "Argument[self]", "taint"]
RasmusWL marked this conversation as resolved.
Show resolved Hide resolved
# See
# - https://docs.python.org/3/library/random.html#random.choice
# - https://docs.python.org/3/library/random.html#module-random
- ["random", "Member[choice]", "Argument[0,seq:]", "ReturnValue", "taint"]
- ["random.Random", "Member[choice]", "Argument[0,seq:]", "ReturnValue", "taint"]
RasmusWL marked this conversation as resolved.
Show resolved Hide resolved
# See https://docs.python.org/3/library/shlex.html#shlex.quote
- ["shlex", "Member[quote]", "Argument[0,s:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/shutil.html#shutil.rmtree
- ["shutil", "Member[rmtree]", "Argument[0,path:]", "Argument[2,onerror:,onexc:].Argument[1]", "taint"]
yoff marked this conversation as resolved.
Show resolved Hide resolved
# See https://docs.python.org/3/library/shutil.html#shutil.which
- ["shutil", "Member[which]", "Argument[0,cmd:,2,path:]", "ReturnValue", "taint"]
RasmusWL marked this conversation as resolved.
Show resolved Hide resolved
# See https://docs.python.org/3/library/subprocess.html#subprocess.Popen
- ["subprocess.Popen!", "Subclass.Call", "Argument[0,args:]", "ReturnValue", "taint"]
RasmusWL marked this conversation as resolved.
Show resolved Hide resolved
# See
# - https://docs.python.org/3/library/tarfile.html#tarfile.open
# - https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.open
- ["tarfile", "Member[open]", "Argument[0,name:,2,fileobj:]", "ReturnValue", "taint"]
- ["tarfile.TarFile", "Member[open]", "Argument[0,name:,2,fileobj:]", "ReturnValue", "taint"]
RasmusWL marked this conversation as resolved.
Show resolved Hide resolved
# See https://docs.python.org/3/library/tempfile.html#tempfile.mkdtemp
- ["tempfile", "Member[mkdtemp]", "Argument[0,suffix:,1,prefix:,2,dir:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/tempfile.html#tempfile.mkstemp
- ["tempfile", "Member[mkstemp]", "Argument[0,suffix:,1,prefix:,2,dir:]", "ReturnValue.TupleElement[0,1]", "taint"]
# See https://docs.python.org/3/library/textwrap.html#textwrap.dedent
- ["textwrap", "Member[dedent]", "Argument[0,text:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/traceback.html#traceback.StackSummary.from_list
- ["traceback.StackSummary", "Member[from_list]", "Argument[0,a_list:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/typing.html#typing.cast
- ["typing", "Member[cast]", "Argument[1,val:]", "ReturnValue", "value"]
# See https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
- ["urllib", "Member[parse].Member[quote]", "Argument[0,string:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote_plus
Expand All @@ -28,6 +123,21 @@ extensions:
- ["urllib", "Member[parse].Member[urlencode]", "Argument[0,query:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin
- ["urllib", "Member[parse].Member[urljoin]", "Argument[0,base:,1,url:]", "ReturnValue", "taint"]
# See the internal documentation
# https://github.com/python/cpython/blob/3.12/Lib/zipfile/_path/__init__.py#L103-L105
- ["zipfile.CompleteDirs", "Member[namelist]", "Argument[self]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile
# it may be necessary to read the code to understand the taint propagation
# Constructor: https://github.com/python/cpython/blob/3.12/Lib/zipfile/__init__.py#L1266
- ["zipfile.ZipFile!", "Subclass.Call", "Argument[0,file:]", "ReturnValue", "taint"]
- ["zipfile.ZipFile!", "Subclass.Call", "Argument[0,file:]", "ReturnValue.Attribute[filelist].ListElement.Attribute[filename]", "value"]
# _extract_member: https://github.com/python/cpython/blob/3.12/Lib/zipfile/__init__.py#L1761
- ["zipfile.ZipFile", "Member[_extract_member]", "Argument[1,targetpath:]", "ReturnValue", "taint"]
# infolist: https://github.com/python/cpython/blob/3.12/Lib/zipfile/__init__.py#L1498-L1501
- ["zipfile.ZipFile", "Member[infolist]", "Argument[self]", "ReturnValue", "taint"]
- ["zipfile.ZipFile", "Member[infolist]", "Argument[self].Attribute[filelist]", "ReturnValue", "value"]
# namelist: https://github.com/python/cpython/blob/3.12/Lib/zipfile/__init__.py#L1494-L1496
- ["zipfile.ZipFile", "Member[namelist]", "Argument[self]", "ReturnValue", "taint"]
- addsTo:
pack: codeql/python-all
extensible: neutralModel
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ module UnsafeShellCommandConstructionConfig implements DataFlow::ConfigSig {
predicate isSink(DataFlow::Node sink) { sink instanceof Sink }

predicate isBarrier(DataFlow::Node node) {
node instanceof Sanitizer or
node instanceof CommandInjection::Sanitizer // using all sanitizers from `py/command-injection`
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@

private import python
import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.internal.TaintTrackingPrivate as TTP
private import semmle.python.ApiGraphs

/**
* Provides a data-flow configuration for detecting modifications of a parameters default value.
Expand Down Expand Up @@ -73,7 +73,13 @@ module ModificationOfParameterWithDefault {
or
// the target of a copy step is (presumably) a different object, and hence modifications of
// this object no longer matter for the purposes of this query.
TTP::copyStep(_, node) and state in [true, false]
copyTarget(node) and state in [true, false]
}

private predicate copyTarget(DataFlow::Node node) {
node = API::moduleImport("copy").getMember(["copy", "deepcopy"]).getACall()
or
node.(DataFlow::MethodCallNode).calls(_, "copy")
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,7 @@ edges
| UnsafeUnpack.py:161:19:161:21 | ControlFlowNode for tar | UnsafeUnpack.py:163:33:163:35 | ControlFlowNode for tar | provenance | |
| UnsafeUnpack.py:161:25:161:46 | ControlFlowNode for Attribute() | UnsafeUnpack.py:161:19:161:21 | ControlFlowNode for tar | provenance | |
| UnsafeUnpack.py:161:38:161:45 | ControlFlowNode for savepath | UnsafeUnpack.py:161:25:161:46 | ControlFlowNode for Attribute() | provenance | Config |
| UnsafeUnpack.py:161:38:161:45 | ControlFlowNode for savepath | UnsafeUnpack.py:161:25:161:46 | ControlFlowNode for Attribute() | provenance | MaD:49 |
| UnsafeUnpack.py:163:23:163:28 | ControlFlowNode for member | UnsafeUnpack.py:166:37:166:42 | ControlFlowNode for member | provenance | |
| UnsafeUnpack.py:163:33:163:35 | ControlFlowNode for tar | UnsafeUnpack.py:163:23:163:28 | ControlFlowNode for member | provenance | |
| UnsafeUnpack.py:166:23:166:28 | [post] ControlFlowNode for result | UnsafeUnpack.py:167:67:167:72 | ControlFlowNode for result | provenance | |
Expand Down
Original file line number Diff line number Diff line change
@@ -1,13 +1,23 @@
edges
| test.py:10:16:10:24 | ControlFlowNode for file_path | test.py:11:21:11:29 | ControlFlowNode for file_path | provenance | |
| test.py:11:5:11:35 | ControlFlowNode for Attribute() | test.py:11:5:11:52 | ControlFlowNode for Attribute() | provenance | Config |
| test.py:11:21:11:29 | ControlFlowNode for file_path | test.py:11:5:11:35 | ControlFlowNode for Attribute() | provenance | MaD:64 |
| test.py:11:21:11:29 | ControlFlowNode for file_path | test.py:11:5:11:52 | ControlFlowNode for Attribute() | provenance | Config |
| test.py:11:21:11:29 | ControlFlowNode for file_path | test.py:12:21:12:29 | ControlFlowNode for file_path | provenance | |
| test.py:12:5:12:35 | ControlFlowNode for Attribute() | test.py:12:5:12:48 | ControlFlowNode for Attribute() | provenance | Config |
| test.py:12:21:12:29 | ControlFlowNode for file_path | test.py:12:5:12:35 | ControlFlowNode for Attribute() | provenance | MaD:64 |
| test.py:12:21:12:29 | ControlFlowNode for file_path | test.py:12:5:12:48 | ControlFlowNode for Attribute() | provenance | Config |
| test.py:12:21:12:29 | ControlFlowNode for file_path | test.py:14:26:14:34 | ControlFlowNode for file_path | provenance | |
| test.py:14:10:14:35 | ControlFlowNode for Attribute() | test.py:15:14:15:29 | ControlFlowNode for Attribute() | provenance | Config |
| test.py:14:26:14:34 | ControlFlowNode for file_path | test.py:14:10:14:35 | ControlFlowNode for Attribute() | provenance | MaD:64 |
| test.py:14:26:14:34 | ControlFlowNode for file_path | test.py:15:14:15:29 | ControlFlowNode for Attribute() | provenance | Config |
| test.py:14:26:14:34 | ControlFlowNode for file_path | test.py:18:26:18:34 | ControlFlowNode for file_path | provenance | |
| test.py:18:10:18:35 | ControlFlowNode for Attribute() | test.py:19:14:19:39 | ControlFlowNode for Attribute() | provenance | Config |
| test.py:18:26:18:34 | ControlFlowNode for file_path | test.py:18:10:18:35 | ControlFlowNode for Attribute() | provenance | MaD:64 |
| test.py:18:26:18:34 | ControlFlowNode for file_path | test.py:19:14:19:39 | ControlFlowNode for Attribute() | provenance | Config |
| test.py:18:26:18:34 | ControlFlowNode for file_path | test.py:22:21:22:29 | ControlFlowNode for file_path | provenance | |
| test.py:22:5:22:30 | ControlFlowNode for Attribute() | test.py:22:5:22:60 | ControlFlowNode for Attribute() | provenance | Config |
| test.py:22:21:22:29 | ControlFlowNode for file_path | test.py:22:5:22:30 | ControlFlowNode for Attribute() | provenance | MaD:64 |
| test.py:22:21:22:29 | ControlFlowNode for file_path | test.py:22:5:22:60 | ControlFlowNode for Attribute() | provenance | Config |
| test.py:22:21:22:29 | ControlFlowNode for file_path | test.py:24:18:24:26 | ControlFlowNode for file_path | provenance | |
| test.py:24:18:24:26 | ControlFlowNode for file_path | test.py:24:5:24:52 | ControlFlowNode for Attribute() | provenance | Config |
Expand Down Expand Up @@ -37,14 +47,19 @@ edges
| test.py:28:26:28:34 | ControlFlowNode for file_path | test.py:64:36:64:44 | ControlFlowNode for file_path | provenance | |
nodes
| test.py:10:16:10:24 | ControlFlowNode for file_path | semmle.label | ControlFlowNode for file_path |
| test.py:11:5:11:35 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| test.py:11:5:11:52 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| test.py:11:21:11:29 | ControlFlowNode for file_path | semmle.label | ControlFlowNode for file_path |
| test.py:12:5:12:35 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| test.py:12:5:12:48 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| test.py:12:21:12:29 | ControlFlowNode for file_path | semmle.label | ControlFlowNode for file_path |
| test.py:14:10:14:35 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| test.py:14:26:14:34 | ControlFlowNode for file_path | semmle.label | ControlFlowNode for file_path |
| test.py:15:14:15:29 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| test.py:18:10:18:35 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| test.py:18:26:18:34 | ControlFlowNode for file_path | semmle.label | ControlFlowNode for file_path |
| test.py:19:14:19:39 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| test.py:22:5:22:30 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| test.py:22:5:22:60 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| test.py:22:21:22:29 | ControlFlowNode for file_path | semmle.label | ControlFlowNode for file_path |
| test.py:24:5:24:52 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
Expand Down
Loading
Loading