Python: Modelling of the Standard Library #16840

Open: wants to merge 13 commits into base: yoff-python-stop-extracting-std-lib
105 changes: 104 additions & 1 deletion python/ql/lib/semmle/python/frameworks/Stdlib/StdLib.model.yml
@@ -7,19 +7,107 @@ extensions:
- addsTo:
pack: codeql/python-all
extensible: sinkModel
data: []
data:
- ["subprocess.Popen!","Subclass.Call.Argument[0,args:]", "log-injection"]
RasmusWL marked this conversation as resolved.
- ["zipfile.ZipFile","Member[extractall].Argument[0,path:]", "path-injection"]

- addsTo:
pack: codeql/python-all
extensible: summaryModel
data:
# See
# - https://docs.python.org/3/glossary.html#term-mapping
# - https://docs.python.org/3/library/stdtypes.html#dict.get
- ["_collections_abc.Mapping", "Member[get]", "Argument[1,default:]", "ReturnValue", "taint"]
yoff marked this conversation as resolved.
# See https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser
- ["argparse.ArgumentParser", "Member[_parse_known_args,_read_args_from_files]", "Argument[0,arg_strings:]", "ReturnValue", "taint"]
- ["argparse.ArgumentParser", "Member[parse_args,parse_known_args]", "Argument[0,args:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/cgi.html#higher-level-interface
- ["cgi.FieldStorage", "Member[getfirst,getlist,getvalue]", "Argument[self]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/contextlib.html#contextlib.ExitStack
- ["contextlib.ExitStack", "Member[enter_context]", "Argument[0,cm:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/copy.html#copy.deepcopy
- ["copy", "Member[copy,deepcopy]", "Argument[0,x:]", "ReturnValue", "value"]
# See
# - https://docs.python.org/3/library/ctypes.html#ctypes.create_string_buffer
# - https://docs.python.org/3/library/ctypes.html#ctypes.create_unicode_buffer
- ["ctypes", "Member[create_string_buffer,create_unicode_buffer]", "Argument[0,init:,init_or_size:]", "ReturnValue", "taint"]
# See https://docs.python.org/3.11/distutils/apiref.html#distutils.util.change_root
- ["distutils", "Member[util].Member[change_root]", "Argument[0,new_root:,1,pathname:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/email.header.html#email.header.Header
- ["email.header.Header!", "Subclass.Call", "Argument[0,s:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/email.utils.html#email.utils.parseaddr
- ["email", "Member[utils].Member[parseaddr]", "Argument[0,addr:]", "ReturnValue", "taint"]
- ["email", "Member[utils].Member[parseaddr]", "Argument[0,addr:]", "ReturnValue.TupleElement[0,1]", "taint"]
# See https://docs.python.org/3/library/fnmatch.html#fnmatch.filter
- ["fnmatch", "Member[filter]", "Argument[0,names:].ListElement", "ReturnValue.ListElement", "value"]
- ["fnmatch", "Member[filter]", "Argument[0,names:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/getopt.html#getopt.getopt
- ["getopt", "Member[getopt]", "Argument[0,args:]", "ReturnValue.TupleElement[1]", "taint"]
- ["getopt", "Member[getopt]", "Argument[1,shortopts:,2,longopts:]", "ReturnValue.TupleElement[0].ListElement.TupleElement[0]", "taint"]
# See https://docs.python.org/3/library/gettext.html#gettext.gettext
- ["gettext", "Member[gettext]", "Argument[0,message:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/gzip.html#gzip.GzipFile
- ["gzip.GzipFile!", "Subclass.Call", "Argument[0,filename:]", "ReturnValue", "taint"]
# See
# - https://docs.python.org/3/library/html.html#html.escape
# - https://docs.python.org/3/library/html.html#html.unescape
- ["html", "Member[escape,unescape]", "Argument[0,s:]", "ReturnValue", "taint"]
RasmusWL marked this conversation as resolved.
# See https://docs.python.org/3/library/html.parser.html#html.parser.HTMLParser.feed
- ["html.parser.HTMLParser", "Member[feed]", "Argument[0,data:]", "Argument[self]", "taint"]
# See https://docs.python.org/3.11/library/imp.html#imp.find_module
- ["imp", "Member[find_module]", "Argument[0,name:,1,path:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/logging.html#logging.getLevelName
# specifically the case with no matching level, where "Level %s" % level is returned
- ["logging", "Member[getLevelName]", "Argument[0,level:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/logging.html#logging.LogRecord.getMessage
- ["logging.LogRecord", "Member[getMessage]", "Argument[self]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/mimetypes.html#mimetypes.guess_type
- ["mimetypes", "Member[guess_type]", "Argument[0,url:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/multiprocessing.html#multiprocessing.connection.Listener
- ["multiprocessing.connection.Listener!", "Subclass.Call", "Argument[3,authkey:]", "ReturnValue", "taint"]
Comment on lines +66 to +67

Member:
this doesn't look right to me. please explain :)

Contributor Author:
"If authkey is given and not None, it should be a byte string and will be used as the secret key for an HMAC-based authentication challenge." So the authkey is stored on the constructed object somehow.

Contributor Author (yoff, Jun 28, 2024):
There will have been a path where it has reached a sink.

Member:
I'm still confused about this one, and it sounds more like a step that would induce FPs 🤔

I'm probably at 5/10 on removing it, but can also disagree and commit if you really want it 👍

# See https://github.com/python/cpython/blob/main/Lib/nturl2path.py
# No user-facing documentation, unfortunately.
- ["nturl2path", "Member[pathname2url]", "Argument[0,p:]", "ReturnValue", "taint"]
- ["nturl2path", "Member[url2pathname]", "Argument[0,url:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/optparse.html#optparse.OptionParser.parse_args
- ["optparse.OptionParser", "Member[parse_args]", "Argument[0,args:,1,values:]", "ReturnValue.TupleElement[0,1]", "taint"]
# See https://github.com/python/cpython/blob/3.10/Lib/pathlib.py#L972-L973
- ["pathlib.Path", "Member[__enter__]", "Argument[self]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/os.html#os.PathLike.__fspath__
- ["pathlib.PurePath", "Member[__fspath__]", "Argument[self]", "ReturnValue", "taint"]
# See
# - https://docs.python.org/3/library/queue.html#queue.Queue.put
# - https://docs.python.org/3/library/queue.html#queue.Queue.put_nowait
- ["queue.Queue", "Member[put,put_nowait]", "Argument[0,item:]", "Argument[self]", "taint"]
RasmusWL marked this conversation as resolved.
# See
# - https://docs.python.org/3/library/random.html#random.choice
# - https://docs.python.org/3/library/random.html#module-random
- ["random", "Member[choice]", "Argument[0,seq:]", "ReturnValue", "taint"]
- ["random.Random", "Member[choice]", "Argument[0,seq:]", "ReturnValue", "taint"]
Member:
Shouldn't these read list elements from seq?

Contributor Author:
We could add a value step from ListElement (and perhaps SetElement). We still need these taint ones, though, since we may not have knowledge about seq other than it being tainted.

Member:
I would still like you to add the read from ListElement/SetElement 👍
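A quick runtime check of the point raised in this thread, as a minimal sketch in plain Python (the variable names are illustrative only, not from the PR): `random.choice` hands back one of the sequence's own element objects unchanged, which is what a value step from `ListElement` would capture. It also shows that CPython's `choice` indexes into its argument, so passing a `set` raises `TypeError`; that suggests a `SetElement` read may rarely matter in practice, though this is an observation about CPython rather than a claim about the model.

```python
import random

# Illustrative names: one "tainted" string among constant strings.
tainted = "user-controlled"
options = [tainted, "safe-a", "safe-b"]

# Seeded generator so the run is reproducible.
rng = random.Random(42)
picks = [rng.choice(options) for _ in range(20)]

# Every result is one of the list's own element objects, passed through
# unchanged (a value flow, not a derived string).
assert all(any(p is e for e in options) for p in picks)

# choice indexes into its argument (seq[i]), so a set is rejected outright.
set_rejected = False
try:
    rng.choice({tainted, "safe-a"})
except TypeError:
    set_rejected = True

print("set rejected:", set_rejected)
```

The taint rows stay useful alongside such a value step: when nothing is known about `seq` beyond it being tainted, there is no element content to read from.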

# See https://docs.python.org/3/library/shlex.html#shlex.quote
- ["shlex", "Member[quote]", "Argument[0,s:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/shutil.html#shutil.rmtree
- ["shutil", "Member[rmtree]", "Argument[0,path:]", "Argument[2,onerror:,onexc:].Argument[1]", "taint"]
yoff marked this conversation as resolved.
# See https://docs.python.org/3/library/shutil.html#shutil.which
- ["shutil", "Member[which]", "Argument[0,cmd:,2,path:]", "ReturnValue", "taint"]
RasmusWL marked this conversation as resolved.
# See https://docs.python.org/3/library/subprocess.html#subprocess.Popen
- ["subprocess.Popen!", "Subclass.Call", "Argument[0,args:]", "ReturnValue", "taint"]
RasmusWL marked this conversation as resolved.
# See
# - https://docs.python.org/3/library/tarfile.html#tarfile.open
# - https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.open
- ["tarfile", "Member[open]", "Argument[0,name:,2,fileobj:]", "ReturnValue", "taint"]
- ["tarfile.TarFile", "Member[open]", "Argument[0,name:,2,fileobj:]", "ReturnValue", "taint"]
RasmusWL marked this conversation as resolved.
# See https://docs.python.org/3/library/tempfile.html#tempfile.mkdtemp
- ["tempfile", "Member[mkdtemp]", "Argument[0,suffix:,1,prefix:,2,dir:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/tempfile.html#tempfile.mkstemp
- ["tempfile", "Member[mkstemp]", "Argument[0,suffix:,1,prefix:,2,dir:]", "ReturnValue.TupleElement[0,1]", "taint"]
# See https://docs.python.org/3/library/textwrap.html#textwrap.dedent
- ["textwrap", "Member[dedent]", "Argument[0,text:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/traceback.html#traceback.StackSummary.from_list
- ["traceback.StackSummary", "Member[from_list]", "Argument[0,a_list:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/typing.html#typing.cast
- ["typing", "Member[cast]", "Argument[1,val:]", "ReturnValue", "value"]
# See https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
- ["urllib", "Member[parse].Member[quote]", "Argument[0,string:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote_plus
@@ -35,6 +123,21 @@ extensions:
- ["urllib", "Member[parse].Member[urlencode]", "Argument[0,query:]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin
- ["urllib", "Member[parse].Member[urljoin]", "Argument[0,base:,1,url:]", "ReturnValue", "taint"]
# See the internal documentation
# https://github.com/python/cpython/blob/3.12/Lib/zipfile/_path/__init__.py#L103-L105
- ["zipfile.CompleteDirs", "Member[namelist]", "Argument[self]", "ReturnValue", "taint"]
# See https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile
# it may be necessary to read the code to understand the taint propagation
# Constructor: https://github.com/python/cpython/blob/3.12/Lib/zipfile/__init__.py#L1266
- ["zipfile.ZipFile!", "Subclass.Call", "Argument[0,file:]", "ReturnValue", "taint"]
- ["zipfile.ZipFile!", "Subclass.Call", "Argument[0,file:]", "ReturnValue.Attribute[filelist].ListElement.Attribute[filename]", "value"]
# _extract_member: https://github.com/python/cpython/blob/3.12/Lib/zipfile/__init__.py#L1761
- ["zipfile.ZipFile", "Member[_extract_member]", "Argument[1,targetpath:]", "ReturnValue", "taint"]
# infolist: https://github.com/python/cpython/blob/3.12/Lib/zipfile/__init__.py#L1498-L1501
- ["zipfile.ZipFile", "Member[infolist]", "Argument[self]", "ReturnValue", "taint"]
- ["zipfile.ZipFile", "Member[infolist]", "Argument[self].Attribute[filelist]", "ReturnValue", "value"]
# namelist: https://github.com/python/cpython/blob/3.12/Lib/zipfile/__init__.py#L1494-L1496
- ["zipfile.ZipFile", "Member[namelist]", "Argument[self]", "ReturnValue", "taint"]
- addsTo:
pack: codeql/python-all
extensible: neutralModel
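For reviewers who want to sanity-check a few of the summary rows against runtime behavior, here is a minimal sketch using only the standard library (the file names, address, and archive member name are made up for illustration). It exercises three modeled flows: `fnmatch.filter` returns elements of `names` unchanged, `email.utils.parseaddr` splits `addr` into a tuple, and a `zipfile.ZipFile` built from archive bytes exposes the stored member names through `filelist`, `infolist()` and `namelist()`. The `is` check on `infolist()` relies on a CPython implementation detail (it currently returns the `filelist` list itself), which is the behavior the `Argument[self].Attribute[filelist]` to `ReturnValue` value row describes.

```python
import email.utils
import fnmatch
import io
import zipfile

# fnmatch.filter: the result list holds the very same string objects from names.
names = ["report.txt", "notes.md", "data.txt"]
matched = fnmatch.filter(names, "*.txt")
assert matched == ["report.txt", "data.txt"]
assert all(any(m is n for n in names) for m in matched)

# email.utils.parseaddr: both tuple elements are derived from addr.
realname, address = email.utils.parseaddr("Mallory <mallory@example.com>")
assert (realname, address) == ("Mallory", "mallory@example.com")

# zipfile: member names come straight from the archive bytes (the file argument).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("../outside.txt", b"payload")  # hypothetical, attacker-chosen name
buf.seek(0)  # the writer does not close a caller-supplied file object

with zipfile.ZipFile(buf) as zf:
    stored = [info.filename for info in zf.filelist]
    same_list = zf.infolist() is zf.filelist  # CPython detail: infolist returns filelist
    listed = zf.namelist()

print(stored, listed, same_list)
```

Note that the stored member name keeps its `../` prefix when read back; whether that becomes dangerous depends on how the name is later used as a path.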