Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_find_and_load_licence() in pre-compile might be unsafe #50

Open
mgorny opened this issue Jun 25, 2022 · 6 comments
Open

_find_and_load_licence() in pre-compile might be unsafe #50

mgorny opened this issue Jun 25, 2022 · 6 comments

Comments

@mgorny
Copy link

mgorny commented Jun 25, 2022

Basically, what I'm thinking of is:

>>> [f for f in importlib.metadata.files("validate-pyproject") if f.stem == 'LICENSE']
[PackagePath('validate_pyproject-0.9.post1.dev3+g3b0db8c.dist-info/LICENSE.txt'),
 PackagePath('validate_pyproject/_vendor/fastjsonschema/LICENSE')]

i.e. both the package license file and the vendored fastjsonschema's LICENSE file matches this:

def _find_and_load_licence(files: Optional[Sequence[_M.PackagePath]]) -> str:
    if files is None:  # pragma: no cover
        raise ImportError("Could not find LICENSE for package")
    try:
        return next(f for f in files if f.stem.upper() == "LICENSE").read_text("UTF-8")

and I think it might be UB that the license file from dist-info is returned first.

That said, in Gentoo we remove the LICENSE* files from .dist-info since they are redundant to our license metadata, so this doesn't work correctly at all. Could you perhaps install the license file directly as part of package data, and use it similarly to how FJS's license is grabbed?

@abravalheri
Copy link
Owner

and I think it might be UB that the license file from dist-info is returned first.

I think when I was developing this, I noted the results in importlib.metadata.files are always given in "file system topologic order", so the license would come first... I may change this to make it more robust.

That said, in Gentoo we remove the LICENSE* files from .dist-info since they are redundant to our license metadata, so this doesn't work correctly at all.

Could we defer this decision until after PEP 639 is approved or rejected 😅? What would be the plan of Gentoo to handle that standard?

PEP 639 (if approved) will explicitly make it reliable for packages to query for its own license files. So far the text of the PEP reads like the following (emphasis are mine):

If a License-File is listed in a source or built distribution’s core metadata, that file MUST be included in the distribution at the specified path relative to the root license directory, and MUST be installed with the distribution at that same relative path.

The specified relative path MUST be consistent between project source trees, source distributions (sdists), built distributions (wheels) and installed projects. Therefore, inside the root license directory, packaging tools MUST reproduce the directory structure under which the source license files are located relative to the project root.

Root License Directory (Short: License Directory)
The directory under which license files are stored in a project/distribution and the root directory that their paths, as recorded under the License-File core metadata fields, are relative to. Defined here to be the project root directory for source trees and source distributions, and a subdirectory named license_files of the directory containing the core metadata (i.e., the .dist-info/license_files directory) for built distributions and installed projects.

@mgorny
Copy link
Author

mgorny commented Jul 15, 2022

Sounds like another random "change everything every half of year" standard that's pushed with zero consideration for world outside virtualenvs and that doesn't care about anyone who doesn't have time to read all the new PEPs proposed every month.

@abravalheri
Copy link
Owner

abravalheri commented Jul 15, 2022

Sounds like another random "change everything every half of year" standard that's pushed with zero consideration for world outside virtualenvs and that doesn't care about anyone who doesn't have time to read all the new PEPs proposed every month.

Luckily this one has been proposed and is under discussion for over 2 years now, so the community had some time to engage.
Please feel free to add your thoughts to https://discuss.python.org/t/pep-639-round-2-improving-license-clarity-with-better-package-metadata/12622/76.

@mgorny
Copy link
Author

mgorny commented Jul 15, 2022

If only I had hours of free time to read the existing thread… unless someone summarized it?

@stanislavlevin
Copy link

In ALTLinux we deduplicate common LICENSE files.
Recent debundle of fastjsonschema in 371ec7d broke us:

[builder@localhost test]$ python3 -m validate_pyproject.pre_compile -vv
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.distutils` schema
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.setuptools` schema
[ERROR] ImportError: Could not find LICENSE for package

[DEBUG] Please check the following information:
Traceback (most recent call last):
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/cli.py", line 185, in exceptions2exit
    yield
  File "/usr/lib64/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/pre_compile/cli.py", line 91, in run
    pre_compile(prms.output_dir, prms.main_file, cmd, prms.plugins, prms.replacements)
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/pre_compile/__init__.py", line 58, in pre_compile
    write_notice(out, main_file, original_cmd, replacements)
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/pre_compile/__init__.py", line 105, in write_notice
    notice = notice.format(notice=opening, main_file=main_file, **load_licenses())
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/pre_compile/__init__.py", line 115, in load_licenses
    "fastjsonschema_license": _find_and_load_licence(_M.files("fastjsonschema")),
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/pre_compile/__init__.py", line 134, in _find_and_load_licence
    raise ImportError("Could not find LICENSE for package")
ImportError: Could not find LICENSE for package

@hroncok
Copy link

hroncok commented Mar 21, 2023

In Fedora RPM packages, we keep the LICENSE files, but we drop RECORDs, which is allowed by PEP 627 and the current PyPA specification as well. The code here breaks because of that.

To reproduce, run:

(venv) [tmp]$ pip install validate_pyproject
...
Successfully installed fastjsonschema-2.16.3 validate_pyproject-0.12.1

(venv) [tmp]$ python -c 'import validate_pyproject.pre_compile; validate_pyproject.pre_compile.load_licenses()'

(venv) [tmp]$ rm venv/lib/python3.*/site-packages/*.dist-info/RECORD 

(venv) [tmp]$ python -c 'import validate_pyproject.pre_compile; validate_pyproject.pre_compile.load_licenses()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../venv/lib64/python3.11/site-packages/validate_pyproject/pre_compile/__init__.py", line 115, in load_licenses
    "fastjsonschema_license": _find_and_load_licence(_M.files("fastjsonschema")),
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../venv/lib64/python3.11/site-packages/validate_pyproject/pre_compile/__init__.py", line 134, in _find_and_load_licence
    raise ImportError("Could not find LICENSE for package")
ImportError: Could not find LICENSE for package

importlib.metadtata.files docs mention this:

In the case where the metadata file listing files (RECORD or SOURCES.txt) is missing, files() will return None.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants