PyPi metadata doesn't have expected values - exception thrown #25

Open
JR-Carroll opened this issue Oct 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@JR-Carroll

When running sbom4python against my project, an exception is thrown. It looks like the PyPi package in question doesn't have repo_metadata filled out (it is None). I debated opening a ticket in lib4sbom or lib4package (the traceback passes through both of them), but I landed here because the exception is unhandled in sbom4python. Note: please consider whether the other libs also need hardening (i.e., applying fixes in lib4sbom and lib4package may be worthwhile for other consumers of those libraries).

GOOD PyPi Information: https://packages.ecosyste.ms/api/v1/registries/pypi.org/packages/tomli

  • This has the required fields, is parsed correctly and passes fine.

OFFENDING PyPi Package: https://packages.ecosyste.ms/api/v1/registries/pypi.org/packages/tqdm

  • This DOES NOT have the repo_metadata populated

Expected Behavior:

When the data cannot be found, it should not result in an unhandled exception. That said, the license information IS available for tqdm; it's just not where it's expected. I suggest extracting it from the other location in the JSON payload where it appears.
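As a minimal sketch of the "no unhandled exception" expectation, assuming the lookup is a plain nested dictionary access on the decoded JSON: in Python, subscripting None raises TypeError, so a guarded lookup can return a default instead. safe_get is a hypothetical helper (not part of sbom4python, lib4sbom or lib4package), and the "license" key under repo_metadata is an assumption about the payload.

```python
from typing import Any, Optional


def safe_get(data: Any, *keys: str, default: Optional[Any] = None) -> Any:
    """Walk nested dicts by key; return `default` if any level is missing or not a dict.

    Hypothetical helper for illustration only.
    """
    current = data
    for key in keys:
        # A JSON null (None in Python) or any non-dict value stops the walk
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


# Hypothetical, trimmed-down package record where repo_metadata is null
record = {"name": "example-package", "repo_metadata": None}

# record["repo_metadata"]["license"] would raise TypeError here
print(safe_get(record, "repo_metadata", "license"))  # None
```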

Observed Behavior:

SBOM generation fails when running sbom4python -r requirements.txt. With -d on, I can see it successfully process many packages, but it halts and exits on the tqdm package because repo_metadata is missing.
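One way to keep a single bad package from halting the whole run is to isolate failures per package. The sketch below is hypothetical: collect_metadata and the fetch callable stand in for whatever sbom4python actually calls, and the package names and payloads are made up.

```python
import logging
from typing import Any, Callable, Dict, Iterable, Optional

logger = logging.getLogger("sbom_sketch")


def collect_metadata(
    packages: Iterable[str],
    fetch: Callable[[str], Dict[str, Any]],
) -> Dict[str, Optional[Dict[str, Any]]]:
    """Fetch metadata per package; record None instead of aborting on bad data."""
    results: Dict[str, Optional[Dict[str, Any]]] = {}
    for name in packages:
        try:
            results[name] = fetch(name)
        except (KeyError, TypeError, ValueError) as exc:
            # Incomplete or malformed external data: log it and continue
            logger.warning("Skipping metadata for %s: %s", name, exc)
            results[name] = None
    return results


# Hypothetical stand-in for the real metadata lookup; the payloads are made up
FAKE_PAYLOADS = {
    "pkg-good": {"repo_metadata": {"license": "mit"}},
    "pkg-missing": {"repo_metadata": None},
}


def fake_fetch(name: str) -> Dict[str, Any]:
    # Subscripting None here raises TypeError for "pkg-missing"
    return {"license": FAKE_PAYLOADS[name]["repo_metadata"]["license"]}


if __name__ == "__main__":
    print(collect_metadata(["pkg-good", "pkg-missing"], fake_fetch))
    # {'pkg-good': {'license': 'mit'}, 'pkg-missing': None}
```

Catching only KeyError, TypeError and ValueError (rather than a bare Exception) tolerates malformed external data while still letting genuine bugs surface.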

Stacktrace/Breakpoint Using pdb

Thoughts on the Fix

I am happy to submit a PR for this, but I can see many ways to fix it, and I think the repo owner is best placed to decide what fits the architecture of all the libs involved.

My suggestion, take it or leave it, is to allow licenses to be extracted from other fields (I understand that parsing those may be more cumbersome than desired); failing that, pull the string out exactly as it appears in the field and do no parsing, grepping, or regex matching. Either way, this needs to be handled defensively because the data coming into lib4package, lib4sbom and sbom4python is external: it looks like some packages don't play nicely with PyPi, and there is no enforcement of the JSON structure (no DTD equivalent).
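The suggested fallback order could look roughly like the sketch below. The field names (a license key inside repo_metadata and a top-level licenses string) are assumptions about the ecosyste.ms payload and should be checked against the real API response; extract_license is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def extract_license(package: Dict[str, Any]) -> Optional[str]:
    """Return a license string from a package record, or None if none is found.

    Field names are assumptions about the ecosyste.ms payload shape.
    """
    # 1. Preferred location: repo_metadata, which may be null (None)
    repo = package.get("repo_metadata")
    if isinstance(repo, dict):
        repo_license = repo.get("license")
        if isinstance(repo_license, str) and repo_license.strip():
            return repo_license

    # 2. Fallback: the top-level license field, returned exactly as published
    #    (no parsing or regex matching)
    raw = package.get("licenses")
    if isinstance(raw, str) and raw.strip():
        return raw

    # 3. Nothing usable: let the caller record the license as unknown
    return None


print(extract_license({"repo_metadata": None, "licenses": "MIT"}))  # MIT
```

Returning None rather than raising leaves the choice of placeholder (for example NOASSERTION in SPDX) to the SBOM writer.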

Ultimately, I yield to the maintainer's judgment on what the fix is and where it goes. Yes, I also thought about going back to the tqdm maintainers and asking them to fill in their repo_metadata, but it seems silly (and inefficient) to ask each maintainer to fill out information for sbom4python.

anthonyharrison added the bug (Something isn't working) label on Oct 15, 2024
@anthonyharrison
Owner

Thanks @JR-Carroll. This highlights one of the big challenges with the metadata associated with Python (and other) ecosystems: inconsistency. Added to my backlog. I will also raise an issue with the ecosyste.ms maintainer, as the API should be more resilient.
