
feat(purl): first purl iteration #16

Open: wants to merge 1 commit into base: main


ffontaine (Contributor)

Guess the purl using the purl2cpe database; upstream purls (e.g. github, gitlab, sourceforge) are preferred over distribution-specific purls (e.g. debian, ubuntu, fedora, etc.).
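A minimal sketch of the preference rule described above, assuming the candidate purls for a given CPE have already been looked up. The priority order of `UPSTREAM_TYPES` and the helper names `purl_type` and `pick_preferred_purl` are illustrative assumptions, not code from this PR.

```python
from typing import Iterable, Optional

# Hypothetical priority order of "upstream" purl types; the PR names
# github, gitlab and sourceforge as examples.
UPSTREAM_TYPES = ("github", "gitlab", "sourceforge")


def purl_type(purl: str) -> str:
    """Return the type component of a purl such as 'pkg:github/...'."""
    # A purl starts with the 'pkg:' scheme, followed by '<type>/'.
    return purl.split(":", 1)[1].split("/", 1)[0].lower()


def pick_preferred_purl(candidates: Iterable[str]) -> Optional[str]:
    """Prefer an upstream purl; otherwise fall back to the first other purl."""
    candidates = list(candidates)
    for upstream in UPSTREAM_TYPES:
        for purl in candidates:
            if purl_type(purl) == upstream:
                return purl
    # No upstream candidate: use the first distribution-specific one, if any.
    return candidates[0] if candidates else None


print(pick_preferred_purl(["pkg:deb/debian/curl", "pkg:github/curl/curl"]))
# prints pkg:github/curl/curl
```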

In this first iteration, only the JSON and CycloneDX formats are handled. Moreover, purl2cpe can't be installed through PyPI, so purl2cpe.db has to be built manually. I couldn't push it because the database is too big (around 350 MB).


Signed-off-by: Fabrice Fontaine <[email protected]>
@anthonyharrison (Owner)

@ffontaine I have now had a look at the purl2cpe database. As you have noticed, the purls don't include any version information, which is very disappointing. Is there any reason why the upstream sources have been limited? Why not include language ecosystems such as pypi or npm, or distro sources such as deb?

This change would require the database to be included as part of lib4sbom. I suggest storing it in a separate directory, in the same way the license data is maintained, and accessing it in a similar way (see license.py). Maybe create a new class, PurlGenerator?
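A minimal sketch of what such a PurlGenerator class could look like. Everything here is an assumption for illustration: the `purl_data` directory name, a `purl2cpe` table with `purl` and `cpe` columns, and the `purls_for` method are hypothetical and are not taken from lib4sbom or from the purl2cpe project. It only shows the idea of bundling the database with the package and opening it lazily on first lookup.

```python
import sqlite3
from pathlib import Path
from typing import List, Optional


def _default_db_path() -> Path:
    # Hypothetical layout: a "purl_data" directory shipped next to this module.
    return Path(__file__).resolve().parent / "purl_data" / "purl2cpe.db"


class PurlGenerator:
    """Look up candidate purls for a CPE vendor/product pair (illustrative sketch)."""

    def __init__(self, db_path: Optional[str] = None) -> None:
        # Use the bundled database unless an explicit path is given.
        self.db_path = db_path if db_path is not None else str(_default_db_path())
        self._conn: Optional[sqlite3.Connection] = None

    def _connection(self) -> sqlite3.Connection:
        # Open the database lazily, on the first lookup only.
        if self._conn is None:
            self._conn = sqlite3.connect(self.db_path)
        return self._conn

    def purls_for(self, vendor: str, product: str) -> List[str]:
        """Return every purl whose CPE 2.3 string matches vendor and product."""
        # Assumed schema: purl2cpe(purl TEXT, cpe TEXT); applications ('a') only.
        # An exact prefix comparison is used instead of LIKE, because '_' is a
        # LIKE wildcard and commonly appears in CPE product names.
        prefix = f"cpe:2.3:a:{vendor}:{product}:"
        rows = self._connection().execute(
            "SELECT DISTINCT purl FROM purl2cpe "
            "WHERE substr(cpe, 1, ?) = ? ORDER BY purl",
            (len(prefix), prefix),
        )
        return [row[0] for row in rows]
```

Usage would be along the lines of `PurlGenerator().purls_for("silverstripe", "framework")`, with the upstream-versus-distribution preference applied to the returned list afterwards.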

@inosmeet

Hey @anthonyharrison!

As @terriko mentioned in #3771:

Some things won't have CPE entries and thus won't be in purl2CPE. But we may know (from bug reports) that there's a product with the same name that is absolutely not the same thing. So we'll need to provide a "is not" database to reduce false positives.
I suggest using a similar setup to what purl2cpe does -- allow humans to submit pull requests, make all the data readable, provide a way to load it into a queryable database.

We will need to build a database similar to purl2cpe to cover purls/products that don't have CPEs, so would it be reasonable to maintain the whole database in-house instead of creating a separate one? Doing so would also let us carry the version information from the CPE, which the database already contains, over into the purl:

pkg:github/silverstripe/silverstripe-framework|cpe:2.3:a:silverstripe:framework:4.13.25:*:*:*:*:*:*:*
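A minimal sketch of carrying the CPE version into the purl for a row like the one above. It assumes rows use the `purl|cpe` layout shown, that the purl side has no version or qualifiers yet, and that the CPE is a CPE 2.3 formatted string; the function names are hypothetical.

```python
import re


def split_cpe23(cpe: str) -> list:
    """Split a CPE 2.3 formatted string on colons that are not escaped."""
    # A literal colon inside a CPE 2.3 field is written as '\:'.
    # (The rare escaped-backslash-before-colon case is not handled.)
    return re.split(r"(?<!\\):", cpe)


def versioned_purl(row: str) -> str:
    """Turn a 'purl|cpe' row into a purl that carries the CPE version."""
    purl, cpe = row.split("|", 1)
    fields = split_cpe23(cpe)
    # Field layout: cpe, 2.3, part, vendor, product, version, update, ...
    version = fields[5]
    # '*' (ANY) and '-' (NA) mean the CPE records no concrete version.
    if version in ("*", "-"):
        return purl
    # Drop CPE escape backslashes before reusing the value in the purl.
    version = re.sub(r"\\(.)", r"\1", version)
    return f"{purl}@{version}"


row = "pkg:github/silverstripe/silverstripe-framework|cpe:2.3:a:silverstripe:framework:4.13.25:*:*:*:*:*:*:*"
print(versioned_purl(row))  # prints pkg:github/silverstripe/silverstripe-framework@4.13.25
```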

@anthonyharrison (Owner)

@Dev-Voldemort There is a lot of activity in the purl/CPE space at the moment. I would be interested in understanding what your database would do, noting that a one-to-one mapping between PURL and CPE is not possible.

@inosmeet

As you said in this comment, some decisions need to be made.

I was thinking: since we will be generating purls ourselves with more information, such as version and subpath (see #3833), why not maintain our own database with these more informative purls? That might allow more precise mapping (I may be wrong here).
And where something doesn't have a CPE, we can add an appropriate entry for it.
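A minimal, standard-library-only sketch of assembling a purl that carries a version and subpath, following the component order of the purl specification (`pkg:type/namespace/name@version?qualifiers#subpath`). The function name `build_purl` is hypothetical, and type-specific normalization rules (e.g. for pypi names) are not applied; the packageurl-python library offers a maintained implementation.

```python
from typing import Dict, Optional
from urllib.parse import quote


def _enc(value: str) -> str:
    # Percent-encode a component; ':' is left as-is, '/' and '@' are encoded.
    return quote(value, safe=":")


def build_purl(
    ptype: str,
    name: str,
    namespace: Optional[str] = None,
    version: Optional[str] = None,
    qualifiers: Optional[Dict[str, str]] = None,
    subpath: Optional[str] = None,
) -> str:
    """Assemble pkg:type/namespace/name@version?qualifiers#subpath."""
    out = "pkg:" + ptype.lower() + "/"
    if namespace:
        # Each namespace segment is encoded separately; empty segments are dropped.
        segments = [s for s in namespace.strip("/").split("/") if s]
        if segments:
            out += "/".join(_enc(s) for s in segments) + "/"
    out += _enc(name)
    if version:
        out += "@" + _enc(version)
    if qualifiers:
        # Keys are lowercased and sorted; qualifiers with empty values are omitted.
        items = sorted((k.lower(), v) for k, v in qualifiers.items() if v)
        if items:
            out += "?" + "&".join(f"{k}={_enc(v)}" for k, v in items)
    if subpath:
        # Subpath segments '', '.' and '..' are discarded.
        segments = [s for s in subpath.strip("/").split("/") if s not in ("", ".", "..")]
        if segments:
            out += "#" + "/".join(_enc(s) for s in segments)
    return out


print(build_purl("github", "silverstripe-framework", namespace="silverstripe",
                 version="4.13.25", subpath="src/Control"))
# prints pkg:github/silverstripe/silverstripe-framework@4.13.25#src/Control
```

The subpath `src/Control` is only an illustrative value.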

All of this is based on the assumption that we will be using purl2cpe by installing its database.

P.S. I'm in the process of writing my GSoC proposal, and I'm a little confused about how purl2cpe integration would help without version information.

@anthonyharrison (Owner)

@Dev-Voldemort I can see various discussions about GSoC and purls. Could I suggest that you keep these discussions on the GSoC thread rather than in lib4sbom?

Lib4sbom is an SBOM generator/parser library. The data (e.g. the PURL) should come from the script that is using lib4sbom; the calling script is responsible for ensuring the data is correct. Validating that the purl is correct (other than checking that it has the correct format) is not the responsibility of the library.
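A minimal sketch of the kind of format-only check described above: it confirms that a string follows the purl syntax (`pkg:type/namespace/name@version?qualifiers#subpath`) without checking that the package actually exists. The function name is hypothetical and the rules are a simplified subset of the purl specification.

```python
import re

# type: starts with a letter; letters, digits, '.', '+' and '-' allowed.
_PURL_RE = re.compile(
    r"^pkg:(?P<type>[A-Za-z][A-Za-z0-9.+-]*)/"
    r"(?P<path>[^?#@]+)"            # namespace/name; the name is required
    r"(?:@(?P<version>[^?#]+))?"
    r"(?:\?(?P<qualifiers>[^#]*))?"
    r"(?:#(?P<subpath>.*))?$"
)
# Qualifier keys: ASCII letters, digits, '.', '-', '_'; must not start with a digit.
_KEY_RE = re.compile(r"^[A-Za-z.\-_][A-Za-z0-9.\-_]*$")


def is_valid_purl_format(purl: str) -> bool:
    """Return True if the string is syntactically a purl (format only)."""
    m = _PURL_RE.match(purl)
    if m is None:
        return False
    segments = [s for s in m.group("path").split("/") if s]
    if not segments:
        return False  # the name component is missing
    if m.group("qualifiers"):
        for pair in m.group("qualifiers").split("&"):
            key, sep, value = pair.partition("=")
            if not sep or not value or not _KEY_RE.match(key):
                return False
    return True
```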

@anthonyharrison added the wontfix (This will not be worked on) label on Jul 16, 2024