Add new spec for go package URLs #338

Open: wants to merge 2 commits into master


maceonthompson

The current PURL specification for Go was created before Go 1.11 modules and thus has namespace inconsistencies and lacks semantic versioning.

Although in many cases a module path corresponds directly to the URL of the hosting repository, that is not always true. The URL formed from the module path may be an endpoint that serves a redirect to the true host. This indirection protects projects that for whatever reason must change their hosting provider: their module names will continue to work. Consequently, it is undesirable to encode any aspect of the underlying hosting system as part of the PURL.

In essence, all Go modules form a single namespace. Since it is used by the majority of Go programmers, we propose to represent this namespace by the empty string. Though not included in this commit, other namespaces could be possible and would represent package managers and/or build tools that are alternatives to the go command.

The go type proposed here fixes the current issues by removing the namespace, using valid Go module versions (including pseudoversions), and adding optional qualifiers that encode information about specific builds (GOOS, GOARCH, etc.).
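The encoding described above can be sketched as a small formatter. This is a minimal sketch that assumes only the conventions visible in the proposal's examples: an empty namespace, the whole module path as one percent-encoded name segment, an optional version, sorted qualifiers, and the package path as the subpath. FormatGoPURL is a hypothetical name, not an API of any Go or PURL library, and it does not implement every PURL escaping rule.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// FormatGoPURL is a hypothetical helper that builds a pkg:go PURL in the
// shape of the proposal's examples: no namespace, the whole module path as a
// single name segment ("/" percent-encoded as "%2F"), an optional version,
// qualifiers sorted by key, and the package path inside the module as the
// subpath.
func FormatGoPURL(modulePath, version, subpath string, qualifiers map[string]string) string {
	var b strings.Builder
	b.WriteString("pkg:go/")
	// url.PathEscape treats its input as one path segment, so it escapes "/".
	b.WriteString(url.PathEscape(modulePath))
	if version != "" {
		b.WriteString("@" + url.PathEscape(version))
	}
	if len(qualifiers) > 0 {
		keys := make([]string, 0, len(qualifiers))
		for k := range qualifiers {
			keys = append(keys, k)
		}
		sort.Strings(keys) // canonical PURLs list qualifiers sorted by key
		pairs := make([]string, len(keys))
		for i, k := range keys {
			pairs[i] = k + "=" + url.PathEscape(qualifiers[k])
		}
		b.WriteString("?" + strings.Join(pairs, "&"))
	}
	if subpath != "" {
		// Assumes the package path needs no escaping (true for ordinary Go paths).
		b.WriteString("#" + subpath)
	}
	return b.String()
}

func main() {
	fmt.Println(FormatGoPURL("golang.org/x/vuln", "v1.1.3", "cmd/govulncheck",
		map[string]string{"goversion": "1.23.2"}))
	// prints: pkg:go/golang.org%2Fx%2Fvuln@v1.1.3?goversion=1.23.2#cmd/govulncheck
}
```

Because the module path always occupies exactly one name segment, a reader never has to guess where a namespace would end and the name would begin.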

If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages.

Contributor

@matt-phylum matt-phylum left a comment


See also #196 #294 #308

This is a breaking change that affects all software utilizing PURL for Go. Personally, I don't think there's anything fundamentally wrong with pkg:golang except that the description is outdated, and I'm sure it can be fixed without making this level of breaking change. Maintaining the separation of namespace and name and putting the entire Go package ID into the PURL name makes PURLs difficult for human users to work with.

PURL-TYPES.rst Outdated
------
``go`` for Go modules:

- The ``namespace`` field is empty and implies the go mod proxy.
Contributor


Is the field empty or does it imply the go mod proxy? It can't be both.


Should be done now, see the new commit (sorry for that).

PURL-TYPES.rst Outdated
- The ``name`` will be the full module path.
- The ``subpath`` will represent the package path within a module.
- The ``version`` will be a valid go version or pseudoversion, or empty.
- Additional Build information for binaries can be included as ``qualifiers`` (i.e VCS info, go version info, GoArch/GoOS info etc)
Contributor


The additional information should be explicitly defined here.

Member


Exactly. Be specific in the spec, so we are all on the same page.


PTAL in the new commit (sorry for that).

PURL-TYPES.rst Outdated
``go`` for Go modules:

- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
Contributor


This should probably specify that it is case sensitive. pkg:golang incorrectly states that it is not case sensitive and must be lowercased.

Member


Exactly. this is what the whole #308 is about.
Please don't repeat the mistakes from the past.


Should be done now, see the new commit (sorry for that).

@maceonthompson
Author

See also #196 #294 #308

Thanks for pointing at these! This is essentially a combination of #196 and #308 (with the addition of qualifiers for build info). They go into more detail than this proposal, but especially for namespaces, #63 (comment) is a good example of why dropping name in favor of an entirely encoded namespace would be more useful. I understand that having a bunch of %2F in the PURL is ugly for humans, but we feel it is necessary to ensure that Go PURLs are consistent: the PURL -> Go module mapping must be injective, so that a Go module cannot be represented by different PURLs.

Say you have a module with the path host.com/maybeuser/module.
With the current type definition, both pkg:golang/host.com/maybeuser/module and pkg:golang/host.com/maybeuser%2Fmodule could represent that module. In order for PURLs to canonically and uniquely identify Go modules the way they are identified on pkg.go.dev or by the Go module proxy, they must be unique as well.
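To make the ambiguity concrete: a consumer that rebuilds the module path by decoding and joining the namespace and name gets the same result for both strings. This is a minimal sketch; joinGolangPath is a hypothetical helper, not part of any PURL library, and it assumes the caller has already cut off the "pkg:golang/" prefix and any version, qualifiers, or subpath.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// joinGolangPath is a hypothetical reader for the namespace/name part of a
// pkg:golang PURL (the text between "pkg:golang/" and any "@", "?" or "#").
// It percent-decodes every "/"-separated segment and joins the results with
// "/", which is how a module path would be rebuilt from namespace + name.
func joinGolangPath(purlPath string) (string, error) {
	segments := strings.Split(purlPath, "/")
	for i, s := range segments {
		decoded, err := url.PathUnescape(s)
		if err != nil {
			return "", err
		}
		segments[i] = decoded
	}
	return strings.Join(segments, "/"), nil
}

func main() {
	for _, p := range []string{
		"host.com/maybeuser/module",   // namespace "host.com/maybeuser", name "module"
		"host.com/maybeuser%2Fmodule", // namespace "host.com", name "maybeuser/module"
	} {
		modulePath, err := joinGolangPath(p)
		if err != nil {
			fmt.Println(p, "error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", p, modulePath)
	}
}
```

Both inputs print "host.com/maybeuser/module", which is exactly the many-to-one situation the proposal wants to rule out.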

@matt-phylum
Contributor

Say you have a module with the path host.com/maybeuser/module.
With the current type definition, both pkg:golang/host.com/maybeuser/module and pkg:golang/host.com/maybeuser%2Fmodule could represent that module. In order for PURLs to canonically and uniquely identify Go modules the way they are identified on pkg.go.dev or by the Go module proxy, they must be unique as well.

I think the better solution to this problem is that pkg:golang/host.com/maybeuser%2Fmodule stays illegal. It'd be better if the documentation explicitly stated that it is illegal, but based on the examples and test cases the correct form is pkg:golang/host.com/maybeuser/module, and based on the reference parsing and formatting algorithms it's clear that these PURLs are distinct.

However, "a go module cannot be represented by different PURLs" is not generally the case:

  • The PURL spec describes a canonical format for PURLs, but users and even commonly used PURL implementations often get this wrong and produce non-canonical PURLs which must still be considered equal. For example, pkg:golang/host%2Ecom/maybeuser/module is a non-canonical, valid, PURL which refers to the same package.
  • A PURL may have qualifiers which may or may not be critical to the PURL. A PURL with a ?goarch is a different PURL which refers to the same module, but a PURL with a ?repository_url (or however the module proxy is specified) is a different PURL which may refer to a different module (probably more likely in other ecosystems).
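A consumer that wants pkg:golang/host%2Ecom/maybeuser/module and pkg:golang/host.com/maybeuser/module to compare equal has to canonicalize before comparing. This is a minimal sketch that assumes only the namespace/name part needs attention and approximates PURL's escaping set with url.PathEscape (adequate for ordinary Go module paths, not for arbitrary PURLs); canonicalGolangPath is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// canonicalGolangPath is a hypothetical helper that decodes each segment of
// the namespace/name part of a pkg:golang PURL and re-encodes it, so that
// needlessly escaped characters such as "%2E" (".") compare equal to their
// literal form. url.PathEscape only approximates the PURL escaping rules.
func canonicalGolangPath(purlPath string) (string, error) {
	segments := strings.Split(purlPath, "/")
	for i, s := range segments {
		decoded, err := url.PathUnescape(s)
		if err != nil {
			return "", err
		}
		segments[i] = url.PathEscape(decoded)
	}
	return strings.Join(segments, "/"), nil
}

func main() {
	for _, p := range []string{"host%2Ecom/maybeuser/module", "host.com/maybeuser%2Fmodule"} {
		c, err := canonicalGolangPath(p)
		if err != nil {
			fmt.Println(p, "error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", p, c)
	}
}
```

Note that host.com/maybeuser%2Fmodule comes out unchanged, so at the string level it stays distinct from host.com/maybeuser/module even after canonicalization.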

@jkowalleck added the "Proposed new type" and "type: golang" labels Nov 8, 2024
@jkowalleck
Member

jkowalleck commented Nov 8, 2024

This is a breaking change that affects all software utilizing PURL for Go.

I'd disagree. In fact, it is non-breaking, as it adds a completely new purl type. Therefore, no breaking changes are introduced.

@matt-phylum
Contributor

It is breaking because no existing PURL software expects pkg:go, and new PURL software will not expect pkg:golang. This creates a compatibility problem where either the PURL is rejected as an unrecognized type or software on different sides of the breakage don't understand each other. If this is merged, all software that works with Go PURLs will need to be updated to accept both types of Go PURL and convert before they interoperate again.

@jkowalleck
Member

It is breaking because no existing PURL software expects pkg:go [...]

This is true of every newly proposed PURL type :-)
And none of them is a breaking change - neither in spec nor in behaviour.

this PR is trying to add a new type go. the existing golang is not touched at all.

@matt-phylum
Contributor

The problem is that this is not a new type. The go type is intended to replace golang.

@jkowalleck
Member

The problem is that this is not a new type.

it is not? Could you point me to the existing go type?

The go type is intended to replace golang.

I wonder how you came to this conclusion.
This very PR adds a new type; it neither obsoletes nor deprecates the existing golang type.

@matt-phylum
Contributor

I wonder how you came to this conclusion.
This very PR adds a new type; it neither obsoletes nor deprecates the existing golang type.

From the PR description:

If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages

golang is the type currently used for Go modules and packages. For example: https://github.com/anchore/syft/blob/3c070e0ad9d69c0f2191be52e2f2fb4904bcd558/syft/pkg/cataloger/golang/package_test.go#L24 . This PR is introducing a second, more preferred type for the same purpose.

@jkowalleck
Member

I wonder how you came to this conclusion.
This very PR adds a new type; it neither obsoletes nor deprecates the existing golang type.

From the PR description:

If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages

which is a behavioural change in a downstream application. This is out of the scope of this spec, and not in our hands at all: we have no authority there.

golang is the type currently used for Go modules and packages. For example: https://github.com/anchore/syft/blob/3c070e0ad9d69c0f2191be52e2f2fb4904bcd558/syft/pkg/cataloger/golang/package_test.go#L24 . This PR is introducing a second, more preferred type for the same purpose.

exactly this paragraph makes it clear: this is a non-breaking change.

Causing no breaking change is the whole point of introducing a new purl type instead of modifying an existing one.

@jkowalleck
Member

jkowalleck commented Nov 8, 2024

I'm sure it can be fixed without making this level of breaking change.

I don't think so. #308 makes this clear: the existing spec has flaws that require breaking changes to fix them.

The only way to fix golang is

  • a) introduce breaking changes in the existing purl-type << undesired !!!
  • b) introduce a new purl-type << feasible
  • c)
    1. have the PURL spec modified to allow versioning of purl-types << bureaucratic efforts that might lead to nothing
    2. if c)1. was successful: craft a purl-type golang version 2
    3. else fall back to a) or b)

@matt-phylum
Contributor

Introducing a new type for an existing type is a breaking change to the PURL ecosystem. Implementations that use golang can continue to use golang and their golang PURLs will still be golang PURLs, but PURL has no negotiation mechanism where all the software that's going to read the PURLs agrees with the software that writes the PURLs on whether to use go or golang to describe Go dependencies.

If you start writing SBOMs that have go, they will be processed incorrectly by software that doesn't support go. If you continue writing SBOMs that have golang, they will be processed incorrectly by software that doesn't support golang. If you combine SBOMs using software that doesn't understand that go and golang are really the same type, the dependencies will be duplicated in the output. If you query go or golang packages against a vulnerability database, you have a 50/50 chance of finding the vulnerabilities unless the database understands both and converts golang to go.

Keeping golang is incompatible with the "a go module cannot be represented by different PURLs" goal of this PR.

You cannot just fix a PURL type by introducing a new type. Even if PURL libraries are updated to support transparently upgrading the old type into the new type on read, any software that is comparing pre-canonicalized PURL strings will need updates.
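"Transparently upgrading the old type into the new type on read" could look roughly like the sketch below. It is a hypothetical helper, not part of any PURL library, and it assumes that the decoded namespace and name of a golang PURL, joined with "/", form the module path, and that version, qualifiers, and subpath can be carried over verbatim.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// golangToGo rewrites a pkg:golang PURL into the proposed pkg:go form by
// joining the decoded namespace and name into one module path and
// re-encoding it as a single name segment. Version, qualifiers and subpath
// are copied unchanged.
func golangToGo(purl string) (string, error) {
	const prefix = "pkg:golang/"
	if !strings.HasPrefix(purl, prefix) {
		return "", fmt.Errorf("not a pkg:golang PURL: %q", purl)
	}
	rest := purl[len(prefix):]
	// Namespace and name cannot contain unencoded "@", "?" or "#", so the
	// first of those characters ends the namespace/name part.
	end := strings.IndexAny(rest, "@?#")
	if end < 0 {
		end = len(rest)
	}
	segments := strings.Split(rest[:end], "/")
	for i, s := range segments {
		decoded, err := url.PathUnescape(s)
		if err != nil {
			return "", err
		}
		segments[i] = decoded
	}
	modulePath := strings.Join(segments, "/")
	return "pkg:go/" + url.PathEscape(modulePath) + rest[end:], nil
}

func main() {
	out, err := golangToGo("pkg:golang/host.com/maybeuser/module@v1.2.3")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // pkg:go/host.com%2Fmaybeuser%2Fmodule@v1.2.3
}
```

Such a converter cannot restore capital letters that a producer already lowercased under the current golang text, and if a golang PURL put a full package path into namespace/name, the sketch would treat that package path as the module path.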

the existing spec has flaws that require breaking changes to fix them

What are the flaws that require breaking changes? #308 is about the path being incorrectly converted to lowercase, which is much more easily fixed by just not doing that.

@jkowalleck
Member

jkowalleck commented Nov 8, 2024

Introducing a new type for an existing type is a breaking change to the PURL ecosystem.

how?

If a tool that produces purls changed its behaviour by using the new purl-type where it used the other one before, that would be a breaking change in that very tool.
This is out of the scope of the purl spec -- we do not have authority there.

Implementations that use golang can continue to use golang and their golang PURLs will still be golang PURLs, but PURL has no negotiation mechanism where all the software that's going to read the PURLs agrees with the software that writes the PURLs on whether to use go or golang to describe Go dependencies.

So?
This is true of every purl type that is added over time.
An implementation written 2 years ago might not know the purl type that was defined yesterday.
This is by design and was never an issue. This is out of the scope of the purl spec -- we do not have authority there.

Keeping golang is incompatible with the "a go module cannot be represented by different PURLs" goal of this PR.

A PR tells a story, and the effective patch gets updated along with the discussions on a PR.
the initial PR description is usually not updated in accordance with the effective patch.

(PS: I review the content of the PR, and at the time of review I saw no breaking change.
I started the "breaking" discussion expecting that you'd agree it is no longer a breaking change, based on the current state of the PR.
I am happy we are discussing the topic anyway; I might be wrong, and I still need to learn.)

You cannot just fix a PURL type by introducing a new type. Even if PURL libraries are updated to support transparently upgrading the old type into the new type on read, any software that is comparing pre-canonicalized PURL strings will need updates.

How so?

the existing spec has flaws that require breaking changes to fix them

What are the flaws that require breaking changes? #308 is about the path being incorrectly converted to lowercase, which is much more easily fixed by just not doing that.

The current golang spec says: the path MUST be lowercased.
This is wrong in terms of actual Go dependency management: the path MUST NOT be lowercased.
Changing MUST to MUST NOT in the golang purl-type is a breaking change to the specification.
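Why lowercasing loses information: the Go module proxy treats module paths as case-sensitive, and its protocol case-encodes every uppercase letter as "!" plus the lowercase letter, so a path with capitals and its lowercased form map to different proxy URLs. A minimal sketch, using github.com/BurntSushi/toml as a real module path that contains capitals; caseEncode and lowercase are hypothetical helpers (golang.org/x/mod/module.EscapePath is the real implementation of the encoding, with validation this sketch omits).

```go
package main

import (
	"fmt"
	"strings"
)

// caseEncode sketches the case-encoding the GOPROXY protocol applies to
// module paths: every uppercase ASCII letter becomes "!" followed by the
// lowercase letter. It performs none of the path validation that
// golang.org/x/mod/module.EscapePath does.
func caseEncode(modulePath string) string {
	var b strings.Builder
	for _, r := range modulePath {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// lowercase is what the current pkg:golang text ("MUST be lowercased")
// does to a module path.
func lowercase(modulePath string) string { return strings.ToLower(modulePath) }

func main() {
	original := "github.com/BurntSushi/toml"
	fmt.Println(caseEncode(original))            // github.com/!burnt!sushi/toml
	fmt.Println(caseEncode(lowercase(original))) // github.com/burntsushi/toml
}
```

The two printed proxy paths differ, so a lowercased PURL no longer points at the module the producer saw.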

PURL-TYPES.rst Outdated
``go`` for Go modules:

- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
Member

@jkowalleck jkowalleck Nov 8, 2024


Suggested change
- The ``name`` will be the full module path.
- The ``name`` is the full module path. It MUST be unmodified, and follow the `Go Module Reference <https://go.dev/ref/mod#go-mod-file-ident>`_.

this change would close #308

Contributor

@matt-phylum matt-phylum Nov 8, 2024


- - The ``name`` will be the full module path.
+ - The ``name`` is the full module path. In case of a URL: protocol MUST be lowercased; host-part MUST be lowercased; path-part MUST be unmodified, as it is case-sensitive.

this change would close #308

I don't think this is correct.

  1. I don't think it's legal to include a protocol in the module path. Go makes some HTTPS requests to resolve a VCS URL to download the package from (usually this is delegated to the proxy).
  2. The host part is also part of the case-sensitive module path. It should not be lowercased. Uppercase characters are currently forbidden by Go in the first element of a module path. I don't think it's worthwhile or really correct for the PURL spec to specify how to convert an invalid module path into a valid one, or to specify how to validate Go module paths: this doesn't cover all the restrictions, and it may cause problems if Go ever changes the restrictions for some reason.

Member


Re 1: I see, I was wrong there. I adjusted my suggestion for the protocol.
Re 2: per the URL spec, the host part is case-insensitive and is normalized to lowercase.

Contributor


As far as Go is concerned, it's usually a host-part but it has additional restrictions and it is case sensitive: https://go.dev/ref/mod#go-mod-file-ident

Member


I see. I will modify my change suggestion accordingly. Does it fit better now?

PURL-TYPES.rst Outdated

- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
- The ``subpath`` will represent the package path within a module.
Member


Suggested change
- The ``subpath`` will represent the package path within a module.
- The ``subpath`` is the unmodified package path within a module.

PURL-TYPES.rst Outdated
- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
- The ``subpath`` will represent the package path within a module.
- The ``version`` will be a valid go version or pseudoversion, or empty.
Member


Suggested change
- The ``version`` will be a valid go version or pseudoversion, or empty.
- The ``version`` may be a valid go version or pseudoversion, omitted when empty.


Why may here?

Member


because version is optional.


Should be done now, see the new commit (sorry for that).

@matt-phylum
Contributor

Adding a new type for a new type is much different than adding a new type for an existing type. An old tool not recognizing a truly new type is expected, but an old tool not recognizing Go PURLs anymore because a tool producing the data says that golang is now spelled go is a breaking change. You can argue that this isn't a breaking change in the PURL spec itself because it doesn't change golang, but it necessitates a breaking change in every current implementation of Go PURLs and complicates implementations of Go PURL consuming software as long as there are both go and golang PURLs going around.

Changing "MUST be lowercased" to "MUST NOT be lowercased" is a much less impactful change than this. From what I've seen, names with uppercase characters are uncommon, and an outdated implementation that incorrectly lowercases still works correctly for all names that contain no uppercase characters. I would even say that on a larger scale it is not a breaking change because:

  • An outdated PURL producer that incorrectly lowercases an ID containing capitals produces the wrong PURL, but today those producers are producing exactly the same PURL and calling it correct despite referring to the wrong package.
  • An outdated PURL consumer that incorrectly lowercases an ID containing capitals reads the wrong ID, but today those consumers are already reading exactly the same ID and calling it correct despite referring to the wrong package.

In both cases, the PURL is still parsed successfully and the meaning of the PURL is unchanged with respect to the current "MUST be lowercased" spec. The only differences would be that the canonical form changes¹ and a new consumer receiving a PURL from an old producer might be more likely to expect that the ID refers to the correct package, but since there is no good way for an outdated consumer to recover the correct ID after an outdated producer lowercases it, any consumer that relies on getting the correct ID (eg to resolve the package files) is likely already broken and not lowercasing the name can only improve the behavior in that situation.

This causes the same alignment problems as introducing a go type, except that if the correct ID is lowercase, no problem occurs because lowercasing is already producing the correct PURL.

¹ Due to underspecification in the text and tests, I wouldn't trust incoming PURLs to be in the canonical form as my implementation understands it. There are numerous minor differences in which characters are escaped when (and sometimes how), so if you're accepting PURLs from an external source, even if you don't expect user-entered, non-canonical PURLs in that source, you should be canonicalizing those PURLs yourself if your application depends on them all being canonical for the same definition of canonical.

@matt-phylum
Contributor

Go isn't the only ecosystem that has this problem of incorrect name normalization rules in this repo. I'm also aware of:

@zpavlinovic

zpavlinovic commented Nov 14, 2024

Introducing a new type for an existing type is a breaking change to the PURL ecosystem.

If this is indeed true, then there is something really wrong with PURL: it does not allow for evolution. On the one hand, we cannot add modifications to the existing specification that could introduce breaking changes. On the other hand, we cannot introduce a new type because somehow that is a breaking change as well. So one is pretty much stuck with slight variations of the initial spec. Specs should be allowed to evolve just the way the software does.

There should really be a way to add versioning on top of PURL itself. What is being proposed here might in essence be just that for the go spec.

@pombredanne
Member

@maceonthompson Thanks for putting this together! This makes a lot of sense, and we do have an issue with Go. Let me look at the comments in detail and come back with my 2 cents!

@pombredanne
Member

@matt-phylum re:

Introducing a new type for an existing type is a breaking change to the PURL ecosystem.

I am not sure that's the case, but a new type vs. updating the existing type demands some careful thinking :)

PURL-TYPES.rst Outdated

pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations
pkg:go/github.com%2Fjmorion%[email protected]#api
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck
Member


There is a likely problem with the use of subpath: there is no way to determine where the module ends and the package starts in the general case, is there?
For instance, in the path google.golang.org/genproto/googleapis/api/annotations how can I determine safely that google.golang.org/genproto is a module and that googleapis/api/annotations is a package inside this module? I need either a go proxy lookup or a full filesystem to locate a go.mod/go.sum file, right?

Contributor


There is a way, if the module's code is available to you, to determine from a package import where module path ends and the package path begins by making HTTP requests.

I think the use of the subpath here is good because it puts the burden of determining this on whatever generates the PURL, which is likely aware of Go and either has the module paths or is most likely to be able to find the module path from the full package path. Then if you want to use a tool that checks PURLs against a database of information about modules (eg vulnerabilities), the tool already has all the information it needs. Otherwise, either the tool would need to make external API calls to figure out the module path of the PURL or the database would need to have an entry for every package in the module.


Just to add to @matt-phylum's comment: if a tool is producing a PURL for a Go artifact, it can use go version -m, debug.BuildInfo, or packages.Load to get information about the package and its corresponding module. The encoding proposed here then makes it clear what the modules and packages are.



I don't believe this is true for the general use case of PURLs. E.g. we do static analysis of binaries and while we can get information about linked packages, there's no indication of which part of the paths correspond to modules


Right, if you are looking at a Go symbol from the symbol table, you can get its package. You can get the module correctly by prefix-matching the package path against module information from the binary's debug.BuildInfo, unless there are several modules that are prefixes of the package. My inclination is that this should not affect what is proposed here. (Arguably, there should be a way to get module information for a symbol in the binary, just as one can for source analysis.)
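The prefix matching described above can be sketched as follows, assuming the module paths have already been read from the binary's build info (for example from debug.BuildInfo's Main and Deps entries). splitPackagePath is a hypothetical helper, and longest-prefix matching is a heuristic: when several modules are prefixes of the package path, it picks the longest, which is usually but not necessarily the module that provided the package.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPackagePath is a hypothetical helper that picks, from a list of
// module paths, the longest one that is a prefix of pkgPath on a "/"
// boundary, and returns the remainder as the package subpath. ok is false
// when no module matches.
func splitPackagePath(pkgPath string, modules []string) (module, subpath string, ok bool) {
	for _, m := range modules {
		if (pkgPath == m || strings.HasPrefix(pkgPath, m+"/")) && len(m) > len(module) {
			module, ok = m, true
		}
	}
	if !ok {
		return "", "", false
	}
	return module, strings.TrimPrefix(pkgPath[len(module):], "/"), true
}

func main() {
	modules := []string{"google.golang.org/genproto", "golang.org/x/vuln"}
	m, sub, ok := splitPackagePath("google.golang.org/genproto/googleapis/api/annotations", modules)
	fmt.Println(m, sub, ok) // google.golang.org/genproto googleapis/api/annotations true
}
```

Matching on a "/" boundary matters: a plain string prefix check would wrongly treat golang.org/x/vulnerable as a package of golang.org/x/vuln.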

@pombredanne
Member

BTW, an elephant in the room is whether the distinction between a namespace and name makes sense not only here, but also in the whole spec, globally.

I found myself using a variable with a "namespace/name" substring more often than not.
Then, how to split this in optional namespace and name could become a type-specific distinction, but the general concept would be that of "namespace/name", which could look like:

With this the whole google.golang.org/genproto/googleapis/api/annotations would be the namespace/name and would not have a specific split in Go, all would be in the name?
(and the same could apply where relevant to other package types)

It could have a minimal impact on the spec.

PURL-TYPES.rst Outdated

pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations
pkg:go/github.com%2Fjmorion%[email protected]#api
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck
Member


Is the plan to include all the buildinfo structure as qualifiers?
If so, this would only apply in a built binary?

Member

@jkowalleck jkowalleck Nov 18, 2024


good point.
If so, all the qualifiers MUST be documented in the type-spec.

currently it reads:

Additional Build information for binaries can be included as qualifiers (i.e VCS info, go version info, GoArch/GoOS info etc)

I am afraid this documentation is insufficient.


We will expand on this.

PURL-TYPES.rst Outdated
pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations
pkg:go/github.com%2Fjmorion%[email protected]#api
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck
pkg:go/golang.org%2Fx%[email protected]?goversion=1.23.2#cmd/govulncheck
Member


Are the Go module versions always to be prefixed with a v?

Member


A version identifies an immutable snapshot of a module, which may be either a release or a pre-release. Each version starts with the letter v, followed by a semantic version.
-- https://go.dev/ref/mod#versions

version could also be a pseudo-version -- a git-tag, a git-commit-hash, or something like this.

Contributor


A pseudoversion is a special kind of version that also starts with a v: https://go.dev/doc/modules/version-numbers#pseudo-version-number

I think for Go modules, including when using the Go module system to refer to something that predates modules, the version always starts with a v. In which case, versions that don't start with v would only be used with older tools like Dep?


If a version exists, it should be a valid Go module version. It should start with a v.

Note that hashes should not be permitted: they are not valid Go versions (resolving commit hashes in Go tooling is a convenience feature).
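The version rule above (optional, but if present a Go module version starting with v, with pseudo-versions allowed and bare hashes rejected) can be approximated with the semver.org grammar plus a leading "v". This is a sketch under that assumption; validPURLVersion is a hypothetical helper, and real tooling would rather use golang.org/x/mod/semver and golang.org/x/mod/module, which encode Go's exact rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// goVersionRE approximates a canonical Go module version: "v" followed by
// MAJOR.MINOR.PATCH with optional pre-release and build parts, per the
// semver.org grammar. Pseudo-versions such as
// v0.0.0-20191109021931-daa7c04131f5 are pre-releases, so they match too.
var goVersionRE = regexp.MustCompile(`^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)` +
	`(?:-(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*)?` +
	`(?:\+[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*)?$`)

// validPURLVersion reports whether v is acceptable as the version of a
// pkg:go PURL under this sketch: either empty (the version is optional and
// omitted) or shaped like a canonical Go module version. Bare commit hashes
// are rejected because they do not start with "v".
func validPURLVersion(v string) bool {
	return v == "" || goVersionRE.MatchString(v)
}

func main() {
	for _, v := range []string{"v1.2.3", "v0.0.0-20191109021931-daa7c04131f5", "daa7c04131f5"} {
		fmt.Println(v, validPURLVersion(v))
	}
}
```

The regexp deliberately rejects shorthand like v1.2, which golang.org/x/mod/semver.IsValid accepts but which is not a canonical module version.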

@pombredanne
Member

@matt-phylum you wrote

See also:

This is a breaking change that affects all software utilizing PURL for Go. Personally, I don't think there's anything fundamentally wrong with pkg:golang except that the description is outdated, and I'm sure it can be fixed without making this level of breaking change.

Thanks for the links! I tend to think along the same lines, and we can likely salvage the golang type.

Maintaining the separation of namespace and name and putting the entire Go package ID into the PURL name makes PURLs difficult for human users to work with.

I need to ponder this. See my other comment regarding namespace/name above in #338 (comment).

@zpavlinovic

zpavlinovic commented Nov 18, 2024

However, "a go module cannot be represented by different PURLs" is not generally the case:

  • The PURL spec describes a canonical format for PURLs, but users and even commonly used PURL implementations often get this wrong and produce non-canonical PURLs which must still be considered equal. For example, pkg:golang/host%2Ecom/maybeuser/module is a non-canonical, valid, PURL which refers to the same package.
  • A PURL may have qualifiers which may or may not be critical to the PURL. A PURL with a ?goarch is a different PURL which refers to the same module, but a PURL with a ?repository_url (or however the module proxy is specified) is a different PURL which may refer to a different module (probably more likely in other ecosystems).

It is fine that PURL spec allows for more flexibility, but there should be only one way the Go module and package information is encoded. This simplifies the work for clients. It is easy to drop qualifiers from a PURL. It is annoying to generate multiple module+package encodings to see if the incoming PURL applies to your code.
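Dropping qualifiers really is a small string operation, provided "?" and "#" appear unescaped only as PURL separators, which the PURL encoding rules require. A minimal sketch; stripQualifiers is a hypothetical helper, and comparing the results is only meaningful if the remaining parts are canonical.

```go
package main

import (
	"fmt"
	"strings"
)

// stripQualifiers drops the "?..." qualifier section of a PURL string and
// keeps everything else, including any "#subpath".
func stripQualifiers(purl string) string {
	q := strings.IndexByte(purl, '?')
	if q < 0 {
		return purl
	}
	if h := strings.IndexByte(purl[q:], '#'); h >= 0 {
		return purl[:q] + purl[q+h:]
	}
	return purl[:q]
}

func main() {
	fmt.Println(stripQualifiers("pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck"))
	// prints: pkg:go/golang.org%2Fx%2Fvuln#cmd/govulncheck
}
```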

In general, this proposal tries to make it simple and clear to generate and accurately check against PURLs. It might not be the most user-friendly solution, but tools that render PURLs can easily prettify the output. We believe this is worth the sacrifice.

@rhalar

rhalar commented Nov 19, 2024

Could it also be clarified how standard library packages are to be represented?

Go has special handling for these, and the 'module' is never explicitly required when using them. But the modules do exist, std and cmd:
https://github.com/golang/go/blob/master/src/go.mod#L1
https://github.com/golang/go/blob/master/src/cmd/go.mod#L1

Go uses stdlib when reporting vulnerabilities, though:
https://vuln.go.dev/ID/GO-2024-3105.json

but we believe the exact module name would make more sense.

@matt-phylum
Contributor

stdlib is probably good, if it needs to be specified (maybe specifying the compiler is enough). If it were std or cmd, it would be a special case for tools that are not necessarily Go-centric to handle the case that pkg:golang/[email protected] is related to stdlib v1.2.3. Maybe pkg:golang/[email protected]#cmd is best if somebody is going to care that it's using the cmd package. However it's done, it should be documented.

OSV is already using pkg:golang/stdlib, but that doesn't mean it's necessarily right. https://osv.dev/vulnerability/GO-2024-3105

@rhalar

rhalar commented Nov 19, 2024

I'm not sure I follow how it would make it easier for tools?
The problem I have with pkg:golang/[email protected]#cmd is that cmd and std aren't packages, but modules.

@matt-phylum
Contributor

Are they modules? They have go.mod files in their source code, but they aren't included in your go.mod and when you import their packages the compiler recognizes that you're trying to use the standard library and provides its own copy of the code from $GOROOT/src instead of downloading a module: https://github.com/golang/go/blob/8f22369136b264567955fb86cff491c247b45b8b/src/cmd/go/internal/modload/build.go#L42-L46

@zpavlinovic

I'm not sure I follow how it would make it easier for tools? The problem I have with pkg:golang/[email protected]#cmd is that cmd and std aren't packages, but modules.

cmd and std should not appear as modules in any of the actual Go tools or artifacts. These modules are used internally to vendor trusted packages developed independently elsewhere.

I think we can safely use stdlib for artifacts of the standard library or of the Go tools (which are themselves part of the standard library). We will add this to the proposal.
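A minimal sketch of how a tool might decide when to use stdlib, assuming it keys off the import path the way the go command does internally: a path whose first element contains no dot is treated as part of the standard distribution. moduleForPURL is a hypothetical helper; resolving non-standard modules is out of scope here.

```go
package main

import (
	"fmt"
	"strings"
)

// isStandardImportPath treats an import path as standard library when its
// first path element contains no dot (e.g. "crypto/ecdsa", "cmd/go").
func isStandardImportPath(path string) bool {
	first, _, _ := strings.Cut(path, "/")
	return !strings.Contains(first, ".")
}

// moduleForPURL maps standard-library and go command packages to "stdlib".
// For anything else it reports ok == false, meaning the real module must be
// looked up elsewhere (for example from go.mod or build info).
func moduleForPURL(importPath string) (module string, ok bool) {
	if isStandardImportPath(importPath) {
		return "stdlib", true
	}
	return "", false
}

func main() {
	for _, p := range []string{"crypto/ecdsa", "cmd/go", "golang.org/x/mod/module"} {
		m, ok := moduleForPURL(p)
		fmt.Printf("%-26s module=%q known=%v\n", p, m, ok)
	}
}
```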

@rhalar

rhalar commented Nov 20, 2024

I believe I get the argument, and I will defer to your judgement as the more knowledgeable about Go concepts and internals.

However, I do still feel that it would make sense for PURLs to use the "modules". PURLs are primarily unique identifiers, so does it really matter what Go tooling does and how the internals work?
stdlib comes a bit out of nowhere, and it doesn't seem that different from just using std, which at least seems more consistent, even if it technically doesn't follow the same rules. It still identifies the artifact uniquely. Omitting either of the two would also be fine, which would result in, e.g.:

pkg:golang/crypto/ecdsa@...

But I'm guessing that would cause issues, and it also doesn't follow the current convention for PURLs.

As I said, I'll defer to better judgements, just giving a mostly uneducated opinion. :)

@zpavlinovic

Hi folks. I am also working on this proposal and will try to drive the discussion here going forward. (Sorry for the extra commit added.)

To keep the conversation going, I wanted to summarize the discussion so far: 1) what this proposal brings to the table and 2) the major concerns, as well as how those concerns might be addressed. (If it helps, we can add and cross out items in this post to keep track of progress.)

Proposal main points

Adding a new PURL type go where we:

  • Use only namespace for module information. It is hard and sometimes impossible to figure out what the name should be. As of this writing, there are more than 57k Go modules that do not start with github.com. In general, there seem to be plenty of modules for which it is unclear what the name is. Special-casing these will not scale, especially since we can expect new modules in the future. Some of these examples are particularly tricky, one such being gotest.tools.

  • We will use stdlib as the module for artifacts in the go command or standard library. Consistency with other Go tooling breaks the tie with the std option.

  • Use only versions produced by the Go tooling (or no versions at all). This includes pseudo-versions. Hashes should not be allowed. The go command will soon stamp effectively all binaries with a pseudo-version, even in the presence of pending changes and no tags. We will, however, also support (devel) for legacy reasons.
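To illustrate why the namespace/name split is hard to define, here is a minimal sketch comparing a naive split at the last slash with keeping the whole module path in one percent-encoded segment (the pkg:go/github.com%2Fgolang-fips%2Fopenssl form used later in this thread). naiveSplit and encodeWhole are hypothetical helpers, not part of any PURL library.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// naiveSplit splits a module path at its last slash into a PURL-style
// namespace and name. For paths ending in a major-version suffix this
// makes "v3" the name, one reason the split is hard to define.
func naiveSplit(modulePath string) (namespace, name string) {
	i := strings.LastIndexByte(modulePath, '/')
	if i < 0 {
		return "", modulePath
	}
	return modulePath[:i], modulePath[i+1:]
}

// encodeWhole keeps the module path in a single PURL segment by
// percent-encoding its slashes (url.PathEscape escapes "/").
func encodeWhole(modulePath string) string {
	return url.PathEscape(modulePath)
}

func main() {
	for _, m := range []string{"gotest.tools", "gotest.tools/v3", "github.com/golang-fips/openssl"} {
		ns, name := naiveSplit(m)
		enc := encodeWhole(m)
		display, _ := url.PathUnescape(enc) // what a tool might pretty-print
		fmt.Printf("%-32s ns=%q name=%q encoded=%s display=%s\n", m, ns, name, enc, display)
	}
}
```

With gotest.tools/v3 the naive split makes the major-version suffix the name; the single-segment form avoids choosing at all, and decoding it gives tools a readable form to display.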

Concerns

  • This can be accomplished by updating the documentation of the golang type. It is not clear how this can actually be achieved. At the very least, it seems that addressing #308 this way would be a breaking change.

  • This introduces a breaking change. There are differences between the golang and go PURL types such that replacing the former with the latter can introduce breaking changes. Let us put aside that this is technically not a breaking change. The golang type predates Go modules. Just as Go, both as a language and as an ecosystem, has evolved, so should its PURL spec. There is sufficient evidence (#63, #196, #294, #308) that the golang PURL type creates problems, and there should be a type that addresses its shortcomings. If the name is the problem, the newly proposed type might also be called gomod?

  • Namespaces are percent-encoded, and that will make things hard for users to read. We feel this is a small price to pay. Further, tools rendering go PURLs can pretty-print the namespace for users.

@jkowalleck @matt-phylum @pombredanne

@matt-phylum
Contributor

Let us put aside that this is technically not a breaking change.

I work on software that deals with Go PURLs, and I can assure you that this is a breaking change. If users start submitting pkg:go instead of pkg:golang, my software needs to be updated to accept pkg:go to continue working. If my software starts returning pkg:go instead of pkg:golang, all the other code receiving output from my software needs to be updated to accept pkg:go to continue working. Otherwise, Go packages are not recognized at all. This is only acceptable if the PURL is used as a meaningless unique string value, and even in that case you may run into problems where different tools produce different unique string keys for the same package until all software is updated to use pkg:go, even if pkg:golang is for some reason left in the PURL spec after the replacement.

it seems that addressing #308 this way would be a breaking change.

If #308 is fixed for pkg:golang without making a new pkg:go, and users start submitting pkg:golang with uppercase characters or I start returning pkg:golang with uppercase characters, nothing breaks: the spec says that the values must be lowercased, some parsers do lowercase them and some don't, but no parser verifies that the input was already lowercase to begin with. However, until all software is updated, packages with uppercase characters (uncommon) will sometimes not be recognized in cases where the module path is incorrectly lowercased by some tools and not others, and the same problem as before, where different tools may produce different PURLs and therefore different unique keys, still applies.

Fewer than half of all public PURL implementations I'm aware of consistently lowercase as specified, half of all public PURL implementations have already fixed #308 by allowing uppercase characters, and it hasn't been the breaking change it's being made out to be when talking about starting over with a new pkg:go.

  • anchore/packageurl-go, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-php (4/14) convert the namespace and name to lowercase on read (incorrect for Go, even before modules, but matches the current PURL spec).
  • althonos/packageurl.rs, giterlizzi/perl-URI-PackageURL, package-url/packageurl-swift (3/14) confusingly convert the namespace or the name, but not both, to lowercase on read (incorrect).
  • maennchen/purl, package-url/packageurl-dotnet, package-url/packageurl-js, package-url/packageurl-python, package-url/packageurl-ruby, phylum-dev/purl, sonatype/package-url-java (7/14) preserve uppercase characters in the namespace and name on read (correct for Go, but does not match the current PURL spec).

anchore/packageurl-go supports producing pkg:golang PURLs with uppercase characters, but lowercases those characters when reading the same PURL back in. Depending on how you look at it, that's either a bug because it's inconsistent or a feature that it works correctly according to the Go rules at least some of the time.

@jkowalleck
Member

jkowalleck commented Dec 13, 2024

Introducing a new improved PURL type for go, is a non-breaking change.
Every existing PR that tries to fix the existing PURL type is a breaking change.


I work on software that deals with Go PURLs and I can assure you that this is a breaking change.

So do I, and I can assure you that, for me, it is not a breaking change of the PURL-type golang spec at all.

If users start submitting pkg:go instead of pkg:golang, my software needs to be updated to accept pkg:go to continue working.

What's the matter? This update would be a non-breaking change, right? It's just a feature.

If my software starts returning pkg:go instead of pkg:golang, all the other code receiving output from my software needs to be updated to accept pkg:go to continue working.

Finally, @matt-phylum ! So now we come to the actual issue: your software would introduce breaking changes. It is not - like you claimed before - that the PURL-type spec introduced any breaking changes.

Well, breaking changes are common in software. If you don't want to deal with them, then your software can simply stay with the old/faulty golang PURL-type, and nobody gets harmed.


#338, on the other hand, actually introduces breaking changes. In the past, PURL generators/ingestors MUST lowercase golang names, and now they MUST NOT do so anymore. This is a breaking change in the spec, not an implementation suggestion.

anchore/packageurl-go supports producing pkg:golang PURLs with uppercase characters, but lowercases those characters when reading the same PURL back in.

That is exactly the faulty behavior described by the current, faulty state of the PURL-type spec for golang.

Depending on how you look at it, that's either a bug because it's inconsistent or a feature that it works correctly according to the Go rules at least some of the time.

Nope. That is exactly what the current, faulty state of the PURL-type spec for golang implies.

@matt-phylum
Contributor

Finally, @matt-phylum ! So now we come to the actual issue: your software would introduce breaking changes. It is not - like you claimed before - that the PURL-type spec introduced any breaking changes.

Well, breaking changes are common in software. If you don't want to deal with them, then your software can simply stay with the old/faulty golang PURL-type, and nobody gets harmed.

This is simply not true. Even if I do nothing, there is still a compatibility problem if any other software being used in conjunction starts using pkg:go. If adding support for pkg:go is really a breaking change that I am introducing, then nobody can implement pkg:go without introducing the same breaking change, and there is no point in introducing it to the spec. Adopting pkg:go into the spec requires me to make a change to at least accept pkg:go and translate it into pkg:golang.

All software, not just mine, must be updated to accept pkg:go before any software produces pkg:go, and that is just not possible to coordinate. If somebody produces an SBOM that says pkg:go/github.com%2Fgolang-fips%[email protected] and they scan that SBOM with any tools that currently exist today, they will not know that they have an unpatched security vulnerability because the security vulnerability is known only to affect pkg:golang/github.com/golang-fips/[email protected], and the best outcome you can hope to get from pkg:go/github.com%2Fgolang-fips%2Fopenssl is an error telling you that go is not a supported package type.

It's likely that even some new software will forever be implementing support for reading both pkg:go and pkg:golang because of use cases like scanning SBOMs of previously published software where the pkg:golang PURL was produced in the past. Given the pace of updates to PURL implementation libraries and software and the uptake of new software versions into enterprises, people will need to at least have an option to output pkg:golang for compatibility reasons for years, and continuing to do so is going to be the safer option that people are going to stick with unless there is a compelling reason not to.

That is exactly the faulty behavior described by the current, faulty state of the PURL-type spec for golang.

Nope. That is exactly what the current, faulty state of the PURL-type spec for golang implies.

I don't think you understand. The behavior of anchore/packageurl-go is unique, so if it's the only one doing it correctly that means all others are doing it incorrectly. If you provide it a module path example.com/PACKAGE it produces pkg:golang/example.com/PACKAGE, which is correct according to Go but incorrect according to Purl. However, if you provide it the PURL pkg:golang/example.com/PACKAGE it produces the module path example.com/package. It performs the name normalization during reading but not writing. The other three implementations that follow the spec as written, and the three more which try to implement the spec as written but do it incorrectly, perform the same name normalization during reading as during writing.
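The asymmetry described above can be reproduced with two hypothetical helpers, emit and parse, standing in for a library that preserves case when writing a PURL but lowercases the name when reading it. They ignore versions, qualifiers and escaping; the point is only that a round-trip check catches this class of bug cheaply.

```go
package main

import (
	"fmt"
	"strings"
)

// emit writes a module path into a pkg:golang PURL, preserving case.
func emit(modulePath string) string {
	return "pkg:golang/" + modulePath
}

// parse reads the module path back, lowercasing it as some libraries do.
func parse(purl string) string {
	return strings.ToLower(strings.TrimPrefix(purl, "pkg:golang/"))
}

// roundTrips reports whether a module path survives emit followed by parse.
func roundTrips(modulePath string) bool {
	return parse(emit(modulePath)) == modulePath
}

func main() {
	fmt.Println(roundTrips("example.com/package")) // true
	fmt.Println(roundTrips("example.com/PACKAGE")) // false: read back as example.com/package
}
```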

#338, on the other hand, actually introduces breaking changes. In the past, PURL generators/ingestors MUST lowercase golang names, and now they MUST NOT do so anymore. This is a breaking change in the spec, not an implementation suggestion.

Those generators/ingestors must not have done so in the past either. Given the current state of the spec, PURLs cannot refer to packages that contain uppercase characters. If you ask one of the seven spec compliant or partially spec compliant libraries to produce a PURL for a package with uppercase characters, it produces a PURL for a different package which probably does not exist, but could.

At least pkg:alpm, pkg:apk, pkg:deb, and pkg:npm have exactly the same name capitalization problem, where PURL says that the name must be lowercased but lowercasing the name changes its meaning (and this is known to cause problems for at least NPM). pkg:oci is similar, but there uppercase is apparently invalid, so compliant PURL implementations are converting an invalid reference into a valid one, which may be intentional but doesn't seem like a great thing for a PURL implementation to be doing. pkg:pypi has a similar problem, where the normalization algorithm it describes is incorrect. pkg:nuget has the opposite problem: PURL specifies that the name must not be lowercased, but NuGet is case insensitive. If every minor name normalization issue is fixed by introducing a new package type, there are going to be a lot of duplicate package types.

@jkowalleck
Member

jkowalleck commented Dec 16, 2024

re #338 (comment)

Even if I do nothing, there is still a compatibility problem if any other software being used in conjunction starts using pkg:go. If adding support for pkg:go is really a breaking change that I am introducing, then nobody can implement pkg:go without introducing the same breaking change, and there is no point in introducing it to the spec. Adopting pkg:go into the spec requires me to make a change to at least accept pkg:go and translate it into pkg:golang.

I'd consider gaining the capability to understand/ingest pkg:go a non-breaking change. It is like learning a new language: it does not harm anybody. Do you agree?
I'd consider using pkg:go, when pkg:golang was used before, a breaking change - as I have no reason to assert all my peers know that new purl-type yet. Do you agree?
One option to prevent a breaking change here is to have a feature flag that defaults to the "legacy" behavior. This option is an implementation consideration, one the PURL spec neither has to make nor point out; it is not within the power of the PURL spec.

If adding support for pkg:go [...]

What does support even mean? Ingesting it: gaining this capability is not a breaking change. Producing it: it depends, see above. Usual software lifecycle, not a problem.

Adopting pkg:go into the spec requires me to make a change to at least accept pkg:go and translate it into pkg:golang.

Adding new capabilities - this is a non-breaking change. No need to translate, but at least understand it.

But all of this is to be considered part of the downstream software lifecycle, not related to the PURL type spec here. Adding a new PURL type to the spec is not a breaking change.


To all your other points:

The PURL type spec for golang is faulty - it does not respect today's existing art in form of go's package resolution.
There might be implementations working according to PURL type spec for golang, but against the actual go-ecosystem rules.
There might be implementations that do not follow the PURL type spec for golang but are aligned with the actual go-ecosystem rules.

It is a fact that changes to the existing PURL type spec for golang would introduce breaking changes. Not in some imaginary downstream implementation, but in the actual PURL type spec: changing a "MUST" to a "MUST NOT" is a breaking change.

Therefore, we intend to create a new PURL type. Creating new PURL types has never been a breaking change, and it still is not, even if the scope is the same as an existing one. Period.


Regarding your point about other unrelated PURL types: please adhere to the scope of this PR. We are talking about go, which has an actual problem, not about the issues others might have.

@matt-phylum
Contributor

I'd consider gaining the capability to understand/ingest pkg:go a non-breaking change. It is like learning a new language: it does not harm anybody. Do you agree?
I'd consider using pkg:go, when pkg:golang was used before, a breaking change - as I have no reason to assert all my peers know that new purl-type yet. Do you agree?

Yes. If you only add to software that reads PURLs support for reading pkg:go, that is not a breaking change.

However, pkg:go is being added so that it can be written, and it's not only changing pkg:golang to pkg:go in existing software that creates a breaking change. The problem can occur if any software being used together starts using pkg:go, even if it's new software that has never supported pkg:golang and that the user is introducing into their workflow. It's okay to say this new software just isn't compatible with the existing software and that it's the user's problem, but this compromises the promise of PURL being a "mostly universal" URL for referring to packages.

One option to prevent a breaking change here is to have a feature flag that defaults to the "legacy" behavior.

Managing it through a feature flag could work, assuming all the implementations producing pkg:go PURLs had a feature flag for selecting pkg:golang instead and either they all defaulted to pkg:golang or somehow all end users understood this issue and knew how to set the flags appropriately (maybe for SBOMs there will need to be a translation tool to handle artifacts from the past). But then, what is the incentive for someone to ever change that flag to pkg:go?

This option is an implementation consideration, one the PURL spec neither has to make nor point out; it is not within the power of the PURL spec.

This seems irresponsible. Somebody reading the PURL spec without having read through all the issues is going to be confused as to why there are both pkg:golang and pkg:go, and then, if they know Go, they will probably conclude that they want to use pkg:go because it is for modules and pkg:golang must be only for old, pre-module Go. There is no indication that it may be required to use pkg:golang for compatibility with existing software.

Implementations of the new spec using pkg:go are not compatible with implementations of the old spec expecting pkg:golang.

Adding new capabilities - this is a non-breaking change. No need to translate, but at least understand it.

Adding a new capability is not a breaking change, but having to add a "new capability" to understand pkg:go to continue having the same Go capability as previously indicates that there has been a breaking change somewhere.

Translating pkg:go to pkg:golang is the easiest way to support both. All existing software expects pkg:golang and all databases file Go module data under pkg:golang. If it's not translated into a different form, all the code that deals with pkg:golang must also deal with pkg:go, which AFAIK would be a unique problem for Go. Unfortunately, if it is translated into a consistent form, then that likely means the software consuming the PURL is also producing new PURLs in its output, in which case either the software needs to remember whether the input was pkg:golang/pkg:go or the software needs to have a flag to control pkg:golang/pkg:go output even if the software isn't logically producing PURLs. It's possible you will ask for findings providing an SBOM that uses pkg:go PURLs and get back pkg:golang PURLs in the result.
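A minimal sketch of such a translation, assuming the pkg:go shape used earlier in this thread (the whole module path percent-encoded as one segment, as in pkg:go/github.com%2Fgolang-fips%2Fopenssl). goToGolang is a hypothetical helper, not a mapping defined by either spec, and the example module path and version are illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// goToGolang rewrites a pkg:go PURL (module path in one percent-encoded
// segment) into the pkg:golang shape (module path split on literal
// slashes). Version, qualifiers and subpath are carried over unchanged.
func goToGolang(purl string) (string, error) {
	rest, ok := strings.CutPrefix(purl, "pkg:go/")
	if !ok {
		return "", errors.New("not a pkg:go PURL")
	}
	// The encoded module path ends at the first '@', '?' or '#'.
	end := strings.IndexAny(rest, "@?#")
	if end < 0 {
		end = len(rest)
	}
	modulePath, err := url.PathUnescape(rest[:end])
	if err != nil {
		return "", err
	}
	return "pkg:golang/" + modulePath + rest[end:], nil
}

func main() {
	out, err := goToGolang("pkg:go/example.com%2Fuser%2Fmodule@v1.2.3?goarch=arm64")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // pkg:golang/example.com/user/module@v1.2.3?goarch=arm64
}
```

It does not re-encode characters that pkg:golang might require to be percent-encoded, and going the other way still requires remembering which form was received, which is the bookkeeping problem described above.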

It is a fact that changes to the existing PURL type spec for golang would introduce breaking changes. Not in some imaginary downstream implementation, but in the actual PURL type spec: changing a "MUST" to a "MUST NOT" is a breaking change.

I still disagree with this. Yes, changing a MUST¹ to a MUST NOT requires a change to existing implementations of pkg:golang instead of requiring new implementations of pkg:go. However, implementations of the new spec which allow uppercase characters are still forwards and backwards compatible with implementations of the old spec, and in combination they function as if both implementations are using the old spec. In most cases, you can't even tell whether an implementation is using the old spec or the new spec because most module paths are lowercase to begin with and the behavior in that case is unchanged. It's even possible to pass the current version of the PURL test suite using an implementation that does not lowercase because none of the examples contain uppercase characters. Half of the existing implementations tested already use the new spec that is being rejected on the grounds of being a breaking change.

¹ The current spec says "must be lowercased" with "must" in lowercase, with no explanation of when or why it should be lowercased. Failure to do so when outputting a PURL makes the PURL non-canonical, not invalid, which doesn't seem like a "MUST" to me. This problem is not unique to pkg:golang, and given the level of inaccuracy in PURL-TYPES.rst it may be a good idea to rethink how these rules are written. For example, if the spec said "the name is case insensitive and must be lowercased for comparison" or "the name is case insensitive and must be lowercased for canonicalization" then the spec would still be wrong, but it would be clear that emitting a PURL with uppercase characters is allowable.

A terrible third option would be to say that the current spec has always been correct and the name really must be lowercased, but the spec author just forgot to mention that the module path is supposed to have already had all the uppercase characters escaped using !: https://pkg.go.dev/golang.org/x/mod/module#hdr-Escaped_Paths. By the same logic that pkg:go is not a breaking change, this could be argued not to be a breaking change because the spec didn't say that this wasn't the case, provided no reason that this couldn't be the case, and even suggested that it might be the case by saying that the name must be lowercased and is implied to be case insensitive, which is the very problem that this escaping scheme is supposed to solve.
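For reference, a minimal sketch of the escaping scheme linked above: each uppercase letter becomes ! followed by its lowercase form, so an escaped path contains no uppercase letters and lowercasing it changes nothing. escapePath and unescapePath are illustrative helpers that skip the validity checks performed by EscapePath and UnescapePath in golang.org/x/mod/module.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// escapePath replaces each ASCII uppercase letter with '!' plus its
// lowercase form. No other validation is done.
func escapePath(path string) string {
	var b strings.Builder
	for _, r := range path {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(unicode.ToLower(r))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// unescapePath reverses escapePath. It rejects uppercase letters and a
// '!' that is not followed by a lowercase ASCII letter.
func unescapePath(escaped string) (string, bool) {
	var b strings.Builder
	bang := false
	for _, r := range escaped {
		switch {
		case bang:
			if r < 'a' || r > 'z' {
				return "", false
			}
			b.WriteRune(unicode.ToUpper(r))
			bang = false
		case r == '!':
			bang = true
		case 'A' <= r && r <= 'Z':
			return "", false // escaped form never contains uppercase
		default:
			b.WriteRune(r)
		}
	}
	if bang {
		return "", false // trailing '!'
	}
	return b.String(), true
}

func main() {
	e := escapePath("github.com/BurntSushi/toml")
	fmt.Println(e) // github.com/!burnt!sushi/toml
	back, ok := unescapePath(e)
	fmt.Println(back, ok) // github.com/BurntSushi/toml true
}
```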

Regarding your point about other unrelated PURL types: please adhere to the scope of this PR. We are talking about go, which has an actual problem, not about the issues others might have.

What non-related PURL types? I only mentioned related PURL types.

If there are at least 6 other types which likewise have critical errors in name normalization rules, deciding that to fix a name normalization rule is a breaking change and requires introduction of a new package type and uncontrolled deprecation and replacement of the existing package type means the same thing will be done at least 6 more times creating pkg:npm2 etc. Doing it for pkg:golang but not the others would be inconsistent.

There is precedent for making this kind of "breaking" change to an existing type in #220, which changed an implied "MUST NOT be lowercased" into an explicit "must be lowercased."

Labels
Proposed new type type: golang Proposed new type as well as component discussions
6 participants