add an example with `/` in namespace to README and SPEC #176
base: master
I think it's rather clear from the specification which states:
However, the examples (e.g. for golang) seem to be contradicting that:
Looking at the component specification of
it's obvious that the |
There is no contradiction.
The RE the linked vulnerablecode issue: this is an annoying thing about |
Then your definition of "percent-encoded" seems to differ from mime, and what Wikipedia says: It lists So if |
Percent encoding lists no reserved characters.
|
Ok, let me see:
Right, and a PURL actually is a URI.
The term "path" refers to the URI equivalent of a PURL. A URI's path component corresponds to a PURL's namespace and name part as a whole.
You're apparently looking at a different Wikipedia. Mine does not list any reserved characters for percent encoding in general, and also clarifies that "reserved" does not mean escaped:
The real RFC also says that slashes are allowed in the path:
If you parse a PURL as a URL, the URL's path is the namespace and name joined with a slash. Encoding that delimiting slash or any slashes contained within the namespace is forbidden by the PURL spec. Only slashes within the name of the package can be encoded, but for Go PURLs the name must not contain slashes because of the way PURL creates a namespace and name for an ecosystem that only has package IDs. Encoding the slashes in a path would also be an error when building a URL. |
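To illustrate the point about the URL path, here is a minimal sketch using only Python's standard `urllib.parse`. It is not taken from any PURL implementation; it only shows how a generic URL parser sees the two spellings of the Go PURL discussed in this thread.

```python
from urllib.parse import urlsplit

# Same package, once with a literal "/" in the namespace and once with "%2F".
plain = urlsplit("pkg:golang/github.com/quic-go/quic-go")
encoded = urlsplit("pkg:golang/github.com%2Fquic-go/quic-go")

# A generic URL parser treats "pkg" as the scheme and the rest as the path.
print(plain.scheme)  # pkg

# The literal slashes delimit path segments ...
print(plain.path.split("/"))
# ['golang', 'github.com', 'quic-go', 'quic-go']

# ... while an encoded slash stays inside a single, still-encoded segment.
print(encoded.path.split("/"))
# ['golang', 'github.com%2Fquic-go', 'quic-go']
```

The two strings therefore have different path structures as URLs, which is why encoding the delimiting slashes changes how a URL-based consumer segments the PURL.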
Well, it would be easier to tell if you provided any links like I did. Unless there was some irony involved from your side 🙄
That again contradicts what's written in the spec. Let me quote it again:
So name and namespace are encoded in the same way. In the end, what I'm after is that it must be possible to unambiguously map "round-trip" like:
where the package-native coordinates are, e.g., Maven GAV (group, artifact, version) or NPM scope, name, version. If the mapping to PURL somehow lost the information about which part is the namespace and which is the name, mapping back to package-native coordinates would not be possible and we'd have a problem.
That is not true. As I said before, "percent-encoded" does not mean that particular characters are or are not encoded. PURL is very particular about which characters are encoded when and how PURLs are formatted because it has a concept of a canonical form. To best understand the encoding requirements you should check the test suite and read the parsing spec.

There is no inherent name/namespace ambiguity converting values to PURL and back.

For Go, the PURL namespace and name are formed by splitting on the last `/`.

For Maven, the PURL namespace is the Maven group ID and the PURL name is the Maven artifact ID. Neither of them can contain slashes because of Maven's rules.

For NPM, the PURL namespace is the NPM scope and the PURL name is the NPM unscoped name. Neither of them can contain slashes because of NPM's rules.

For a theoretical ecosystem that does allow slashes in both the PURL namespace and the PURL name, you would end up with something like namespace |
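The Go case described above can be sketched as a round trip between a module path and a namespace/name pair. The helper names `split_go_path` and `join_go_path` are hypothetical and exist only for this illustration; the module path is the one used later in this thread.

```python
def split_go_path(path: str) -> tuple[str, str]:
    """Split a Go package path on its last slash into (namespace, name)."""
    namespace, _, name = path.rpartition("/")
    return namespace, name


def join_go_path(namespace: str, name: str) -> str:
    """Inverse of split_go_path: rebuild the original package path."""
    return f"{namespace}/{name}" if namespace else name


path = "github.com/quic-go/quic-go"
namespace, name = split_go_path(path)
print(namespace)  # github.com/quic-go
print(name)       # quic-go

# The name never contains a slash, so joining is unambiguous.
assert join_go_path(namespace, name) == path
```

Because the split always happens at the last slash, the name part cannot contain a slash, and the original path can be recovered by simple concatenation.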
Ok, but when talking about "percent-encoded" in PURL-context it should always mean the same thing. So Regarding examples, let's focus on the Go case here, as you correctly state that other ecosystems do not allow The character-encoding says
So if
However, I agree that this is not in line with the "how to build" section, which says to first split the namespace into segments on `/`. All in all, I feel that the docs, formal spec, tests, and actual implementations are not at all aligned here.
Percent encoding is supposed to work differently depending on whether it's the namespace or the name, and this is something that is often confusing. It'd be nice if the spec said exactly which characters were supposed to be escaped when instead of saying which sometimes-escaped characters should not be escaped. The old RFC URL spec is similar with its list of "reserved" characters which might need to be encoded and then listing the characters that do not need to be encoded in different contexts. The newer WHATWG URL spec gives a constructive set of characters that must be encoded and that's much easier to understand. If you have a PURL
|
Absolutely! Just calling everything "percent-encoding" if in fact it's different algorithms is highly confusing indeed. We should name different things differently.
I agree that your interpretation of the spec can be implemented in a way so that coordinate <-> PURL mapping is unambiguous. Still, I'm wondering what the main sources of your interpretation of the spec are. I guess mostly the "How to build" pseudo-algorithm description and the test suite, as the written texts are less clear. I'm starting to get convinced that I should then change our PURL implementation to not encode |
Implement the pseudo-algorithm described at [1]. Most importantly, '/' in namespaces are now not escaped anymore (also see the lengthy discussion at [2]), key names are lower-cased, and qualifiers are sorted for comparability. [1]: https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#how-to-build-purl-string-from-its-components [2]: package-url/purl-spec#176 Signed-off-by: Sebastian Schuberth <[email protected]>
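The three changes named in this commit message (unescaped `/` between namespace segments, lower-cased qualifier keys, sorted qualifiers) can be sketched roughly as follows. This is not the code from the commit; `build_purl` is a hypothetical helper, it omits subpaths and type-specific rules, and the choice to encode everything except unreserved characters (`safe=""`) is an assumption that is stricter than the spec's canonical form for some characters such as `:`.

```python
from urllib.parse import quote


def build_purl(type_, namespace, name, version=None, qualifiers=None):
    """Assemble a PURL string from its components (simplified, no subpath)."""
    purl = "pkg:" + type_.lower() + "/"
    if namespace:
        # Encode each namespace segment on its own so the "/" separators stay literal.
        segments = [quote(seg, safe="") for seg in namespace.strip("/").split("/") if seg]
        purl += "/".join(segments) + "/"
    purl += quote(name.strip("/"), safe="")
    if version:
        purl += "@" + quote(version, safe="")
    if qualifiers:
        # Lower-case the keys, drop empty values, and sort by key for comparability.
        pairs = sorted((key.lower(), value) for key, value in qualifiers.items() if value)
        purl += "?" + "&".join(f"{key}={quote(value, safe='')}" for key, value in pairs)
    return purl


# The version and the Maven coordinates below are made up for illustration.
print(build_purl("golang", "github.com/quic-go", "quic-go", "v0.40.0"))
# pkg:golang/github.com/quic-go/quic-go@v0.40.0
print(build_purl("maven", "org.apache.commons", "commons-lang3", "3.12.0",
                 {"Type": "jar", "classifier": "sources"}))
# pkg:maven/org.apache.commons/commons-lang3@3.12.0?classifier=sources&type=jar
```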
The steps in my comment are adapted from the parsing section of the spec: https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#how-to-parse-a-purl-string-in-its-components |
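As a rough companion to the parsing section linked above, the following sketch splits a PURL string from the outside in: subpath, qualifiers, scheme, type, version, name, and finally the namespace segments. `parse_purl` and `rsplit_once` are hypothetical names, qualifiers are left as a raw string, and no type-specific normalization is applied, so treat it as an illustration under those assumptions and not as a conforming parser.

```python
from urllib.parse import unquote


def rsplit_once(text, sep):
    """Split on the last occurrence of sep; return (text, "") if sep is absent."""
    head, found, tail = text.rpartition(sep)
    return (head, tail) if found else (text, "")


def parse_purl(purl):
    """Split a PURL into components (qualifiers stay an unparsed string)."""
    remainder, subpath = rsplit_once(purl.strip(), "#")
    remainder, qualifiers = rsplit_once(remainder, "?")
    scheme, _, remainder = remainder.partition(":")
    if scheme.lower() != "pkg":
        raise ValueError("not a PURL: scheme must be 'pkg'")
    type_, _, remainder = remainder.strip("/").partition("/")
    remainder, version = rsplit_once(remainder, "@")
    # Everything after the last "/" is the name, everything before is the namespace.
    raw_namespace, _, raw_name = remainder.rpartition("/")
    # Decode each namespace segment separately, then re-join with "/".
    namespace = "/".join(unquote(seg) for seg in raw_namespace.split("/") if seg)
    return {
        "type": type_.lower(),
        "namespace": namespace,
        "name": unquote(raw_name),
        "version": unquote(version),
        "qualifiers": qualifiers,
        "subpath": subpath,
    }


# The version is made up for illustration.
print(parse_purl("pkg:golang/github.com/quic-go/quic-go@v0.40.0")["namespace"])
# github.com/quic-go
print(parse_purl("pkg:golang/github.com%2Fquic-go/quic-go")["namespace"])
# github.com/quic-go
```

In this sketch both spellings end up with the same namespace string, because a segment containing `%2F` decodes to text with a slash in it. Whether a parser should accept the encoded form at all is exactly what this thread is debating.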
fwiw my interpretation re slash encoding is that a slash
I'm doing some work on a PURL-validate UI using the Python implementation, which seems to allow percent-encoding of the slashes -- my current w-i-p UI as well as the public PurlDB validate endpoint (both of which use packageurl-python) find |
These screenshots show the namespace being parsed correctly according to the generic PURL rules. The spec for Go PURLs unfortunately only implies that |
Exactly. While both are valid purls syntax-wise, the former interprets the whole string "github.com/quic-go/quic-go" as the Unfortunately, the purl spec decided otherwise and artificially splits "github.com/quic-go/quic-go" into a "github.com/quic-go" And pretty much the same issue applies to Swift packages as well, which also do not have a native namespace concept. |
I think the unfortunate thing is that PURL has a general concept of namespaces at all. Moving the namespace concept from PURL into the few types that actually have special namespace behavior (eg Maven, apt) can be done without breaking compatibility with any existing PURLs, but keeping the namespace and redefining the Go and Swift types to move the namespace part into the name breaks compatibility with nearly all existing PURLs of those types, and implementations producing and consuming PURLs of those types will need to agree on the correct form in order to interoperate. |
Also see #14.
With the large Maven ecosystem (incl. Gradle, Sbt etc.) and its groups in mind, I beg to differ.
Note that NPM does have a native concept of namespaces (they're called scopes there), and Composer also suggests that the part before the slash is "your-vendor-name" and the part after the slash is "package-name", so it makes sense to treat "your-vendor-name" as a namespace.
Agreed. It is what it is now, and that's why I also proposed over here that the best we can probably do is to properly document that purl treats Go and Swift as if they had namespaces, although they don't have any. |
Maven and Gradle and Sbt are only one case. It doesn't matter if it's a big case or not. The package type is only documented once. NPM does have scopes, but the scope is part of the package ID, and Composer is the same. Unless somebody is trying to assign meaning to the PURL namespace without understanding what it means for that package type, it's inconsequential whether the PURL contains the scope+name or it just has the ID. It only makes sense to treat it as the namespace because PURL defines that there is a namespace, and otherwise it would just be the package ID.
This showcases the current problem with purl encoding for cases where ORT and the purl specification disagree whether a package ecosystem has the concept of namespaces or not. Also see the larger discussion at [1]. [1]: package-url/purl-spec#176 Signed-off-by: Sebastian Schuberth <[email protected]>
The purl specification treats everything before the last slash in the string representation of package coordinates as the namespace, and the remainder as the name [1]. Consequently, purl names can never contain slashes, encoded or not. Fixes #8567, fixes #9298. [1]: package-url/purl-spec#176 (comment) Signed-off-by: Sebastian Schuberth <[email protected]>
Thank you for this PR, @maxhbr. When you have the chance, could you please merge the latest master into this branch?
I think it is not clear from the text in the README and the examples whether a `/` in the namespace needs to be escaped. This adds an example from `/test-suite-data.json` to the list of examples to make clear that escaping should not be done.

A follow-up question, which is not in scope of this PR, would be: given the PURL `pkg:swift/github.com%2FAlamofire/[email protected]`, is `pkg:swift/github.com/Alamofire/[email protected]` its canonical form? If yes, it should be added to the test cases.

Or, for more fun, and looking at package-url#63: what is the canonical form of `pkg:golang/github.com%2Frussross%2Fblackfriday%[email protected]`?

Signed-off-by: Maximilian Huber <[email protected]>
133e455 to 19f6531
I've simply rebased the PR from the GitHub UI. |