See for instance tcort/markdown-link-check#155
A user wants to check https://en.wikipedia.org/wiki/%3F:
But link-check decodes the %3F to ? and does not re-encode it, because ? is a legitimate URL character. It is also the character that marks the start of the query string. So https://en.wikipedia.org/wiki/%3F: becomes https://en.wikipedia.org/wiki/?: which means https://en.wikipedia.org/wiki/ with the query string :.
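The round trip can be reproduced outside link-check with a minimal sketch. The helper name `decodeThenEncode` is hypothetical and only wraps the `encodeURI(decodeURIComponent(...))` expression currently used in lib/proto/http.js; it assumes a runtime with a global WHATWG `URL` (Node.js or a browser).

```javascript
// Hypothetical helper mirroring the decode-then-re-encode expression from lib/proto/http.js.
function decodeThenEncode(link, baseUrl) {
    return encodeURI(decodeURIComponent(new URL(link, baseUrl).toString()));
}

const link = 'https://en.wikipedia.org/wiki/%3F:';

// decodeURIComponent turns %3F into "?", and encodeURI leaves "?" untouched
// because it is a reserved character that encodeURI never escapes.
const roundTripped = decodeThenEncode(link);

// Parsing the result shows that the path has lost its last segment
// and that ":" is now treated as the query string.
const parsed = new URL(roundTripped);

console.log(roundTripped);    // https://en.wikipedia.org/wiki/?:
console.log(parsed.pathname); // /wiki/
console.log(parsed.search);   // ?:
```

The same loss of meaning should apply to any encoded character that `decodeURIComponent` decodes but `encodeURI` does not re-encode, such as `%23` (`#`) or `%2F` (`/`).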
We need to find a way to deal with encoded chars without all these issues. There are already many tests for specific cases in the test suite. They have to keep passing, but the current risk is ending up with a pile of special cases. Finding a generic way to deal with this would be nice.
diff --git a/lib/proto/http.js b/lib/proto/http.js
index f6530a4..548e7a8 100644
--- a/lib/proto/http.js
+++ b/lib/proto/http.js
@@ -31,13 +31,8 @@ module.exports = {
 let user_agent = opts.user_agent || `${pkg.name}/${pkg.version}`;
- // Decoding and encoding is required to prevent encoding already encoded URLs
- // We decode using the decodeURIComponent as it will decode a wider range of
- // characters that were not necessary to be encoded at first, then we re-encode
- // only the required ones using encodeURI.
- // Note that we don't use encodeURIComponents as it adds too much non-necessary encodings
- // see "Not Escaped" list in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent#description
- const url = encodeURI(decodeURIComponent(new URL(link, opts.baseUrl).toString()));
+ // rebase relative urls and normalize url encoding
+ const url = new URL(link, opts.baseUrl).toString();
 const options = {
 user_agent: user_agent,
Alternatively, adding an option to skip the re-encoding would be highly appreciated. That way, a per-URL option could be added to the markdown-link-check config to disable this 'feature' for problematic URLs.
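One possible shape for such an opt-out is sketched below. The option name `skipReencode` and the helper `buildUrl` are assumptions made for illustration. Neither exists in link-check or markdown-link-check today.

```javascript
// Hypothetical sketch: `skipReencode` is an assumed option name, not an existing link-check option.
function buildUrl(link, opts = {}) {
    // Rebase relative links against opts.baseUrl and let the URL parser normalize the result.
    const rebased = new URL(link, opts.baseUrl).toString();

    if (opts.skipReencode) {
        // Keep the encoding exactly as the URL parser produced it.
        return rebased;
    }

    // Current behaviour: decode everything, then re-encode only what encodeURI escapes.
    return encodeURI(decodeURIComponent(rebased));
}

console.log(buildUrl('https://en.wikipedia.org/wiki/%3F:', { skipReencode: true }));
console.log(buildUrl('https://en.wikipedia.org/wiki/%3F:'));
```

With the flag set, the encoded `%3F` survives. Without it, the existing behaviour (and the existing tests that depend on it) would stay unchanged.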