-
Notifications
You must be signed in to change notification settings - Fork 9.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More permissive regex for mimeType #128
Conversation
The regex used for mimeType is too restrictive. This change allows for values that meet the criteria described in [RFC 2045](http://tools.ietf.org/html/rfc2045) and [RFC 2046](http://tools.ietf.org/html/rfc2046). Hat tip to @mnot for his regex, which he [posted here](http://lists.w3.org/Archives/Public/xml-dist-app/2003Jul/0064.html ).
@webron can you merge if this is good with you? |
Definitely. I just want to look into it more closely, but I'll probably have time only in a few days. |
@webron I confess to only skimming the two RFCs. I haven't had a chance to dig into them deeply. What I ran into was an inability to specify a
The regex my patch replaces chokes on the colon in the profile URI. |
@claylo - thank you for the clarification. The reason I asked was that I'm looking for the proper way to document what's supported by the spec. I may end up just putting references to both RFCs. Was just hoping for a more human-friendly explanation as RFCs tend to be hyper-technical when you just want to know if what you're doing is valid or not. |
@webron I agree, I have the same problem much of the time. My desire to "be valid" doesn't always permit time to slog through a day's worth of RFC reading. 😀 As I mention in the original pull request, the regex I put in my patch came from @mnot (Mark Nottingham), who is fairly knowledgable in such things as web standards compliance. While I have not done the deep dive in Content-Type related RFCs, I'm confident that Mr. Nottingham has. |
@claylo - okay, thanks for the info. I imagine there won't be any issues but I want to run it through a few samples just to get the feel of it before I merge it. And really, thank you for taking the time to send this PR. |
@webron My pleasure! |
Okay, finally got around to play with it. I tested it with http://regex101.com/#javascript. I'm more curious regarding the parameters part which we didn't quite finalize in the spec. I'm looking for some input here. If I understand correctly, the parameters can be thought of as the same as being request parameter only located in the content type. Does this possibly mean we should introduce a new parameter type instead, allowing setting it with a default value and saying if it's required or not? That way, I'd love the input of @claylo, @maxlinc, @fehguy and anyone else who may be following this issue. |
I think all valid MIME Types, including those with parameters, should be supported. I'd rather be too permissive than too restrictive. Just like this page explains why a perfect regex may be a bad idea and that "regexes don't send email", its also true that regexes don't parse mime-types. So even a perfect regex (while complicated and slow) will leave gaps - it won't noticed when you write misspell I'm :+1 for now for whatever regex is less restrictive (though I lost track of whether that's the original or the PR), but I'd like to consider moving away from regexes and suggest that libraries do better semantic validation. Longer version I think the spec should be minimally restrictive. If there's a good reason to disallow content-type parameters than we should. Otherwise, I think we should allow the complete range of valid mime types - including ones with parameters. I'd still like to see a "Swaggercritic" project one day soon, and still like the paraphrased goal similar to Foodcritic:
So I think that from a spec perspective, any legal mime-type should be allowed and Whether the use of parameters is a best practice or not - I'm not sure yet. But I think that's separate from the spec and why I still support the idea of Swaggercritic and/or "warning" level semantic validation from swagger-tools. I'm also considered about overusing regexes, especially when they get long. Regexes can be difficult to test and maintain, but I'm also concerned about performance. I'd consider the use of a custom "format" field instead, i.e.:
This may work well as long as:
Third-party libraries might not be as restrictive as the core tools if they don't implement support for the custom formats, allowing any string. So a good compromise may be to use:
|
Fix the type ref
I admit that I'm inclined to simplify the regex to the point of removing it entirely. I'm less concerned with the validity of the values and more concerned with the validity of the structure of the spec file itself. We can suggest a regex that libraries may use if they want to issue additional warnings to the end user. If spec consumers break because of wrong values, that's an issue for the producer (though consumers may want to incorporate error handling for it). |
@webron +1 on no regex. Mime types are an area where there is likely to be the most variability over the coming years, so having no regex at all will avoid future tickets like this one. 😏 |
So we're all agreed on dropping the "pattern" with the regex. Should a "format" be added in its place? As far as I know all json-schema validators will ignore unknown formats (understanding json-schema says so as well), so it should effectively act as little more than documentation - and a "validation extension point" for validators that support custom format handlers. And then it's easy to add an appendix to the spec for "custom formats used in the spec", with links to related specifications and/or suggested regexes that can be used if a more efficient validator isn't available, as @webron suggested. |
I think it's an interesting idea, but one we should visit in a future version of the spec. Right now, we're making the last steps into sealing it, and I believe this change may require changes to existing specs. Keep in mind that mime-types are used in several places across the spec - not only as values for |
mimetype regex has been completely removed. Thanks for taking the time and providing the input everyone. |
The regex used for mimeType is too restrictive. This change allows for values that meet the criteria described in RFC 2045 and RFC 2046.
Hat tip to @mnot for his regex, which he posted here.