Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More permissive regex for mimeType #128

Closed
wants to merge 1 commit into from
Closed

More permissive regex for mimeType #128

wants to merge 1 commit into from

Conversation

claylo
Copy link

@claylo claylo commented Sep 25, 2014

The regex used for mimeType is too restrictive. This change allows for values that meet the criteria described in RFC 2045 and RFC 2046.

Hat tip to @mnot for his regex, which he posted here.

The regex used for mimeType is too restrictive. This change allows for
values that meet the criteria described in [RFC
2045](http://tools.ietf.org/html/rfc2045) and [RFC
2046](http://tools.ietf.org/html/rfc2046).

Hat tip to @mnot for his regex, which he [posted
here](http://lists.w3.org/Archives/Public/xml-dist-app/2003Jul/0064.html
).
@fehguy
Copy link
Contributor

fehguy commented Sep 26, 2014

@webron can you merge if this is good with you?

@webron
Copy link
Member

webron commented Sep 27, 2014

Definitely. I just want to look into it more closely, but I'll probably have time only in a few days.
@claylo - it's a bit funny that you find the current one to be too restrictive as it actually allows for malformed mime types. That said, it doesn't mean it allows for the necessary mime type declarations.
Since I'm no expert on mime type definitions, I have to ask, have you read both RFCs?

@claylo
Copy link
Author

claylo commented Sep 27, 2014

@webron I confess to only skimming the two RFCs. I haven't had a chance to dig into them deeply. What I ran into was an inability to specify a produces mediaType of compacted LD-JSON. According to the W3C recommendation, that would be:

application/ld-json; profile=http://www.w3.org/TR/json-ld/#compacted-document-form

The regex my patch replaces chokes on the colon in the profile URI.

@webron
Copy link
Member

webron commented Sep 27, 2014

@claylo - thank you for the clarification. The reason I asked was that I'm looking for the proper way to document what's supported by the spec. I may end up just putting references to both RFCs. Was just hoping for a more human-friendly explanation as RFCs tend to be hyper-technical when you just want to know if what you're doing is valid or not.

@claylo
Copy link
Author

claylo commented Sep 27, 2014

@webron I agree, I have the same problem much of the time. My desire to "be valid" doesn't always permit time to slog through a day's worth of RFC reading. 😀

As I mention in the original pull request, the regex I put in my patch came from @mnot (Mark Nottingham), who is fairly knowledgable in such things as web standards compliance. While I have not done the deep dive in Content-Type related RFCs, I'm confident that Mr. Nottingham has.

@webron
Copy link
Member

webron commented Sep 27, 2014

@claylo - okay, thanks for the info. I imagine there won't be any issues but I want to run it through a few samples just to get the feel of it before I merge it. And really, thank you for taking the time to send this PR.

@claylo
Copy link
Author

claylo commented Sep 27, 2014

@webron My pleasure!

@webron
Copy link
Member

webron commented Oct 10, 2014

Okay, finally got around to play with it. I tested it with http://regex101.com/#javascript.
If I understand correctly, based on @mnot's post, the regex will match everything up until the *(";" parameter) part.
It should also have ^ at the beginning of the regex, but that's easy to fix.

I'm more curious regarding the parameters part which we didn't quite finalize in the spec. I'm looking for some input here.
Are these two content types considered the same:
text/plain
text/plain; charset=utf-8

If I understand correctly, the parameters can be thought of as the same as being request parameter only located in the content type. Does this possibly mean we should introduce a new parameter type instead, allowing setting it with a default value and saying if it's required or not? That way, text/plain is the only valid value, but it has additional possible parameters. This could work well for requests, but I'm not sure how it would work for responses.

I'd love the input of @claylo, @maxlinc, @fehguy and anyone else who may be following this issue.

@maxlinc
Copy link
Contributor

maxlinc commented Oct 10, 2014

I think all valid MIME Types, including those with parameters, should be supported. I'd rather be too permissive than too restrictive. Just like this page explains why a perfect regex may be a bad idea and that "regexes don't send email", its also true that regexes don't parse mime-types. So even a perfect regex (while complicated and slow) will leave gaps - it won't noticed when you write misspell application/json as application/sjon, for example. Actually tools can parse the type and then compare it to a list of known mime-types and spit out a warning if you're using an unknown mime-type (especially one that falls within the standards tree).

I'm :+1 for now for whatever regex is less restrictive (though I lost track of whether that's the original or the PR), but I'd like to consider moving away from regexes and suggest that libraries do better semantic validation.

Longer version

I think the spec should be minimally restrictive. If there's a good reason to disallow content-type parameters than we should. Otherwise, I think we should allow the complete range of valid mime types - including ones with parameters.

I'd still like to see a "Swaggercritic" project one day soon, and still like the paraphrased goal similar to Foodcritic:

Swagger has avoided being overly prescriptive. Having a set of rules to base discussion on helps drive out what we as a community think is good style.

So I think that from a spec perspective, any legal mime-type should be allowed and text/plain and text/plain; charset=utf-8 should be considered distinct values for the purpose of the spec. Individual libraries might not need parameters and might consider them equivalent for their purposes, but they should still be able to see that the API designer explicitly specified both. In fact, I've already opened an issue on the Ruby mime-type because I want to be able to more easily compare mime-types and see if they have the same base-type and/or container-format.

Whether the use of parameters is a best practice or not - I'm not sure yet. But I think that's separate from the spec and why I still support the idea of Swaggercritic and/or "warning" level semantic validation from swagger-tools.

I'm also considered about overusing regexes, especially when they get long. Regexes can be difficult to test and maintain, but I'm also concerned about performance. I'd consider the use of a custom "format" field instead, i.e.:

"mimeType": {
  "type": "string",
  "format": "rfc_2045",
  "description": "The MIME type of the HTTP message."
}

This may work well as long as:

  • The main Swagger libraries implement support for the custom formats (via hooks in their JSON-Schema validator)
  • The code to parse/validate the string is often simpler than the Regex (and/or a standard library exists for the format)

Third-party libraries might not be as restrictive as the core tools if they don't implement support for the custom formats, allowing any string. So a good compromise may be to use:

  • A simple regex that catches most mistakes
  • A custom format, so implementors may choose to implement their own perfect, faster validation that may or may-not be based on a regex.

wting pushed a commit to wting/swagger-spec that referenced this pull request Oct 21, 2014
@webron
Copy link
Member

webron commented Oct 27, 2014

I admit that I'm inclined to simplify the regex to the point of removing it entirely. I'm less concerned with the validity of the values and more concerned with the validity of the structure of the spec file itself.

We can suggest a regex that libraries may use if they want to issue additional warnings to the end user.

If spec consumers break because of wrong values, that's an issue for the producer (though consumers may want to incorporate error handling for it).

@claylo
Copy link
Author

claylo commented Oct 27, 2014

@webron +1 on no regex. Mime types are an area where there is likely to be the most variability over the coming years, so having no regex at all will avoid future tickets like this one. 😏

@maxlinc
Copy link
Contributor

maxlinc commented Oct 27, 2014

So we're all agreed on dropping the "pattern" with the regex. Should a "format" be added in its place?

As far as I know all json-schema validators will ignore unknown formats (understanding json-schema says so as well), so it should effectively act as little more than documentation - and a "validation extension point" for validators that support custom format handlers. And then it's easy to add an appendix to the spec for "custom formats used in the spec", with links to related specifications and/or suggested regexes that can be used if a more efficient validator isn't available, as @webron suggested.

@webron
Copy link
Member

webron commented Oct 27, 2014

I think it's an interesting idea, but one we should visit in a future version of the spec. Right now, we're making the last steps into sealing it, and I believe this change may require changes to existing specs.

Keep in mind that mime-types are used in several places across the spec - not only as values for produces/consumes fields but also as keys for responses. It's also somewhat related to #146 and we may need to come up with a solution that satisfies the more common needs (though not necessarily all).

@webron
Copy link
Member

webron commented Jul 30, 2015

mimetype regex has been completely removed. Thanks for taking the time and providing the input everyone.

@webron webron closed this Jul 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants