Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generators for URLs and domain names #18

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

wfscheper
Copy link

@wfscheper wfscheper commented Nov 1, 2020

This adds an RFC 1035 compliant domain name generator and an RFC 3986 compliant URL generator.

Closes #17

Signed-off-by: Walter Scheper [email protected]

@flyingmutant
Copy link
Owner

Thanks a lot for the PR! I'll try to review in the next couple of days. Can you please add your copyright on top of the new files?

@wfscheper
Copy link
Author

wfscheper commented Nov 3, 2020

Will do. I also realized I should add Examples, so I'll push them as well.

Copy link
Owner

@flyingmutant flyingmutant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay! Finally managed to add some comments.

The thing I am concerned with the most is that only a tiny subset of possible URLs is generated. In general, I want rapid generators to produce every possible kind of valid data, to make sure that it is easy to catch overlooked corner cases in the things tested with rapid.

tld.go Outdated
ZW
`

var tlds = strings.Split(tldsByAlpha, "\n")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First and last elements of tlds are blank strings, is this intentional?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, no I need to runtldsByAlpha through TrimSpace.

url.go Outdated
assertf(4 <= maxLength, "maximum length (%v) should not be less than 4, to generate a two character domain and a one character subdomain", maxLength)
assertf(maxLength <= 255, "maximum length (%v) should not be greater than 255 to comply with RFC 1035", maxLength)
assertf(1 <= maxElementLength, "maximum element length (%v) should not be less than 1 to comply with RFC 1035", maxElementLength)
assertf(maxElementLength <= 63, "maximum element length (%v) should not be greater than 63 to comply with RFC 1035", maxElementLength)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer removing DomainOf entirely for now, and using named constants for 255 and 63.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

url.go Outdated
}

func (g *domainNameGen) String() string {
return fmt.Sprintf("Domain(maxLength=%v, mmaxElementLength%v)", g.maxLength, g.maxElementLength)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to make String() close to the code using to construct the generator, so after removing DomainOf it should just return Domain().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

url.go Outdated

var b strings.Builder
elements := newRepeat(1, 126, 1)
b.Grow(elements.avg())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b is unused below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was me cargo-culting from another implementation that used a repeat. I'll clean it up.

domain = subDomain + "." + domain
}

return domain
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 1035 specifies that <domain> ::= <subdomain> | " ", should we support domain being space or not?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's in the spec, then we probably should, but I don't really grasp what " " as a domain even means. Guess I'll go back and re-read the spec.

)

func TestURL(t *testing.T) {
pathEscapeRegex := regexp.MustCompile(`^[0-9A-Fa-f]{2}`)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to compile at the top-level; also t.Parallel() is missing for this test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

url.go Outdated

func (g *urlGenerator) value(t *T) value {
scheme := SampledFrom(g.schemes).Draw(t, "scheme").(string)
domain := Domain().Draw(t, "domain").(string)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we generate URLs with host component containing an IP (v4 or v6) address as well? That does not sound easy (especially given the IPv6 syntax variants), but otherwise we are omitting quite an important URL subset.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. If I add those options, should I do so via public IPv4 and IPv6 Generators?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sounds good. Maybe a single generator of net.IP with an option like v6 bool?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the IPv4 and IPv6 generators is pretty straightforward, but I'm not sure how to incorporate them into the URL generator.

I can think of three ways to do this:

  • One generator, URL(), that generates both DNS URLs and IP URLs
  • Two generators, URL() that generates DNS URLs and URLOf(schemes []string, domain *Generator).
  • Three generators, URL(), URLIPv4(), and URLIPv6()`

I think you're looking for the first option, but wanted to float the other two and see what you think.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think going with just URL() for now is the safest option (because it can be extended in the future based on real feedback) -- in other cases we are guessing/speculating a bit about the use cases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! I hope I'll be able to review/merge all this before the end of the week.

url.go Outdated
func (g *urlGenerator) value(t *T) value {
scheme := SampledFrom(g.schemes).Draw(t, "scheme").(string)
domain := Domain().Draw(t, "domain").(string)
port := IntRange(0, 2^16-1).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC does not limit the port range.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intuition is that the RFC is refraining from specifying something that's out of the scope of the standard, rather than positively asserting that there is no limit on the range of valid ports.

Your call, but I think limiting the generation to the standard 16-bit valid port range is reasonable.

StringOf(RuneFrom(nil, unicode.PrintRanges...)).Map(url.PathEscape),
).Draw(t, "path").([]string)...)

return url.URL{
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not generating query or fragment parts, seems like quite a big omission.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but wasn't sure what to do.

With the very simple API currently exposed, there's no configurability of whether or not to generate query parameters or fragments. I suppose the consumer could use a Filter or Map to strip the parts they don't want. What are your thoughts?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, too -- but client-side stripping (if required) sounds like an OK option.

// Z.CU
// r.ABBotT
// r.AcCoUNTaNT
// R.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R. is probably not what we wanted to generate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's probably coming from the two empty string entries that are incorrectly included.

@flyingmutant
Copy link
Owner

By the way, any idea how to make CI work for this PR? It seems stuck.

@wfscheper
Copy link
Author

wfscheper commented Nov 11, 2020

I don't think anything is being scheduled for the PR, but I'm not sure why.

Edit: just looked at the workflow file, and there's no trigger for pull_requests. I can add that if you want. Here's one of my projects for comparision: https://github.com/wfscheper/stentor/blob/main/.github/workflows/build.yml

@flyingmutant
Copy link
Owner

Shouldn't "on (any) push" be enough?

Also, can you please try rebasing on top of current master? Maybe it can help CI to unstuck itself as well.

@wfscheper
Copy link
Author

wfscheper commented Nov 12, 2020 via email

@flyingmutant
Copy link
Owner

I've pushed the on pull_request change to the CI config, and created #20 to test it. Looks like CI change is working -- but the CI itself is failing on Go 1.13.

Can you please rebase this PR on top of current master and look into the CI failure? I think once the CI is happy I'll merge the PR, and fix the nitpicks (if I'll find something else to nitpick, that is) afterwards.

@wfscheper
Copy link
Author

I'm at a loss on how to address the 1.13 issue. I think I may need to install go 1.13 and try stepping through the example with delve. I should have some time tonight.

@wfscheper
Copy link
Author

Hey, apologies for leaving this hanging for so long. I see that you added some build tags to try and get around the example tests not lining up correctly. I also added a bash script to update the tld constant, which was out of date. Something to wire into a github action, perhaps?

@flyingmutant
Copy link
Owner

No problem, rapid is not changing too rapidly :-) I use build tags mainly to work around Unicode database updating from release to release.

Something to wire into a github action, perhaps?

I think just doing manual update every couple of releases might be good enough for now. I don't want to complicate the automation too much.

wfscheper and others added 10 commits September 24, 2022 21:29
This adds a Generator for RFC 3986 compliant URLs, and two generators
for RFC 1035 compliant domain names.

Closes flyingmutant#17

Signed-off-by: Walter Scheper <[email protected]>
Signed-off-by: Walter Scheper <[email protected]>
Need to strip the leading and trailing newlines so we don't get empty
strings in our top-level domain list.
This removes the configurable domain generation in favor of one that
generates domains within the full rnage allowed by the specification.
To make the generated URL conform to how `url.Parse` works, Path
and Fragment are set to the unescaped form. RawPath and RawFragment
are left unset, and we simply rely on the URL struct to handle the
escaping as needed.
These generators return `net.IP`, and are also used in generating URLs.
This adds a simple shell script to update the tld.go constant with the
latest contents of https://data.iana.org/TLD/tlds-alpha-by-domain.txt
@wfscheper
Copy link
Author

Hello! GitHub randomly reminded me about this abandoned PR of mine. Thought I'd take a bit and update it, and was pleasantly surprised by the new API using generics. Awesome stuff!

Only hiccup I had was the loss of the old Generator.Map() functionality. Is there an equivalent that I missed?

@flyingmutant
Copy link
Owner

Thought I'd take a bit and update it, and was pleasantly surprised by the new API using generics. Awesome stuff!

Yep, I was pleasantly surprised how little changes (conceptual, not LoC) were required to make rapid fully type-safe.

Only hiccup I had was the loss of the old Generator.Map() functionality. Is there an equivalent that I missed?

Methods can't have their own generic parameters, because of that Generator.Map() became top-level Transform.

ip_addr.go Outdated
return fmt.Sprintf("IP(ipv6=%v)", g.ipv6)
}

func (g *ipGen) value(t *T) net.IP {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is good idea to use netip? With generics we already require Go 1.18.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done.


// String generation depends on the Unicode tables, which change with Go versions:
//go:build go1.16
// +build go1.16
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need, now we require Go 1.18 anyway.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

This also removes the unnecessary build tag in the example tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add URL and DomainName generators
2 participants