Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested tuples reduce diversity of samples #507

Closed
amieres opened this issue Sep 15, 2019 · 8 comments
Closed

Nested tuples reduce diversity of samples #507

amieres opened this issue Sep 15, 2019 · 8 comments

Comments

@amieres
Copy link

amieres commented Sep 15, 2019

When generating samples with many unary nested types (like unit) the samples become highly repetitive:

(fun () () () () () () v s () () () () () () -> printfn "%d %A" v (s:string) ; true)
|> Check.Quick

Produces great samples:

-2 ""
1 "w"
1 "�b"
3 "`>&1<"
6 "q{J2�"
6 "[`�[*�j"
-4 "�E-"
2 "i.zp�"
...

But when tupled a little:

(fun ((((), ()), ()), (), (), (), v, s, (), (), (), (), (), ()) -> printfn "%d %A" v (s:string) ; true)
|> Check.Quick

produces a degraded sample:

0 ""
0 ""
0 ""
...
0 "8"   <---- sample 24
1 "u"
0 "+"
-1 ""
1 ""
1 "?"
-1 "j"
...

Fully nested tuples:

(fun ((((((((((((((), ()), ()), ()), ()), ()), v), s), ()), ()), ()), ()), ()), ()) -> printfn "%d %A" v (s:string) ; true)
|> Check.Quick

And all 100 samples are like these:

0 ""
0 <null>
0 ""
...
@Kharacternyk
Copy link
Contributor

I'm not familiar with the internals of the default tuple generator, but I think that it divides a size between elements of a tuple. For example, if an initial size is 100 then elements of a 2-tuple will be (roughly) of size 50, and if these elements are also 2-tuples, then their elements will be of size 25 and so on. Thus until v and s are actually generated, the size has become much smaller than the initial one (in your third example it looks like the size is zero).

@amieres
Copy link
Author

amieres commented Sep 15, 2019

It makes sense. I guess it would need to have a check for types with just one value so it doesn't halve the size.

@kurtschelfthout
Copy link
Member

What @Kharacternyk says is exactly correct.

This is a somewhat pessimistic strategy to keep the total size of the outer tuple under the max desired size. You are right however that the actual used size in the tuple elements is not fed back to the outer tuple generator, because most types' size is effectively large enough for it not to be a concern. I.e. to generate an int * string of size 50, generate an int of size 25 and a string of size 25 and you'll regularly get interesting examples, some of which will actually have size 50. But if you generate unit * bool of size 50, you're only going to get (),true or (),false and so in that sense dividing the size equally among tuple elements can result in some "wasted" size, as you've observed.

@kurtschelfthout
Copy link
Member

To follow up on that, to address this the tuple generator (or really any composite generator) would have to somehow be able to calculate the max usable size of its elements. And then use that to divide size. There are ways to do that - by writing combinators for "enumerable structures" with primitives much like the ones FsCheck generators have, but that allow to calculate properties of the resulting structures, such as their cardinality, there are techniques to efficiently generates an element of a given size with a guaranteed uniform distribution and so on.

If I'd ever have time to write a new generation of FsCheck, that's where I'd start.

A decent entry point is https://lara.epfl.ch/~kuncak/papers/KurajETAL15ProgrammingEnumerableSetsStructures.pdf and anything about combinatorics, esp. Species.

@kurtschelfthout
Copy link
Member

Closing as dup of #507

@amieres
Copy link
Author

amieres commented Dec 28, 2020

It's a dup of itself?
Quick question:
why does the default generator for a function with multiple parameters perform better than for tuples?
could that generator be used for tuples somehow?

@kurtschelfthout
Copy link
Member

Dup of #444

@kurtschelfthout
Copy link
Member

Quick question:
why does the default generator for a function with multiple parameters perform better than for tuples?
could that generator be used for tuples somehow?

That's not a quick question :) or even one I know the answer to. Open an issue with a repro.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants