Improve #deterministically_verify helper #2828
…en :random is provided
…:Config.random state on each :depth iteration
…ns on a generated value against itself
…helper implementation
This is a great contribution. Thank you @erichmachado for the detailed description.
We started using this helper more a few weeks ago, which probably put it to a real test. I like how much better it reads now.
LGTM. I'd like to get a review from @thdaraujo as well.
- # @param depth [Integer] the depth of deterministic comparisons to run.
- # @param random [Integer] A random number seed; Used to override the default.
+ # @param depth [Integer] the depth of deterministic comparisons to run; the default value is 2.
+ # @param seed [Integer] A random number seed; Used to override the default value which is 42.
Oh, I like that name. It says more about what it is.
Thank you for your time and for your review @stefannibrasil. No problem, let's wait for @thdaraujo's review as well. 🙇‍♂️
Thank you @erichmachado for the great contribution and the in-depth description. I agree with most of your points, and I think this will greatly improve this helper method.
I also agree that we should prevent generators from changing the random instance, but I think we still want to catch those things. So I think instead of stubbing it, we should do something different. I left some questions and suggestions, let me know what you think!
Thanks!
- raise 'need block' unless block_given?
+ def deterministically_verify(subject_proc, depth: 2, seed: 42)
+   results = depth.times.map do
+     Faker::Config.stub :random, Random.new(seed) do
My understanding is that you want to stub random here to guarantee that every iteration will use the same random instance.
I think the desired outcome is that generators should never override/change the random generator during their execution, and we want to have an easy way to assert that with deterministically_verify.
In my view, if we stub random here, we might not be able to catch generators that try to set a new value for random again during their execution, so deterministically_verify would allow them to pass. Does that make sense?
Maybe a different approach could be to set the random instance like you're doing here, and then stub the setter Faker::Config.random= to raise an error if any generator tries to override/change the random generator during the iteration.
Or, maybe an even simpler approach would be to check that the seed is the same before and after the iteration to make sure the generator never changes it. What do you think?
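The before-and-after check suggested above could look roughly like the sketch below. DemoConfig is a hypothetical stand-in for Faker::Config so the snippet runs without the gem, and assert_random_untouched is an illustrative helper name, not part of Faker or Minitest. It compares the seed, as suggested, and also object identity, which is an added assumption that makes the check stricter.

```ruby
# Hypothetical stand-in for Faker::Config (assumption: a module-level
# reader/writer pair for the shared Random instance).
module DemoConfig
  class << self
    attr_writer :random

    def random
      @random ||= Random.new
    end
  end
end

# Illustrative helper (not part of Faker): fails if the block replaced
# the shared Random instance or left one with a different seed behind.
def assert_random_untouched(config)
  before = config.random
  result = yield
  after = config.random
  unless after.equal?(before) && after.seed == before.seed
    raise "random instance changed during the block"
  end
  result
end

DemoConfig.random = Random.new(42)

# A well-behaved generator only draws numbers from the shared instance.
assert_random_untouched(DemoConfig) { DemoConfig.random.rand(10) }

# A misbehaving generator swaps the instance and is caught.
begin
  assert_random_untouched(DemoConfig) { DemoConfig.random = Random.new(7) }
rescue RuntimeError => e
  puts "caught: #{e.message}"
end
```

Drawing numbers advances the generator's internal state but does not change its seed or identity, so normal use passes the check.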
Oh, that's a great point! +1 to check the seed is the same before and after the iteration.
About checking this on multiple threads: @erichmachado are you up for working on that as well? There are these tests that might be helpful: https://github.com/faker-ruby/faker/blob/main/test/test_default_locale.rb and https://github.com/faker-ruby/faker/blob/main/test/faker/default/test_faker_unique_generator.rb Otherwise, we can create a new issue for that. Thanks!
Hi @thdaraujo, thank you for your time and your review. Much appreciated!
I'll try to explain the rationale behind it below. Please, let me know if you still have questions.
> My understanding is that you want to stub random here to guarantee that every iteration will use the same random instance.
Guaranteeing that every internal depth iteration has the same random sequence ("instance") is actually achieved by ensuring that each one of those iterations starts with a fresh Random instance initialized with the same seed value. This is sufficient to ensure that the RNG sequence will be replayed correctly on each cycle.
The motivation to iterate on depth on the initial implementation apparently was to ensure that the code under test was using a deterministic RNG algorithm, since iterating multiple times with the same seed should generate the same output in this case - and I believe this is sound.
The problem was that before we were using a memoized random instance whenever the random: parameter (which was renamed as seed: now) was set; and this was preventing the RNG from generating the same output across depth iterations, because it had already started the sequence in the previous cycles:
Faker::Config.random = random || Random.new(42)
Which we could have fixed with an update like this:
Faker::Config.random = Random.new(seed) || Random.new(42)
And if we assign 42 as a default value for seed on the method signature it could then be abbreviated to:
Faker::Config.random = Random.new(seed)
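The difference between a memoized instance and a fresh instance per iteration can be seen with plain Random objects. This is a minimal sketch using only the standard library; the two method names are hypothetical and only label the old and new behaviour described above.

```ruby
SEED = 42
DEPTH = 3

# Old behaviour (sketch): one Random instance is shared by every
# iteration, so each call continues the sequence where the last one stopped.
def shared_instance_results(seed, depth)
  rng = Random.new(seed)
  Array.new(depth) { rng.rand }
end

# New behaviour (sketch): every iteration starts from a fresh Random
# built from the same seed, so the sequence is replayed from the start.
def fresh_instance_results(seed, depth)
  Array.new(depth) { Random.new(seed).rand }
end

puts "shared: #{shared_instance_results(SEED, DEPTH).uniq.size} distinct value(s)"
puts "fresh:  #{fresh_instance_results(SEED, DEPTH).uniq.size} distinct value(s)"
```

Only the fresh-instance variant yields identical values on every iteration, which is what an equality assertion across depth iterations needs.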
But here lies another issue: without stubbing, the Faker::Config.random state will be touched every time a depth iteration runs, which I believe to be an undesired behaviour since it is prone to leak that across specs (and that's actually what I was able to confirm via debugging).
Stubbing guarantees that every internal depth iteration will have access to its own Random instance without any side-effect to the global Faker::Config.random state, which addresses that issue.
For example, it's possible to verify that the Faker::Config.random state is leaking across specs with a simple check, as described on item 3 above (replicating it here for reference):
- It does not protect the existing Faker::Config.random state from being mutated across #deterministically_verify runs.
- It's also easy to verify this: just place a debugger call before and after each #deterministically_verify block and inspect the value of Faker::Config.random around it.
Ideally, just set Faker::Config.random to a known value before a #deterministically_verify block call on an existing spec and check which value it has after it.
I assume we don't want it to mutate just because the helper was called, even though we expect the code exercised inside the block to have access to the alternate RNG while running.
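The check described above can be sketched without the gem. DemoConfig is a hypothetical stand-in for Faker::Config, and both helpers are illustrative names. The scoped variant uses a plain save-and-restore rather than Minitest's Object#stub, so it only approximates the effect of the Faker::Config.stub :random call in the diff.

```ruby
# Hypothetical stand-in for Faker::Config.
module DemoConfig
  class << self
    attr_writer :random

    def random
      @random ||= Random.new
    end
  end
end

# Sketch of the old behaviour: the helper assigns to the global state
# and never puts the previous instance back.
def leaky_verify(config, seed: 42)
  config.random = Random.new(seed)
  yield
end

# Save-and-restore stand-in for the scoped stub: the block sees the
# seeded instance, and the previous one is back in place afterwards.
def scoped_verify(config, seed: 42)
  previous = config.random
  config.random = Random.new(seed)
  yield
ensure
  config.random = previous
end

known = Random.new(1)

DemoConfig.random = known
scoped_verify(DemoConfig) { DemoConfig.random.rand }
puts "scoped helper kept the instance: #{DemoConfig.random.equal?(known)}"

DemoConfig.random = known
leaky_verify(DemoConfig) { DemoConfig.random.rand }
puts "leaky helper kept the instance:  #{DemoConfig.random.equal?(known)}"
```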
> In my view, if we stub random here, I think we might not be able to catch generators that try to set a new value for random again during their execution, so deterministically_verify would allow them to pass. Does that make sense?
Based on the analysis above, I believe this change adds protection in that direction, though for the specs themselves rather than for the generator (library) code.
If we want to ensure that library code is not mutating global state, then I recommend relying on a partial double assertion on Faker::Config.random= or an around(:each) catch-all spec instead, in order to guard against updates to Faker::Config.random - like you've mentioned above. 👍 (and I'm happy to help with another contribution there if you like)
In summary, both approaches would complement each other, from my understanding. One protects Faker::Config.random from unintended mutation by the library code itself, while the other protects against unintended mutations during spec runs.
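A guard on the setter, as discussed in this thread, might be sketched as follows. DemoConfig again stands in for Faker::Config, and forbid_random_reassignment is a hypothetical helper that temporarily swaps the writer by hand. A real spec would more likely use the test framework's own stubbing or partial-double support.

```ruby
# Hypothetical stand-in for Faker::Config.
module DemoConfig
  class << self
    attr_writer :random

    def random
      @random ||= Random.new
    end
  end
end

# Illustrative guard: while the block runs, any call to random= raises.
# The original writer is put back afterwards, even if the block fails.
def forbid_random_reassignment(config)
  singleton = config.singleton_class
  singleton.send(:alias_method, :__original_random_writer, :random=)
  singleton.send(:define_method, :random=) do |_value|
    raise "generator attempted to replace the random instance"
  end
  yield
ensure
  singleton.send(:alias_method, :random=, :__original_random_writer)
  singleton.send(:remove_method, :__original_random_writer)
end

DemoConfig.random = Random.new(42)

begin
  forbid_random_reassignment(DemoConfig) { DemoConfig.random = Random.new(7) }
rescue RuntimeError => e
  puts "caught: #{e.message}"
end

# Outside the guarded block the writer works as before.
DemoConfig.random = Random.new(7)
```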
oh yeah, now I get it. Thanks for the in-depth explanation!
I hadn't considered this part when I was reading your proposal:
> But here lies another issue: without stubbing, the Faker::Config.random state will be touched every time a depth iteration runs, which I believe to be an undesired behaviour since it is prone to leak that across specs (and that's actually what I was able to confirm via debugging).
> Stubbing guarantees that every internal depth iteration will have access to its own Random instance without any side-effect to the global Faker::Config.random state, which addresses that issue.
And if you're interested in working on the guard against generators changing Faker::Config.random, that would be super cool. Thanks!
(By the way, this also reminds me that we should probably create a similar verifier for generators running on different threads, which could verify they all return the same values for the same given random seed. But that's out of scope for this discussion. I think this would have helped when we were working on #1700.) cc: @stefannibrasil
this is great, thank you!
- results << subject_proc.call.freeze.tap(&block)
- end.repeated_combination(2) { |(first, second)| assert_equal first, second }
- # rubocop:enable Style/MultilineBlockChain
+ results.combination(2) { |(first, second)| assert_equal first, second }
nice catch! 😎
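For reference, the difference between the two Array methods in the diff above can be seen on a small array. This is a standalone illustration with placeholder values, not code from the PR.

```ruby
results = %w[a b c]

# repeated_combination(2) also pairs every element with itself, so an
# equality assertion over these pairs includes trivially true comparisons.
repeated = results.repeated_combination(2).to_a
puts repeated.inspect
# [["a", "a"], ["a", "b"], ["a", "c"], ["b", "b"], ["b", "c"], ["c", "c"]]

# combination(2) only pairs distinct positions, which is all that is
# needed to check that separate iterations produced equal values.
distinct = results.combination(2).to_a
puts distinct.inspect
# [["a", "b"], ["a", "c"], ["b", "c"]]
```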
Motivation / Background
This Pull Request has been created because I was trying to fix a test using the #deterministically_verify helper with the random parameter set and saw that my test was failing with the default depth value (2).
I then realized a few issues with the current implementation:
- … Integer value for random while it expects a Random instance internally.
- … Random#rand iteration across depth iterations if a custom random value is set.
- … depth > 1 fail the internal assertion due to different values being generated across depth iterations.
- … random to a custom value on any existing deterministically_verify block on the codebase.
- It does not protect the existing Faker::Config.random state from being mutated across #deterministically_verify runs.
- It's also easy to verify this: just place a debugger call before and after each #deterministically_verify block and inspect the value of Faker::Config.random around it.
I've addressed those issues above by:
- … Integer through a seed parameter instead of a Random instance through random.
- … seed is set then it will always use a new Random instance before calling subject_proc.
- … random#rand iteration for each internal depth iteration.
- … random (before) / seed (now) is not set, thus making those iterations generate the same values (as expected).
- … MiniTest's Object#stub to prevent mutating the Faker::Config.random state inside depth iterations.
I've also included a few other minor improvements:
- … depth iteration anymore (this was a waste of cycles).
- … inject with map since there was (apparently) no benefit in using the former after the changes above.
- … yield instead of manually calling &block. The reason is that we get an automatic LocalJumpError if no block is provided, so it was possible to remove the manual raise from the implementation.
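The LocalJumpError behaviour mentioned in the last item is standard Ruby and can be checked in isolation. The two methods below are hypothetical and exist only to contrast a manual block check with a bare yield.

```ruby
# Manual style: the method checks for a block itself before calling it.
def with_manual_check(&block)
  raise 'need block' unless block_given?

  block.call(:value)
end

# yield style: Ruby raises LocalJumpError on its own when no block is given.
def with_yield
  yield :value
end

begin
  with_manual_check
rescue RuntimeError => e
  puts "manual check: #{e.class}: #{e.message}"
end

begin
  with_yield
rescue LocalJumpError => e
  puts "bare yield:   #{e.class}: #{e.message}"
end
```

Both styles fail fast when the block is missing; the second simply needs no explicit guard.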