
Improve #deterministically_verify helper #2828

19 changes: 9 additions & 10 deletions test/test_helper.rb
Expand Up @@ -23,21 +23,20 @@
# times with the same deterministic_random seed.
# @param subject_proc [Proc] a proc object that returns the subject under test
# when called.
# @param depth [Integer] the depth of deterministic comparisons to run.
# @param random [Integer] A random number seed; Used to override the default.
# @param depth [Integer] the depth of deterministic comparisons to run; the default value is 2.
# @param seed [Integer] A random number seed; Used to override the default value which is 42.
Contributor
Oh, I like that name. It says more about what it is.

#
# @example
# deterministically_verify ->{ @tester.username('bo peep') } do |subject|
# assert subject.match(/(bo(_|\.)peep|peep(_|\.)bo)/)
# end
#
def deterministically_verify(subject_proc, depth: 2, random: nil, &block)
raise 'need block' unless block_given?
def deterministically_verify(subject_proc, depth: 2, seed: 42)
results = depth.times.map do
Faker::Config.stub :random, Random.new(seed) do
Contributor @thdaraujo, Oct 3, 2023
My understanding is that you want to stub random here to guarantee that every iteration will use the same random instance.

I think the desired outcome is that generators should never override/change the random generator during their execution, and we want to have an easy way to assert that with deterministically_verify.

In my view, if we stub random here, I think we might not be able to catch generators that try to set a new value for random again during their execution, so deterministically_verify would allow them to pass. Does that make sense?

Maybe a different approach could be to set the random instance like you're doing here, and then stub the setter Faker::Config.random= to raise an error if any generator tries to override/change the random generator during the iteration.

Or, maybe an even simpler approach would be to check that the seed is the same before and after the iteration to make sure the generator never changes it. What do you think?

cc @stefannibrasil
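The simpler seed-check idea could look roughly like this. This is a minimal illustration, not the PR's code: `Config` is a stand-in for the real `Faker::Config`, and the before/after identity check is the hypothetical addition being discussed.

```ruby
# Stand-in for Faker::Config, just for illustration.
module Config
  class << self
    attr_accessor :random
  end
end

# Sketch of deterministically_verify with an "RNG unchanged" assertion:
# each depth iteration gets a fresh Random seeded identically, and we
# fail fast if the code under test swapped in a different RNG instance.
def deterministically_verify(subject_proc, depth: 2, seed: 42)
  depth.times.map do
    Config.random = Random.new(seed)
    before = Config.random
    result = subject_proc.call.freeze
    raise 'generator reassigned Config.random' unless Config.random.equal?(before)
    result
  end
end
```

With a well-behaved deterministic generator every element of the returned array is equal; a generator that reassigns the RNG mid-run raises instead of silently passing.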

Contributor

Oh, that's a great point! +1 to check the seed is the same before and after the iteration.

About checking this on multiple threads: @erichmachado, are you up for working on that as well? There are these tests that might be helpful: https://github.com/faker-ruby/faker/blob/main/test/test_default_locale.rb and https://github.com/faker-ruby/faker/blob/main/test/faker/default/test_faker_unique_generator.rb. Otherwise, we can create a new issue for that. Thanks!

Contributor (Author) @erichmachado, Oct 4, 2023
Hi @thdaraujo, thank you for your time and your review. Much appreciated!

I'll try to explain the rationale behind it below. Please, let me know if you still have questions.

My understanding is that you want to stub random here to guarantee that every iteration will use the same random instance.

Guaranteeing that every internal depth iteration has the same random sequence ("instance") is actually achieved by ensuring that each one of those iterations starts with a fresh Random instance initialized with the same seed value. This is sufficient to ensure that the RNG sequence will be replayed correctly on each cycle.

The motivation to iterate on depth on the initial implementation apparently was to ensure that the code under test was using a deterministic RNG algorithm, since iterating multiple times with the same seed should generate the same output in this case - and I believe this is sound.

The problem was that before we were using a memoized random instance whenever the random: parameter (which was renamed as seed: now) was set; and this was preventing the RNG from generating the same output across depth iterations, because it had already started the sequence in the previous cycles:

Faker::Config.random = random || Random.new(42)

Which we could have fixed with an update like this:

Faker::Config.random = Random.new(seed) || Random.new(42)

And if we assign 42 as a default value for seed on the method signature it could then be abbreviated to:

Faker::Config.random = Random.new(seed)

But here lies another issue: without stubbing, the Faker::Config.random state will be touched every time a depth iteration runs, which I believe to be an undesired behaviour since it is prone to leak that across specs (and that's actually what I was able to confirm via debugging).

Stubbing guarantees that every internal depth iteration will have access to its own Random instance without any side-effect to the global Faker::Config.random state, which addresses that issue.

For example, it's possible to verify that the Faker::Config.random state is leaking across specs with a simple check, as described on item 3 above (replicating it here for reference):

3. It does not protect the existing Faker::Config.random state from being mutated across #deterministic_verify runs.
   - It's also easy to verify this: just place a debugger call before and after each #deterministic_verify block and inspect the value of Faker::Config.random around it.

Ideally, just set the Faker::Config.random to a known value before a #deterministic_verify block call on an existing spec and check which value it will have after it.

I assume we don't want it to mutate just because the helper was called, even though we expect the code exercised inside the block to have access to the alternate RNG while running.

In my view, if we stub random here, I think we might not be able to catch generators that try to set a new value for random again during their execution, so deterministically_verify would allow them to pass. Does that make sense?

Based on the analysis above, I believe this change will add to the protection in this direction, but not for the generators (library) code but rather for the specs themselves.

If we want to ensure that library code is not mutating global state, then I recommend relying on a partial double assertion on Faker::Config.random= or an around(:each) catch-all spec instead, in order to guard against updates to Faker::Config.random - like you've mentioned above. 👍 (and I'm happy to help with another contribution there if you like)

In summary, both approaches would complement each other, from my understanding. One is protecting the Faker::Config.random from unintended mutation on the library code itself while the other is protecting against unintended mutations during spec runs.
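The setter-guard idea mentioned above could be sketched like this. Everything here is hypothetical: `Config` stands in for the real `Faker::Config`, and `with_locked_rng` is an invented helper name, not part of Faker.

```ruby
# Stand-in for Faker::Config, just for illustration.
module Config
  class << self
    attr_accessor :random
  end
end

# Hypothetical guard: temporarily replace the random= setter so that any
# generator which tries to reassign the RNG inside the block raises,
# then restore the original setter afterwards.
def with_locked_rng(seed)
  Config.random = Random.new(seed)
  original = Config.method(:random=)
  Config.define_singleton_method(:random=) do |_value|
    raise 'Config.random must not be reassigned during this block'
  end
  yield
ensure
  Config.define_singleton_method(:random=, original)
end
```

Restoring via the captured `Method` object keeps the guard side-effect-free, which matches the no-global-mutation goal discussed above.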

Contributor @thdaraujo, Oct 7, 2023
Oh yeah, now I get it. Thanks for the in-depth explanation!

I haven't considered this part when I was reading your proposal:

But here lies another issue: without stubbing, the Faker::Config.random state will be touched every time a depth iteration runs, which I believe to be an undesired behaviour since it is prone to leak that across specs (and that's actually what I was able to confirm via debugging).

Stubbing guarantees that every internal depth iteration will have access to its own Random instance without any side-effect to the global Faker::Config.random state, which addresses that issue.

And if you're interested in working on the guard against generators changing the Faker::Config.random, that would be super cool. Thanks!

yield subject_proc.call.freeze
end
end

# rubocop:disable Style/MultilineBlockChain
depth.times.inject([]) do |results, _index|
Faker::Config.random = random || Random.new(42)
results << subject_proc.call.freeze.tap(&block)
end.repeated_combination(2) { |(first, second)| assert_equal first, second }
# rubocop:enable Style/MultilineBlockChain
results.combination(2) { |(first, second)| assert_equal first, second }
Contributor
nice catch! 😎

end
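For reference, the stubbing behaviour the merged helper relies on can be demonstrated in isolation. A minimal sketch assuming minitest/mock's `Object#stub` (which Faker's test suite uses) and a stand-in `Config` module in place of `Faker::Config`:

```ruby
require 'minitest/mock'

# Stand-in for Faker::Config, just for illustration.
module Config
  class << self
    attr_accessor :random
  end
end

Config.random = Random.new(7)
original = Config.random

# Minitest's stub swaps :random only for the duration of each block, so
# every depth iteration sees a fresh Random.new(42) and the global state
# is untouched afterwards -- the cross-spec leak this PR avoids.
results = 2.times.map do
  Config.stub :random, Random.new(42) do
    Config.random.rand(1_000)
  end
end
```

Afterwards `results` holds identical values (same seed, fresh RNG per iteration) and `Config.random` is the very instance it was before the stubbed blocks ran.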