-
-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
An api for stateful properties and model checking #94
Comments
Thanks for the write up @keathley! I am not that familiar with stateful checking so please take my feedback with a grain of salt. The examples above seem to have a lot of repetition and I don't like that. The ScalaCheck approach probably helps remove this repetition but imposing a module per command would not be very idiomatic. As I hinted on IRC, I would prefer each approach to rely on data structures and function composition. Here is how I would rewrite the model bit with this in mind: def command(%{conn: :empty}) do
# A simple command with the atom ok as constant generator.
:ok
|> command(fn :ok -> start_redis() end)
|> next_state()
end
def command(%{conn: conn, contents: contents}) do
keys = Map.keys(contents)
# preconditions, next state and post conditions are attached to
# commands only if necessary.
get =
one_of(keys)
|> command(fn key -> Redix.command(conn, ["GET", key]) end)
|> precondition(&continue_if_not_empty_conn/2)
|> post_condition(fn state, key, result -> Map.fetch(state.contents, key) == result end)
set =
{binary(), binary()}
|> command(fn {key, value} -> Redix.command(conn, ["SET", key, value]) end)
|> next_state(&store_key/3)
|> post_condition(fn _state, _key, result -> result == {:ok, "OK"} end)
set_existing =
{one_of(keys), binary()}
|> command(fn {key, value} -> Redix.command(conn, ["SET", key, value]) end)
|> next_state(&store_key/3)
|> post_condition(fn _state, _key, result -> result == {:ok, "OK"} end)
one_of([get, set, set_existing])
end
defp continue_if_not_empty_conn(%{conn: conn}, _key), do: conn != :empty
defp store_key(state, {key, value}, _result), do: put_in(state.contents[key], value) This is not a complete rewrite but it should get the discussion moving. Because it is just function and data, we can pass around commands and build them in many different ways. Running the commands is a matter of passing the initial state and a the reference to function that generates the commands, in this case, Some remaining topics are discussed below. preconditions vs stateOne of the questions I have is why do we pre_conditions if the commands we generate are already tailored to the given state? For example, would we ever reach the precondition below, given we don't generate Redix commands when the conn is empty? def precondition(%{conn: :empty}, {:call, Redix, _, _}), do: false What about symbolic variables?Given we put the execution of the code inside closures, there is a chance we no longer need symbolic variables (again, grain of salt here, I likely don't know what I am talking about). But in a nutshell, if a command needs to store or read something previously stored, we could simply pass the state to the comamnd function: one_of(keys)
|> command(fn state, key ->
result = Redix.command(conn, ["GET", key]) end)
{result, put_in(state.temp1, result)}
end) I am not sure if storing symbolic variables in the state is a good idea because they may have different livecycles. If storing it in the state is a bad idea, we could introduce something such as a session, that would have something closer to the semantic variables lifecycle. Somebody with more understanding has to chime in here with the semantics and the operations that need access to semantic variables. DebuggingOne of the nice things about symbolic calls is that it is easy to generate a trace in case the test fails. For example: [{set,{var,1},{call,proc_reg_eqc,spawn,[]}},
{set,{var,2},{call,proc_reg,where,[c]}},
{set,{var,3},{call,proc_reg_eqc,spawn,[]}},
{set,{var,4},{call,proc_reg_eqc,kill,[{var,1}]}},
{set,{var,5},{call,proc_reg,where,[d]}},
{set,{var,6},{call,proc_reg_eqc,reg,[a,{var,1}]}},
{set,{var,7},{call,proc_reg_eqc,spawn,[]}}] In the proposal above, all of the operations are hidden behind anonymous functions, which means we don't get this understanding. We don't even have a name we can refer to the commands. One possible solution is to require each command to have a name. Something like: :get
|> command(one_of(keys), fn key -> Redix.command(conn, ["GET", key]) end)
|> ... Now, in case of failures, we can say the command name and generated args: [
{:get, "foo"},
{:set, {"bar", "baz}}
] Another idea is to make :get
|> command(one_of(keys)) do
key -> Redix.command(conn, ["GET", key]) end)
end
|> ... This would allow us to show the code that reproduces the failure - up to some extent, things like This is what I have so far. Feedback is welcome! |
Preconditions vs State.Pre-conditions are needed because during shrinking commands will be removed from the list of commands that caused a failure. Removing these commands means that we may create a command sequence that is "invalid". Pre-conditions ensure that we don't attempt to run "invalid" commands during the shrinking process. Symbolic callsI've been reading through the QuickCheck papers, "QuickCheck Testing for fun and profit" and "Testing Telecoms Software with Quviq QuickCheck" in order to better understand the benefits of using symbolic calls. Here's a quote from "QuickCheck Testing for fun and profit" that I think sums it up nicely:
I think it would be ideal to use symbolic calls under the hood without having to expose them to the end user. I like the idea of co-locating all of the conditions for a call around a data structure like in your example @josevalim. IMO having to pass a generator into the command is awkward but I guess there's no way around that since we're using streams under the hood. Ideally we could just do something like this: def get(key) do
:get
|> command()
|> run(fn state -> Redix.command(state.conn, ["GET", key]) end)
|> precondition(fn state -> state.conn != :empty end)
|> next_state(fn state -> state end)
|> postcondition(fn state, result -> result == Map.fetch(state.contents, key) end)
end
def set(key, value) do
:set
|> command()
|> run(fn state -> Redix.command(state.conn, ["SET", key, value]) end)
|> precondition(fn state -> state.conn != :empty end)
|> next_state(fn state -> put_in(state, [:contents, key], value) end)
|> postcondition(fn _state, result -> result == {:ok, "OK"} end)
end I don't have a strong preference between using something like |
@keathley re: preconditions vs state, thanks for the clarification.
Great job digging it up.
Interesting. I actually like it a lot because I find the symbolic call API very confusing. It is a mixture of special tuples (such as If I understand your proposal correct, you moved the whole definition of the command after the generator is called. If that's the case, I would move the def command(%{conn: :empty}) do
command(__MODULE__, :started_redis, [])
end
def command(%{conn: conn, contents: contents}) do
keys = Map.keys(contents)
# The args in command is always a list of generators
one_of([
command(__MODULE__, :get, [one_of(keys)]),
command(__MODULE__, :set, [binary(), binary()]),
command(__MODULE__, :set, [one_of(keys), binary()])
])
end
def get(key) do
fn state -> Redix.command(state.conn, ["GET", key]) end
|> precondition(fn state -> state.conn != :empty end)
|> postcondition(fn state, result -> result == Map.fetch(state.contents, key) end)
end
def set(key, value) do
fn state -> Redix.command(state.conn, ["SET", key, value]) end
|> precondition(fn state -> state.conn != :empty end)
|> next_state(fn state -> put_in(state, [:contents, key], value) end)
|> postcondition(fn _state, result -> result == {:ok, "OK"} end)
end Does this make sense? A command can be a simple anonymous function or something that we build after piping through precondition/next_state/post_condition. I am still not sure if something needs to be done about symbolic variables. |
I like the idea of having However I am not sure about having commands be a fun that is enriched with precondition, next state, postcondition. Do we always need all three? If we do, maybe we should think about a way of ensuring that and making it easier to do and understand. I also wanted to mention that I am still not sure about this approach of having precondition and postcondition for each command is not more repetitive than having precondition/postcondition hooks like the original proposal. I guess the original proposal resembles a fsm somehow, where you have a function that decides with what precision to handle commands and states. Example from the initial proposal: we might not want to ever issue any command except "connect" against a disconnected Redis instance. I can't form a preference between the two approaches yet though. |
Just to clarify: precondition/postcondition should only be declared if necessary. One question worth asking is if the def command(%{conn: :empty}) do
command(__MODULE__, :started_redis, [])
end
def command(%{conn: conn, contents: contents}) do
keys = Map.keys(contents)
# The args in command is always a list of generators
one_of([
command(__MODULE__, :get, [one_of(keys)]),
command(__MODULE__, :set, [binary(), binary()]),
command(__MODULE__, :set, [one_of(keys), binary()])
])
end
def get(%{conn: :empty}, _key), do: :precondition_failed
def get(state, key) do
result = Redix.command(state.conn, ["GET", key])
assert Map.fetch(state.contents, key) == result
{:ok, state}
end
def set(%{conn: :empty}, _key, _value), do: :precondition_failed
def set(state, key, value) do
assert {:ok, "OK"} = Redix.command(state.conn, ["SET", key, value])
{:ok, put_in(state.contents[key], value)}
end |
What I like in @josevalim's last example is the combination of executing the command, checking the postcondition via But this does not work. During the symbolic execution (ie. command sequence generation), the In PropEr/PropCheck all these callbacks are general functions wich distinguish via the arguments which command is currently under consideration. This is the same pattern in def get_args(_state), do: fixed_list([constant(state.conn), keys()])
def get_pre(%{conn: :empty}), do: false
def get_pre(_state), do: true
def get_next(state, _args, _result), do: state
def get_post(state, [_conn, key], result) do
assert Map.fetch(state.contents, key) == result
end
def get(conn, key) do
Redix.command(conn, ["GET", key])
end
# generate the arguments for the command
def set_args(state), do: fixed_list([constant(state.conn), keys(), values()])
def set_pre(%{conn: :empty}), do: false
def set_pre(_state), do: true
def set_post(_state, _args, result), do: assert result == {:ok, "OK"}
def set_next(state, [_conn, key, val], _result) do
# post condition is true, result may be symbolic
put_in(state.contents[key], value)
end
def set(conn, key, val) do
Redix.command(conn, ["SET", key, val])
end What I like in this approach is that the code for one command stays together. This makes it easier to understand the model piece by piece. Using only regular functions reduces the magic for test code writers. But it feels a little bit off from an Elixirish point of view. How about applying nested definitions as in ExUnit's specify :get do
def args(_state), do: fixed_list([constant(state.conn), keys()])
def pre_condition(%{conn: :empty}), do: false
def pre_condition(_state), do: true
def next_state(state, _args, _result), do: state
def post_condition(state, [_conn, key], result) do
assert Map.fetch(state.contents, key) == result
end
def execute(conn, key) do
Redix.command(conn, ["GET", key])
end
end The This way, we can add more optional features in to the command specification later on. Weights for commands are one example, so commands are not generated by a |
Thanks @alfert for the input!
I returned a tuple because the command may also return
Apologies but I don't see how this is not possible with the proposal so far. In the proposal, the goal was to embed the working command call and next state in a single step. The state is updated only if the command succeeds. Given the graph you referenced, the only thing that is not possible to distinguish is an error in the command itself vs an invalid postcondition. However, it is unclear if any of the libraries use this information in any meaningful way. If they do, we could also allow the command to return
On IRC, we have mentioned that the best approach would be to avoid DSLs and rely mostly on data and functions/callbacks. :) In any case, it is interesting to see that EQC also went to the direction where the knowledge is grouped by the command (get/set) and not by the state machine (pre/post/next_state). |
The problem I see is that the But you are right that my comment is a little too harsh. Melting command execution and checking the post condition together via Just to give an example where this matters: In an extension of the Redis example we could implement a flush command, which drops the entire database. This means that model state must also be empty thereafter. So any call to
Good to hear that! Seems that I should to be at least a lurker at IRC (where I never was before...) to be up to date :) |
From my understanding - and please keep in mind that I understand very little about stateful models - this implies we first generate the commands and apply them only later. Couldn't we apply the commands as we generate them instead? Or would we miss some important feature? Maybe doing so would complicate space exploration or shrinking? In any case, I think this shows there is a dichotomy between the system under test and the model that is not well-defined in the latest proposals. |
Btw, I don't think IRC is required. It is probably a good idea to document here the discussions done there anyway. |
The reason for splitting the generation and execution phase is enabling shrinking and understanding what is going on. The idea is that we generate a list of calls of the system under test. This is one test case for the property. After the generation, we execute this list. If an error happens, then this particular list is shrunk towards a smaller list. Generation phase, shrinking and execution phase all depend on the model. Then the next list of calls is generated and executed and so on. If an error happens, we can see the history of (symbolic) calls to the system under test and how in parallel the model evolves. This helps to understand the error. To get a nice reporting with StreamData is open issue, in particular if the command list is rather long. The very basic idea of the entire approach is that the model reveals relevant features of the system under test. You can have several models for the same system for analysing different aspects, e.g. making the model gradually more complex. The idea behind it is the same as a bisimulation. If the system under test behaves as the model predicts, then the system under test works fine. |
Thank you, @alfert! Do you think checking for preconditions is a property of the model or of the SUT? |
Isn't only the postcondition check a property of the SUT? The point of keeping the model is to not have to check anything from the system except for postconditions afaiu? |
Apologies but I am getting a bit confused here. In both the Proper/EQC example, the |
Checking the precondition is required during the generation phase to generate useful calls. Think about a system that allows to delete items. The precondition would check that such an item does exist within the system. I.e. the model must represent that the system has stored this item and therefore the generated call is legal/sensible/useful. It should trivially hold during execution time and crucial for shrinking because it may happen that shrinking modifies the system and the model state in way that the formerly legal call is no longer legal.
Yes, exactly. The postcondition checks wether the system is in an proper state after executing the call. I.e. the return value of the call must be expected and derivable from the model. Here we can use
Now I am confused. What do you mean by execute in order? Perhaps this helps: Since the state is not stored during generation (this would not be useful, because results of symbolic calls could be part of the state but they do not have a meaning), the state must be replayed during execution time and even more important during shrinking since shrinking modifies the initially generated list of commands. |
What I meant to ask is this: because the I plan to take a look at the state machine implementation in proper to help understand those questions. |
Ah, I see your point. It is different: during generation you simulate the calls to the system, but you use next_state to record the expected state. In next_state you cannot evaluate the result of the symbolic calls. You can only rely on the arguments, the current model state and the expected result from the call (executed later in the future). That is the reason why only the post condition evaluates the result which is only called in the execution phase.
For Proper, there is also the tutorial with the movie server explaining how to specify a model with a complex state (i.e. including deletion):
http://proper.softlab.ntua.gr/Tutorials/PropEr_testing_of_generic_servers.html This might help to understand the approach.
Von unterwegs gesendet
… Am 06.04.2018 um 08:47 schrieb José Valim ***@***.***>:
Now I am confused. What do you mean by execute in order?
What I meant to ask is this: because the next_state function receives the execution result, it means the initial generation of commands needs to happen side by side with the execution, because otherwise we don't have the result to compute the next_state. I guess for shrinking we could use the results from execution but it is also expected that once we change the order or remove commands during shrinking, the results will change, and by then the post condition may be necessary to check the results are still valid.
I plan to take a look at the state machine implementation in proper to help understand those questions.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
I want to address some of the concerns / points about repetition in the model code. In my experience models involve a non-trivial amount of code. As a reference point a basic model for testing our raft implementation is ~300 loc. Based only on experience and intuition, grouping preconditions, postconditions, and next state generations with each "command" won't help reduce repetition. You almost always need a precondition for a command. The same is true for postconditions. Defining a single callback like: def precondition(%{conn: :empty}, {:call, Redix, _, _}), do: false Is going to be less lines of code then needing to specify the same precondition for each command like: def connected?(%{conn: :empty}), do: false
def connected?(_), do: true
:get
|> command
|> precondition(&connected?/1)
:set
|> command
|> precondition(&connected?/1) The benefit to grouping isn't that there's less lines of code. Its that the meaningful pieces of code are located together. The co-location would potentially make the model and all of the preconditions and postconditions easier to reason about once the model has grown to a reasonable size. |
During the generation phase, PropEr and EQC only pass in a symbolic value to This means, for example, that if you call The common pattern is to create a shim module that will wrap common functions, i.e. start_link(Args) ->
{ok, Pid} = my_server:start_link(Args),
Pid. Then the model can use the opaque value My personal opinion from my current understanding of stateful property testing is that if you are generating command sequences with symbolic call (so they can be recorded to disk, manipulated as data, and so on), you end up requiring some user-provided glue to make the symbolic execution work with the real one, and pretending that only one execution takes place is not going to work. If you decide to generate things differently so that generation and execution happen at the same time, that can possibly work, but shrinking may be more difficult because modifications to the existing chain will not be deterministic. Take for example a sequence where I generate a command based on a random value provided by the server under test. If the value is in the interval Under PropEr and quickcheck, this cannot be done since the symbolic execution would not let me use the server-returned value to influence command generation. Under a mechanism where server-returned values can be used to influence command generation, then it's free game and there's no way to know what I'll get back. You have to be kind of careful about how you approach determinism there. |
I just finished a first version of an EQC like grouping with lots of debug output in Repo https://github.com/alfert/counter:
Some remarks on the current state of the implementation:
I am interested in your feedback. Have a nice weekend! |
Since its been a bit I thought I would provide an update from my side. Currently we're just using proper for all of our stateful testing. Its working well for us so I'm not sure that we need this functionality to exist in stream_data. I'm happy to weigh in with opinions if that is helpful but otherwise I'm not going to be pursuing this further. Thanks for all the input and effort ❤️. |
@alfert I have exported your implementation as separate library https://github.com/hauleth/stream_state. I hope that you do not mind. I also have done some improvements:
TBD:
|
I'm opening this issue based on discussions in IRC to start working towards a proposal for stateful model checking powered by stream_data.
Existing solutions
I'll try to provide an overview of the current solutions. I'm going to use Proper since thats what I'm most familiar with. Proper and EQC have almost identical apis but EQC provides extra features. Proper is a good baseline though. I'm by no means an expert on proper so I may get some of these details wrong. I'm sure @fishcakez can correct me though ;).
Lets say that we wanted to test getting and setting keys in redis. To do that we can use a state machine test. This involves setting up a test and then implementing some callbacks. I've added documentation to each callback to try to provide context on how it works. If you want to run the test locally the full example is here: https://github.com/keathley/redix_model.
The way that proper works is by first generating a list of symbolic calls based on the state space specified by the
command
andnext_state
callbacks. After proper has generated the symbolic calls it executes each command and checks postconditions. If it finds an error it shrinks the generated commands list. The most effective means of doing this is by removing commands. This is one of the reasons we need our preconditions. Its likely that by removing certain commands we'll end up with a command that is invalid based on our current state. We use preconditions to avoid that.Separating the generation and execution of commands means that shrinking is easier to control and somewhat more deterministic from what I understand.
A new api?
I've only done model checking using the "EQC" api so I'm not sure that I have a novel way of approaching the problem from an api standpoint. ScalaCheck uses similar semantics but each command is a class and provides its own preconditions, postconditions, and next state functions. That division seems like it would help keep things tidy as a model grows. I think it would be ideal to use simple functions but its been my experience that models can get quite large so it would be nice to have a more prescribed way to manage them.
I hope that this helps provide some context. I'm very excited to see what y'all think because its been my experience that model checking is where property testing really shines.
The text was updated successfully, but these errors were encountered: