Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add DiscreteTimeSum subclass of pybamm.Symbol #4485

Closed
martinjrobins opened this issue Oct 3, 2024 · 6 comments · Fixed by #4501
Closed

add DiscreteTimeSum subclass of pybamm.Symbol #4485

martinjrobins opened this issue Oct 3, 2024 · 6 comments · Fixed by #4501
Labels

Comments

@martinjrobins
Copy link
Contributor

Description

This would add a new unary operator in a pybamm expression tree that would represent a discrete sum over time. This would be a discrete version of the already existing pybamm.ExplicitTimeIntegral class.

Motivation

This would give model developers and users the ability to add "sum of squares" type variables to a pybamm model that would calculate the difference between, for example, a solution variable and a user-provided dataset. This would be useful for implementing parameter inference using pybamm

Possible Implementation

Similar to the ExplicitTimeIntegral, this would only be evaluated in the Solution class (see _update_variable function). I would propose that the sum is done over the list of time points in the solution (rather than have a separate list of points in the expression tree node), that way a user can specify the time-points via t_interp.

I'm a bit unsure how to allow users to provide their data to the expression. Perhaps they would wrap it in an Interpolant:

data = pybamm.Interpolant(data_t, data_y, pybamm.t)
model.variable["data_comparison"] = pybamm.DiscreteTimeSum((model.variable["Voltage"] - data)**2)

Additional context

see pybop-team/PyBOP#513

@MarcBerliner
Copy link
Member

MarcBerliner commented Oct 3, 2024

I'm a bit unsure how to allow users to provide their data to the expression. Perhaps they would wrap it in an Interpolant:

@martinjrobins I think we should be able to extract all the information we need from the Interpolant class, but it could confuse a user as to why they need to create an interpolant from their raw data. Maybe we can make a class derived from Interpolant called DiscreteTimeData or something to make this more explicit,

class DiscreteTimeData(pybamm.Interpolant):
    def __init__(self, t, y, children=pybamm.t, ...):
        super().__init__(t, y, children, ...)

and we can keep essentially the same API,

data = pybamm.DiscreteTimeData(data_t, data_y)
model.variable["data_comparison"] = pybamm.DiscreteTimeSum((model.variable["Voltage"] - data)**2)

On another topic, one possible edge case with this DiscreteTimeSum class is that the user passes in multiple objectives inside the same variable, like,

pybamm.DiscreteTimeSum((model.variable["Voltage"] - data_V)**2 + (model.variable["Temperature"] - data_T)**2)

If data_V and data_T have different t values, then I don't think we could appropriately separate these two objectives. We should make sure only one interpolant is allowed inside here.

@martinjrobins
Copy link
Contributor Author

Yea, agree that the DiscreteTimeData subclass is the way to go. How do we want to handle inconsistancies between the time points in the solution, and the time points in the data provided? Ideally the user will make sure they match, for example:

# ... model setup
data = pybamm.DiscreteTimeData(data_t, data_y)
model.variable["data_comparison"] = pybamm.DiscreteTimeSum((model.variable["Voltage"] - data)**2)
# .... solver setup
sol = solver.solve(t_eval=[data_t[0], data_t[-1]], t_interp=data_t)
print("sum of squares is:", sol["data_comparison"])

My first thought is that we should raise an error on the last line if the timepoints in sol don't match data_t. We could alternativly interpolate the solution onto the data points, this might make sense (i.e. not introduce to much additional error) if we use the new hermite interpolation

@BradyPlanden
Copy link
Member

BradyPlanden commented Oct 4, 2024

this might make sense (i.e. not introduce to much additional error) if we use the new hermite interpolation

I think this is the correct way forward, with the solve time taken from the last t value in the data. This would also allow different t values for the multiple fitting interpolants in the future, one would just need to take the largest final value from data_t for the solve and interpolate the points on the corresponding variable.

We should make sure only one interpolant is allowed inside here.

I'm not sure we need this constraint, given the above. I think either ensuring that data is constructed with matching numpy lengths or interpolating them onto the right variable will allow for multiple interpolants, do you see any issues with that? In the below,data is a dictionary of numpy arrays with keys that match the fitting variables. In PyBOP, the data object (pybop.Dataset) contains both data_t and data_y to achieve this. This could be implemented within pybamm.DiscreteTimeData; however, it might be blurring the lines between PyBaMM and PyBOP a bit more than needed.

# ... model setup
dataset = pybamm.DiscreteTimeData(data) # Accepts a dictionary of fitting variables and returns a dict of interpolants for each
model.variable["data_comparison"] = pybamm.DiscreteTimeSum((model.variable["Voltage [V]"] - dataset["Voltage [V]")**2)
# .... solver setup
sol = solver.solve(t_eval=[data_t[0], data_t[-1]], t_interp=data_t)
print("sum of squares is:", sol["data_comparison"])

@martinjrobins
Copy link
Contributor Author

martinjrobins commented Oct 4, 2024

after chatting with @BradyPlanden, we came to the following conclusion:

  1. We will interpolate from the solution to the datapoints given (this will handle the case where different output variables could have different data timepoints)
  2. pybamm.DiscreteTimeData will take a single dataset, if you want multiple datasets you could do this like so:
datasets = { name: pybamm.DiscreteTimeData(data) for name, data in pybop_dataset.items() }
model.variable["data_comparison"] = pybamm.DiscreteTimeSum((model.variable["Voltage [V]"] - datasets["Voltage [V]")**2)

@MarcBerliner
Copy link
Member

I understand the perspective of optimizing an objective function, but I think we shouldn't be too fancy with this. At a minimum, the only specialization we need for a DiscreteTimeSum vs. a traditional time-series observable is to sum over all the points in time before returning the value. On Brady's point, I think anything more on the PyBaMM side is blurring the lines between PyBaMM as a simulation tool vs. an optimization tool. In my mind, a tool like PyBOP is the appropriate place to implement these optimization-specific niceties.

How do we want to handle inconsistancies between the time points in the solution, and the time points in the data provided?

data_t is not necessarily equal to sol.t because the simulation will stop at all the t_interp + t_eval points which do not always overlap with data_t. We can just interpolate the DiscreteTimeSum observable onto the data_t points and then sum them up.

I'm not sure we need this constraint, given the above. I think either ensuring that data is constructed with matching numpy lengths or interpolating them onto the right variable will allow for multiple interpolants, do you see any issues with that?

Yeah we can do that. This is not an issue with the dataframe example, but for general interpolants, we just need to make sure that they have identical data_t values within one objective function.

@MarcBerliner
Copy link
Member

1. We will interpolate from the solution to the datapoints given (this will handle the case where different output variables could have different data timepoints)

2. `pybamm.DiscreteTimeData` will take a single dataset, if you want multiple datasets you could do this like so:

@martinjrobins sounds good to me. Since we use a single dataset, we can even automatically name the discrete sum based on the name of the observable/dataframe column (like "Discrete sum Voltage [V]").

MarcBerliner pushed a commit that referenced this issue Oct 9, 2024
* feat: add discrete time sum expression tree node #4485

* docs: fix math syntax in docstring

* remove prints

* test casadi solver as well

* coverage

* coverage

* add to changelog and tidy solution test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants