add `DiscreteTimeSum` subclass of `pybamm.Symbol` #4485

martinjrobins · 2024-10-03T10:56:53Z

Description

This would add a new unary operator in a pybamm expression tree that would represent a discrete sum over time. This would be a discrete version of the already existing pybamm.ExplicitTimeIntegral class.

Motivation

This would give model developers and users the ability to add "sum of squares" type variables to a pybamm model that would calculate the difference between, for example, a solution variable and a user-provided dataset. This would be useful for implementing parameter inference using pybamm.

Possible Implementation

Similar to the ExplicitTimeIntegral, this would only be evaluated in the Solution class (see _update_variable function). I would propose that the sum is done over the list of time points in the solution (rather than have a separate list of points in the expression tree node), that way a user can specify the time-points via t_interp.

I'm a bit unsure how to allow users to provide their data to the expression. Perhaps they would wrap it in an Interpolant:

data = pybamm.Interpolant(data_t, data_y, pybamm.t)
model.variables["data_comparison"] = pybamm.DiscreteTimeSum((model.variables["Voltage"] - data)**2)
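To make the proposed quantity concrete, here is a minimal, dependency-free sketch of the sum of squares being described. It is not PyBaMM code: `discrete_time_sum`, the `voltage` function and the data values are all hypothetical stand-ins for the expression tree node, the solution variable and the user's dataset.

```python
def discrete_time_sum(times, model, data):
    """Sum of squared residuals between model(t) and data at the given times."""
    if len(times) != len(data):
        raise ValueError("times and data must have the same length")
    return sum((model(t) - d) ** 2 for t, d in zip(times, data))


# Hypothetical "voltage" model standing in for a solution variable
def voltage(t):
    return 4.0 - 0.5 * t


# Synthetic data, for illustration only
data_t = [0.0, 1.0, 2.0, 3.0]
data_y = [4.0, 3.6, 3.0, 2.4]  # residuals are 0, -0.1, 0, 0.1

sse = discrete_time_sum(data_t, voltage, data_y)
print("sum of squares is:", sse)
```

The result is a single scalar per solve, which is the shape an optimiser would want from an objective.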

Additional context

see pybop-team/PyBOP#513


MarcBerliner · 2024-10-03T20:49:57Z

I'm a bit unsure how to allow users to provide their data to the expression. Perhaps they would wrap it in an Interpolant:

@martinjrobins I think we should be able to extract all the information we need from the Interpolant class, but it could confuse a user as to why they need to create an interpolant from their raw data. Maybe we can make a class derived from Interpolant called DiscreteTimeData or something to make this more explicit,

class DiscreteTimeData(pybamm.Interpolant):
    def __init__(self, t, y, children=pybamm.t, ...):
        super().__init__(t, y, children, ...)

and we can keep essentially the same API,

data = pybamm.DiscreteTimeData(data_t, data_y)
model.variables["data_comparison"] = pybamm.DiscreteTimeSum((model.variables["Voltage"] - data)**2)

On another topic, one possible edge case with this DiscreteTimeSum class is that the user passes in multiple objectives inside the same variable, like,

pybamm.DiscreteTimeSum((model.variables["Voltage"] - data_V)**2 + (model.variables["Temperature"] - data_T)**2)

If data_V and data_T have different t values, then I don't think we could appropriately separate these two objectives. We should make sure only one interpolant is allowed inside here.
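One way to guard against this edge case is to check that every dataset inside an objective shares the same time points. The helper below is a hypothetical sketch in plain Python, not part of PyBaMM; the name `common_time_grid`, the `(t, y)` tuple layout and the tolerance are assumptions made for illustration.

```python
def common_time_grid(datasets, tol=1e-12):
    """Return the time grid shared by all datasets, or raise if they differ.

    `datasets` maps a name to a (t, y) pair of equal-length sequences.
    """
    items = iter(datasets.items())
    first_name, (ref_t, _) = next(items)
    for name, (t, _) in items:
        same = len(t) == len(ref_t) and all(
            abs(a - b) <= tol for a, b in zip(t, ref_t)
        )
        if not same:
            raise ValueError(
                f"{name!r} and {first_name!r} use different time points"
            )
    return list(ref_t)


# Synthetic data, for illustration only
matched = {
    "Voltage": ([0.0, 1.0, 2.0], [4.0, 3.5, 3.0]),
    "Temperature": ([0.0, 1.0, 2.0], [300.0, 301.0, 302.0]),
}
mismatched = {
    "Voltage": ([0.0, 1.0, 2.0], [4.0, 3.5, 3.0]),
    "Temperature": ([0.0, 0.5, 2.0], [300.0, 300.5, 302.0]),
}

grid = common_time_grid(matched)

try:
    common_time_grid(mismatched)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```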

martinjrobins · 2024-10-04T08:16:52Z

Yeah, agree that the DiscreteTimeData subclass is the way to go. How do we want to handle inconsistencies between the time points in the solution and the time points in the data provided? Ideally the user will make sure they match, for example:

# ... model setup
data = pybamm.DiscreteTimeData(data_t, data_y)
model.variables["data_comparison"] = pybamm.DiscreteTimeSum((model.variables["Voltage"] - data)**2)
# .... solver setup
sol = solver.solve(t_eval=[data_t[0], data_t[-1]], t_interp=data_t)
print("sum of squares is:", sol["data_comparison"])

My first thought is that we should raise an error on the last line if the time points in sol don't match data_t. Alternatively, we could interpolate the solution onto the data points; this might make sense (i.e. not introduce too much additional error) if we use the new Hermite interpolation.
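As a rough illustration of why Hermite interpolation keeps the added error small, the sketch below evaluates a cubic Hermite interpolant between two solver steps using the state and its time derivative at each end, and compares it with linear interpolation. This is the textbook formula, not PyBaMM's implementation; `hermite_interp` and the test function y = t^3 are assumptions chosen so that the exact answer is known (a cubic is reproduced exactly, so this is a best case).

```python
def hermite_interp(t0, t1, y0, y1, dy0, dy1, t):
    """Cubic Hermite interpolation on [t0, t1] from values and derivatives."""
    h = t1 - t0
    s = (t - t0) / h  # normalised position in [0, 1]
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * y0 + h10 * h * dy0 + h01 * y1 + h11 * h * dy1


def linear_interp(t0, t1, y0, y1, t):
    """Linear interpolation on [t0, t1] from values only."""
    w = (t - t0) / (t1 - t0)
    return (1 - w) * y0 + w * y1


# Pretend the solver stepped from t=0 to t=2 on y(t) = t**3, dy/dt = 3*t**2
t0, t1 = 0.0, 2.0
y0, y1 = t0**3, t1**3
dy0, dy1 = 3 * t0**2, 3 * t1**2

t_data = 1.0  # a data point that falls between the two solver steps
y_hermite = hermite_interp(t0, t1, y0, y1, dy0, dy1, t_data)
y_linear = linear_interp(t0, t1, y0, y1, t_data)
exact = t_data**3
```

With these numbers the linear estimate is off by 3, while the Hermite estimate matches the exact value, because the derivative information from the solver is used as well.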

BradyPlanden · 2024-10-04T10:34:51Z

this might make sense (i.e. not introduce too much additional error) if we use the new Hermite interpolation

I think this is the correct way forward, with the solve time taken from the last t value in the data. This would also allow different t values for the multiple fitting interpolants in the future, one would just need to take the largest final value from data_t for the solve and interpolate the points on the corresponding variable.

We should make sure only one interpolant is allowed inside here.

I'm not sure we need this constraint, given the above. I think either ensuring that data is constructed with matching numpy lengths or interpolating them onto the right variable will allow for multiple interpolants; do you see any issues with that? In the example below, data is a dictionary of numpy arrays with keys that match the fitting variables. In PyBOP, the data object (pybop.Dataset) contains both data_t and data_y to achieve this. This could be implemented within pybamm.DiscreteTimeData; however, it might be blurring the lines between PyBaMM and PyBOP a bit more than needed.

# ... model setup
dataset = pybamm.DiscreteTimeData(data) # Accepts a dictionary of fitting variables and returns a dict of interpolants for each
model.variables["data_comparison"] = pybamm.DiscreteTimeSum((model.variables["Voltage [V]"] - dataset["Voltage [V]"])**2)
# .... solver setup
sol = solver.solve(t_eval=[data_t[0], data_t[-1]], t_interp=data_t)
print("sum of squares is:", sol["data_comparison"])

martinjrobins · 2024-10-04T14:06:54Z

after chatting with @BradyPlanden, we came to the following conclusion:

1. We will interpolate from the solution to the datapoints given (this will handle the case where different output variables could have different data timepoints)

2. pybamm.DiscreteTimeData will take a single dataset, if you want multiple datasets you could do this like so:

datasets = { name: pybamm.DiscreteTimeData(data) for name, data in pybop_dataset.items() }
model.variables["data_comparison"] = pybamm.DiscreteTimeSum((model.variables["Voltage [V]"] - datasets["Voltage [V]"])**2)
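The agreed approach (interpolate the solution onto each dataset's own time points, then sum) can be sketched without PyBaMM as follows. Everything here is hypothetical: `interp` is a simple linear interpolator used in place of the solver's interpolation, and the solution arrays and datasets are synthetic.

```python
from bisect import bisect_right


def interp(t_sol, y_sol, t):
    """Linearly interpolate the solution (t_sol, y_sol) at time t."""
    if not t_sol[0] <= t <= t_sol[-1]:
        raise ValueError(f"t={t} lies outside the solution time range")
    # Index of the right-hand node of the interval containing t
    i = max(1, min(bisect_right(t_sol, t), len(t_sol) - 1))
    w = (t - t_sol[i - 1]) / (t_sol[i] - t_sol[i - 1])
    return (1 - w) * y_sol[i - 1] + w * y_sol[i]


# Synthetic "solution": solver time points and two output variables
t_sol = [0.0, 1.0, 2.0, 3.0, 4.0]
solution = {
    "Voltage [V]": [4.0, 3.5, 3.0, 2.5, 2.0],
    "Temperature [K]": [300.0, 302.0, 304.0, 306.0, 308.0],
}

# One dataset per variable, each with its own time points
datasets = {
    "Voltage [V]": ([0.5, 1.5, 2.5], [3.75, 3.25, 2.25]),
    "Temperature [K]": ([0.0, 4.0], [301.0, 308.0]),
}

# One discrete sum per dataset, evaluated at that dataset's time points
sums = {
    name: sum((interp(t_sol, solution[name], t) - y) ** 2 for t, y in zip(ts, ys))
    for name, (ts, ys) in datasets.items()
}
```

Keeping one sum per dataset means the time grids never need to agree with each other or with the solver's steps.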

MarcBerliner · 2024-10-04T14:07:35Z

I understand the perspective of optimizing an objective function, but I think we shouldn't be too fancy with this. At a minimum, the only specialization we need for a DiscreteTimeSum vs. a traditional time-series observable is to sum over all the points in time before returning the value. On Brady's point, I think anything more on the PyBaMM side is blurring the lines between PyBaMM as a simulation tool vs. an optimization tool. In my mind, a tool like PyBOP is the appropriate place to implement these optimization-specific niceties.

How do we want to handle inconsistencies between the time points in the solution and the time points in the data provided?

data_t is not necessarily equal to sol.t because the simulation will stop at all the t_interp + t_eval points which do not always overlap with data_t. We can just interpolate the DiscreteTimeSum observable onto the data_t points and then sum them up.

I'm not sure we need this constraint, given the above. I think either ensuring that data is constructed with matching numpy lengths or interpolating them onto the right variable will allow for multiple interpolants, do you see any issues with that?

Yeah we can do that. This is not an issue with the dataframe example, but for general interpolants, we just need to make sure that they have identical data_t values within one objective function.

MarcBerliner · 2024-10-04T14:16:55Z

1. We will interpolate from the solution to the datapoints given (this will handle the case where different output variables could have different data timepoints)

2. `pybamm.DiscreteTimeData` will take a single dataset, if you want multiple datasets you could do this like so:

@martinjrobins sounds good to me. Since we use a single dataset, we can even automatically name the discrete sum based on the name of the observable/dataframe column (like "Discrete sum Voltage [V]").

* feat: add discrete time sum expression tree node #4485
* docs: fix math syntax in docstring
* remove prints
* test casadi solver as well
* coverage
* coverage
* add to changelog and tidy solution test

martinjrobins added the feature label Oct 3, 2024

martinjrobins added a commit that referenced this issue Oct 8, 2024

feat: add discrete time sum expression tree node #4485

0076114

martinjrobins mentioned this issue Oct 8, 2024

feat: add discrete time sum expression tree node #4501

Merged

6 tasks

MarcBerliner closed this as completed in #4501 Oct 9, 2024
