Physics based priors #26

peastman · 2021-06-23T20:02:33Z

I want to implement some physics based priors. An example would be a Coulomb interaction based on pre-computed partial charges that are stored in the dataset. BasePrior.forward() is supposed to return a list of per-atom energy contributions, but physics based interactions usually do not decompose in that way. It would be much easier if it could just return a total energy for each sample.

What do you recommend as the cleanest way of implementing this?

The text was updated successfully, but these errors were encountered:

PhilippThoelke · 2021-06-24T08:32:12Z

BasePrior.forward() returns the updated atomwise energy predictions, not only the prior contribution (see the Atomref prior as an example). It would be possible to add the prior divided by the number of atoms to each atomwise prediction but I don't think this is very clean. It might be better to have an atomwise_prior flag in BasePrior which determines whether it should be applied before or after reducing the atomwise predictions. We would then apply the prior model either before (as it is done now) or after the following line, based on this flag: https://github.com/compsciencelab/torchmd-net/blob/7a92b3ef7739da14aa8cdd2d87c8e9462d86cf14/torchmdnet/models/output_modules.py#L73
The flag can then be set through the super().__init__(atomwise_prior=True/False) call of the deriving class.

Feel free to open a PR if you want to have a go at this, otherwise I can also make this change.

peastman · 2021-06-24T20:51:46Z

Sounds good, I'll add that.

peastman · 2022-10-03T20:03:14Z

Sorry for reopening an old issue - this never actually got implemented! I'm about to start on it now. I'll add the atomwise_prior flag as suggested above. I also have a few other questions about the best way of implementing particular potentials.

Physics based potentials will often require extra information. For example, one prior I want to add is the ZBL stopping potential for short range repulsion. It requires knowing the atomic number of every atom. forward() takes an argument z which often corresponds to the atomic number, but not always. For example, in some of my models I define atom types by the combination of element and formal charge, so z is an arbitrary atom type index. I need to separately provide both an atom type and an atomic number.

A more complicated example is Coulomb, which depends on the charge of each atom. That could be a fixed number (formal charge or precomputed partial charge), but in other cases it will itself be calculated as an output of the model.

What's the best way of providing extra information like this that's needed to compute a physics based potential?

peastman · 2022-10-03T20:45:11Z

Another question: currently it only lets you specify a single prior model, but often you may want more than one. A ZBL potential for short range repulsion, plus Coulomb for long range interactions, plus D4 for dispersion. (SpookyNet includes all of those.) Any thoughts on the best way of handling this?

peastman · 2022-10-03T22:47:53Z

And another question: how do we want to handle units? Nothing in the current code worries about them, except sometimes for a particular dataset. The inputs and outputs to a model are just numbers that might have any units. But physical potentials involve hardcoded constants like eps_0 whose values depend on units. How should we handle this? Some possibilities include

Standardizing on a particular set of units for all models.
Standardize on a particular set of units for a particular potential function, and require the user to provide scaling factors for converting positions, energies, etc. to the required units.
Implement it at a higher level, where a model specifies what units it uses.
- We could also make each Dataset specify its units, and conversions between the dataset and model units would happen automatically.

giadefa · 2022-10-04T12:07:58Z

can you use that single model as a wrapper for all the models that you would like to use? It's just the entry point.

…

On Mon, Oct 3, 2022 at 4:45 PM Peter Eastman ***@***.***> wrote: Another question: currently it only lets you specify a single prior model, but often you may want more than one. A ZBL potential for short range repulsion, plus Coulomb for long range interactions, plus D4 for dispersion. (SpookyNet includes all of those.) Any thoughts on the best way of handling this? — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUOQDFOZQNOJDW332PW3WBNAWFANCNFSM47GPOV2Q> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

giadefa · 2022-10-04T12:08:53Z

The code is unit-neutral, whatever you use as input it spits out as output

…

On Mon, Oct 3, 2022 at 6:48 PM Peter Eastman ***@***.***> wrote: And another question: how do we want to handle units? Nothing in the current code worries about them, except sometimes for a particular dataset. The inputs and outputs to a model are just numbers that might have any units. But physical potentials involve hardcoded constants like eps_0 whose values depend on units. How should we handle this? Some possibilities include - Standardizing on a particular set of units for all models. - Standardize on a particular set of units for a particular potential function, and require the user to provide scaling factors for converting positions, energies, etc. to the required units. - Implement it at a higher level, where a model specifies what units it uses. - We could also make each Dataset specify its units, and conversions between the dataset and model units would happen automatically. — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUOSBZZSRRT5ZD6ARC5TWBNPCJANCNFSM47GPOV2Q> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

peastman · 2022-10-04T15:56:42Z

The code is unit-neutral, whatever you use as input it spits out as output

I know. And that's incompatible with physics based potentials. They involve internal parameters with units. It's impossible to compute them without knowing what units the inputs are in, and what units the outputs are expected to be in.

peastman · 2022-10-04T16:04:53Z

can you use that single model as a wrapper for all the models that you would like to use? It's just the entry point.

Currently there's a single option in the configuration file, prior_model. It specifies the class of the single prior model to create, whose constructor has to take no arguments except the dataset. We need to be able to specify much more complicated arrangements. For example, "Include both a ZBL potential, using atomic numbers specified in the dataset, and a Coulomb potential, using charges that are generated as an output of the model."

giadefa · 2022-10-04T16:05:53Z

who builds the potential knows, so the problem is when we distribute it, maybe we can add some optional unit indication somehow?

…

On Tue, Oct 4, 2022 at 11:56 AM Peter Eastman ***@***.***> wrote: The code is unit-neutral, whatever you use as input it spits out as output I know. And that's incompatible with physics based potentials. They involve internal parameters with units. It's impossible to compute them without knowing what units the inputs are in, and what units the outputs are expected to be in. — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUOSNCH2E4YKKSGLTXC3WBRHUJANCNFSM47GPOV2Q> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

giadefa · 2022-10-04T16:10:14Z

We could extend that as we did for the dataset arguments with a prior_args dictionary

…

On Tue, Oct 4, 2022 at 12:05 PM Peter Eastman ***@***.***> wrote: can you use that single model as a wrapper for all the models that you would like to use? It's just the entry point. Currently there's a single option in the configuration file, prior_model. It specifies the class of the single prior model to create, whose constructor has to take no arguments except the dataset. We need to be able to specify much more complicated arrangements. For example, "Include both a ZBL potential, using atomic numbers specified in the dataset, and a Coulomb potential, using charges that are generated as an output of the model." — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUORJGKTK5PHL2OVOPPLWBRIS7ANCNFSM47GPOV2Q> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

peastman · 2022-10-04T16:15:13Z

Perhaps we're thinking about this differently. Currently the file priors.py includes only a single prior model: Atomref, which subtracts a reference energy that is defined in the dataset for each atom type. My understanding is that we want to add more choices that implement common physical models. Someone should be able to specify prior_model: Coulomb in their configuration file, and it will add a Coulomb interaction to whatever model they're trying to train. Is that different from your understanding?

giadefa · 2022-10-04T16:22:49Z

yes, that is fine. Maybe I don't understand the problem, you can specify whatever class. Maybe just do what you need on a PR so we can discuss it. Especially if it's backward compatible, we can merge it. g

…

On Tue, Oct 4, 2022 at 12:15 PM Peter Eastman ***@***.***> wrote: Perhaps we're thinking about this differently. Currently the file priors.py includes only a single prior model: Atomref, which subtracts a reference energy that is defined in the dataset for each atom type. My understanding is that we want to add more choices that implement common physical models. Someone should be able to specify prior_model: Coulomb in their configuration file, and it will add a Coulomb interaction to whatever model they're trying to train. Is that different from your understanding? — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUOWZ4NQG2VVUKFAVSTTWBRJZXANCNFSM47GPOV2Q> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

peastman · 2022-10-04T17:08:07Z

Let's consider the problem of adding a Coulomb interaction. To begin with, it can be implemented in several ways.

The dataset provides a precomputed charge for every atom. You compute the interaction based on them.
The model predicts a charge for every atom. They get used to compute the interaction.
The model predicts an electronegativity for every atom. You use them to solve for the charges, which then get used to compute the interaction. This also requires knowing the total charge on every molecule, or even better the formal charge on every atom.

First problem is that I don't think TorchMD-Net really supports multitask models yet? Its models produce a single number for each atom, which is interpreted as energy. We need them to produce multiple values for each atom: both energy and charge, or energy and electronegativity. That might also involve extra terms in the loss function: train the model to reproduce charges specified in the dataset.

Next we need to define a mechanism for datasets to provide the extra information required: partial charges, formal charges, or both.

Then there's the problem of units. The Coulomb code receives positions in some units and it needs to produce an energy in some units. The value of eps_0 depends on the units. It's impossible to calculate the result without knowing the units.

Finally we need to design the user interface for all of this. What options will the user add to their configuration file to request a particular combination of physical terms, calculated in a particular way, trained with a particular loss function? You need to be able to specify things like this: "My dataset reports atomic numbers, formal charges, and partial charges for every atom. I want the model to predict electronegativities, which will be used to compute partial charges. Include a loss term based on how well the predicted charges match the ones in the dataset. Once the charges are computed, add ZBL, Coulomb, and dispersion terms to the final energy."

giadefa · 2022-10-04T17:43:18Z

Would it be better to just delta the forces outside torchmd-net? For example in openMM and just use torchmd-net for the NNP part? Let's have a chat about it. I'll contact you directly

…

On Tue, Oct 4, 2022 at 1:08 PM Peter Eastman ***@***.***> wrote: Let's consider the problem of adding a Coulomb interaction. To begin with, it can be implemented in several ways. - The dataset provides a precomputed charge for every atom. You compute the interaction based on them. - The model predicts a charge for every atom. They get used to compute the interaction. - The model predicts an electronegativity for every atom. You use them to solve for the charges, which then get used to compute the interaction. This also requires knowing the total charge on every molecule, or even better the formal charge on every atom. First problem is that I don't think TorchMD-Net really supports multitask models yet? Its models produce a single number for each atom, which is interpreted as energy. We need them to produce multiple values for each atom: both energy and charge, or energy and electronegativity. That might also involve extra terms in the loss function: train the model to reproduce charges specified in the dataset. Next we need to define a mechanism for datasets to provide the extra information required: partial charges, formal charges, or both. Then there's the problem of units. The Coulomb code receives positions in some units and it needs to produce an energy in some units. The value of eps_0 depends on the units. It's impossible to calculate the result without knowing the units. Finally we need to design the user interface for all of this. What options will the user add to their configuration file to request a particular combination of physical terms, calculated in a particular way, trained with a particular loss function? You need to be able to specify things like this: "My dataset reports atomic numbers, formal charges, and partial charges for every atom. I want the model to predict electronegativities, which will be used to compute partial charges. Include a loss term based on how well the predicted charges match the ones in the dataset. Once the charges are computed, add ZBL, Coulomb, and dispersion terms to the final energy." — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUORU5VPJAKUYACSSO6DWBRQAFANCNFSM47GPOV2Q> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

PhilippThoelke · 2022-10-04T17:48:20Z

Multi-head implementation

Yes, currently models can only have a single output head. I think adding multi-head support could be useful, however, requires changes to the current training interface and poses some important design questions.

For example:
With multiple output heads it should be possible to individually set the standardize argument, the type of output model (i.e. output_model argument), every output head should get its own list of prior models (and in turn list of prior_args), its own unit, a loss weighting factor, a "compute neg. derivatives" flag, exponential moving average coefficients for y and potentially neg_dy and potentially a couple others I'm forgetting now.

If then you want charges to come either from the model or from the dataset it gets even more complicated, indicating which one should be used. You would also need to order the computation of the output heads as you are suggesting that the output of one head could depend on the prediction of another. This relationship would also have to be defined in the config file.

Prior args

We could extend that as we did for the dataset arguments with a prior_args
dictionary

This already exists and currently is a critical piece in order to reconstruct prior models from model checkpoints. If the model contains a prior model, the arguments required for building this model will be stored in the hparams.yaml file as they are set here

torchmd-net/torchmdnet/scripts/train.py

Line 129 in a80e378

args.prior_args = prior.get_init_args()

The load_model function expects prior_args to be set. The prior model itself expects either the dataset object or the prior args to be present to correctly instantiate the object. For example the Atomref prior depends on max_z, which is stored in prior_args but falls back to retrieving this from the dataset object in case max_z is not set:

torchmd-net/torchmdnet/priors.py

Lines 50 to 57 in a80e378

    
           def __init__(self, max_z=None, dataset=None): 
        
               super(Atomref, self).__init__() 
        
               if max_z is None and dataset is None: 
        
                   raise ValueError("Can't instantiate Atomref prior, all arguments are None.") 
        
               if dataset is None: 
        
                   atomref = torch.zeros(max_z, 1) 
        
               else: 
        
                   atomref = dataset.get_atomref()

This is the section of code that reconstructs a prior model from a checkpoint file but that is ignored if it's the first time constructing this model:

torchmd-net/torchmdnet/models/model.py

Lines 67 to 77 in a80e378

    
           if args["prior_model"] and prior_model is None: 
        
               assert "prior_args" in args, ( 
        
                   f"Requested prior model {args['prior_model']} but the " 
        
                   f'arguments are lacking the key "prior_args".' 
        
               ) 
        
               assert hasattr(priors, args["prior_model"]), ( 
        
                   f'Unknown prior model {args["prior_model"]}. ' 
        
                   f'Available models are {", ".join(priors.__all__)}' 
        
               ) 
        
               # instantiate prior model if it was not passed to create_model (i.e. when loading a model) 
        
               prior_model = getattr(priors, args["prior_model"])(**args["prior_args"])

In the first instantiated of a model, the prior_model variable will not be None in this context as it is directly passed to the create_model function:

torchmd-net/torchmdnet/module.py

Line 24 in a80e378

self.model = create_model(self.hparams, prior_model, mean, std)

Defining units

I think the freedom the current unit handling (i.e. none) gives is very powerful. If you want to predict some property you don't have to rely on us supporting units for this property but instead just plug in a dataset with the appropriate label. For self- or unsupervised pretraining we also don't necessarily have a unit that we predict, if for example we mask the atom type of one atom and predict this. To pass unit information to prior models I think it does make sense that datasets have some standardized way of defining their units (if it makes sense to define a unit for a certain dataset). Similarly to how Atomref currently works, it relies on the dataset implementing the get_atomref function. The way that the Atomref prior accesses this function here:

torchmd-net/torchmdnet/priors.py

Line 57 in a80e378

atomref = dataset.get_atomref()

you could in the same way access some kind of get_units function and throw an error if a given dataset doesn't implement that.

PhilippThoelke · 2022-10-04T17:54:02Z

And yes, through the config file and model creation/loading you are currently limited to just one prior model, however, technically it would be possible to wrap the model in multiple prior models by just nesting them.

peastman · 2022-10-04T18:08:05Z

Let's have a chat about it. I'll contact you directly

That would be great if all three of us can meet to discuss it.

You would also need to order the computation of the output heads as you are suggesting that the output of one head could depend on the prediction of another.

I think that's more general than what we need. For Coulomb the two outputs (per-atom energy and per-atom charge) can be predicted independently. Each one gets fed into a calculation that produces a per-sample energy, and the two get added together. They're only linked through the loss function.

peastman · 2022-10-05T23:32:54Z

The prior model itself expects either the dataset object or the prior args to be present to correctly instantiate the object.

In this case it's not strictly either/or. Some attributes will be retrieved from the dataset (for example, the atomic numbers for atom types). Others always need to be specified in the parameter file (for example, the cutoff distance). The existing code assumes everything can be retrieved from the dataset:

torchmd-net/torchmdnet/scripts/train.py

Line 128 in c1c4fcf

prior = getattr(priors, args.prior_model)(dataset=data.dataset)

Any suggestions on the best way to handle this, so we can pass configuration options to the prior even when first creating it?

PhilippThoelke · 2022-10-05T23:41:40Z

You could simply pass args to the prior model in the same step as passing the dataset. If needed you could add an initial_prior_args of some sorts that can maybe take a dictionary (similar to how the dataset_arg currently works) with the required parameters for the prior model you want to create.

peastman · 2022-11-01T21:39:25Z

With ZBL implemented in #134 and D2 in progress in #143, the next question is how to combine multiple priors. What if I want a model to include both ZBL and D2?

The internal changes can be very simple. We can just make TorchMD_Net.prior_model into a list. pre_reduce() and post_reduce() are each called on all the priors in order. So mostly this is a question of how the user specifies the prior models in the config file.

One option is to make prior_model and prior_args into lists:

prior_model:
  - ZBL
  - D2
prior_args:
  - {"cutoff_distance": 4.0, "max_num_neighbors": 50}
  - {"cutoff_distance": 10.0, "max_num_neighbors": 100}

A possibly cleaner option is to combine them:

prior_model:
  - ZBL:
      cutoff_distance: 4.0
      max_num_neighbors: 50
  - D2:
      cutoff_distance: 10.0
      max_num_neighbors: 100

PhilippThoelke · 2022-11-01T22:31:26Z

Sounds good! I think version two is way more readable, which I think is what we should be going for in the config files. I reckon both versions will be hard to implement in the CLI?

peastman · 2022-11-01T23:03:30Z

We would have to come up with a syntax for specifying it on the command line and write our own code for parsing it.

giadefa · 2022-11-02T00:51:22Z

we could also drop the CLI and assume we initialize via a yaml file.

…

On Wed, Nov 2, 2022 at 12:03 AM Peter Eastman ***@***.***> wrote: We would have to come up with a syntax for specifying it on the command line and write our own code for parsing it. — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUOXYE72YFFQ5B7SEKSTWGGOUZANCNFSM47GPOV2Q> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

peastman · 2022-11-02T03:37:23Z

Being able to override settings from the command line is still useful, at least for some settings. For example, to train several copies of the same model with different random number seeds. But the prior model is probably one of the less important ones to override from the command line.

raimis · 2022-11-02T10:52:05Z

I'm in favour for the second version.

We are already using more structure inputs for the data loaders. Regarding the CLI, we maintain the ability to override simple options like seed.

peastman · 2022-12-01T23:39:02Z

I'm starting work on a Coulomb prior. For the moment I'm just trying to implement fixed, precomputed partial charges. Later I'll get to charges that are dynamically predicted by the model.

This requires some way of passing in charges. Currently the prior gets invoked as

def post_reduce(self, y, z, pos, batch):

For the current ones that's sufficient because z provides everything you need to know about each atom. For Coulomb it isn't, since two atoms with the same type/element may have different partial charges. We need a way for the dataset to provide charges, and for them to get passed to the model. We probably want to implement this in a generic way: other models/datasets might involve other fields in the future. For example, we could allow forward() to take arbitrary keyword arguments, which are automatically filled in with whatever values the dataset returned.

Any suggestions on the best way of structuring it?

peastman · 2024-02-13T17:57:35Z

All of them. They need to incorporate the extra arguments into the calculation.

giadefa · 2024-02-13T18:02:41Z

Just do what you need, test it, and break it in the branch. We do normally like this, we have a prototype branch and when we know that it's valuable, we rewrite it to incorporate it. g

…

On Tue, Feb 13, 2024 at 6:57 PM Peter Eastman ***@***.***> wrote: All of them. They need to incorporate the extra arguments into the calculation. — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUOS7XAG237R3IQ43QILYTOSRVAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJUGIYTANZSHE4Q> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

peastman · 2024-02-13T18:05:50Z

I don't know what the best way of incorporating it into the model is. That's why I asked the question:

What would be the best way of incorporating that extra information into the calculation?

I'm hoping someone with experience in the internals of the models will provide useful suggestions.

guillemsimeon · 2024-02-13T18:06:08Z

Not sure if I got your case correctly Peter, but TensorNet currently supports total charge because we decided to default to this behavior. However, if you look into the model, the total charge is assigned to every node and used in every node’s interaction product. We indeed did experiments by setting precomputed partial charges to each atom instead of the total charge. This just would require modifying the line in the embedding part where we do q[batch]. Our experiments worked very well, but we needed to use fix partial charges (that is, Gasteiger for example). If partial charges were changing across conformations instabilities arised, but you could revisit that. Again, I don’t know if this is exactly your issue with charges, but in case it is, a very small fix to TensorNet would allow you to do this.

…

On Tue, 13 Feb 2024 at 18:57, Peter Eastman ***@***.***> wrote: All of them. They need to incorporate the extra arguments into the calculation. — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ANJMOA4FSVUCOVABLIJKQRLYTOSRXAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJUGIYTANZSHE4Q> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

guillemsimeon · 2024-02-13T18:07:28Z

Also, you can use the same approach with molecular spins, instead of charges. It is just a trick in the model to make conformations non-degenerate when there are other electronic degrees of freedom.

…

On Tue, 13 Feb 2024 at 19:06, Peter Eastman ***@***.***> wrote: I don't know what the best way of incorporating it into the model is. That's why I asked the question: What would be the best way of incorporating that extra information into the calculation? I'm hoping someone with experience in the internals of the models will provide useful suggestions. — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ANJMOA375E4KN52C2XHU4JDYTOTQXAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJUGIYTCOJSGYZQ> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

peastman · 2024-02-13T21:44:00Z

So all I need to do is change

torchmd-net/torchmdnet/models/tensornet.py

Lines 240 to 243 in 6d8e315

    
           if q is None: 
        
               q = torch.zeros_like(z, device=z.device, dtype=z.dtype) 
        
           else: 
        
               q = q[batch]

to

 if q is None: 
     q = torch.zeros_like(z, device=z.device, dtype=z.dtype) 
 elif q.shape != z.shape: 
     q = q[batch]

Then it should work correctly with any of the following cases?

q is None
q has one value per sample
q had one value per atom

guillemsimeon · 2024-02-13T21:54:22Z

That’s it! Notice that in front of qs in the interaction layers (both in the interaction product and the squared residual update, the latter one I forgot to mention before) there is a factor of 0.1. This is arbitrary, but it worked fine in my experiments. However, notice that in the case partial charge (or let’s say ‘hypercharge’, since it can be whatever) is -10, you get 0 and cancel the interaction. Nevertheless, I don’t know if partial or total charges of -10 are realistic in our setting (I assume not). What I want to say is that from that point on, once you have your property stored in q, you can do whatever with it. For example, a multilayer perceptron with input size 1 and output size hidden channels, and you element-wise multiply the interaction products and the residual update. This is just one arbitrary case. For anything else you want to explore or try which is not customary, I can help you with it.

…

On Tue, 13 Feb 2024 at 22:44, Peter Eastman ***@***.***> wrote: So all I need to do is change https://github.com/torchmd/torchmd-net/blob/6d8e3159cfb8bb971ecf7a2abd589735d79a7e53/torchmdnet/models/tensornet.py#L240-L243 to if q is None: q = torch.zeros_like(z, device=z.device, dtype=z.dtype) elif q.shape != z.shape: q = q[batch] Then it should work correctly with any of the following cases? - q is None - q has one value per sample - q had one value per atom — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ANJMOA52HCXNHIGEY7KLIJLYTPNCZAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJUGI3DOMBUGM3A> . You are receiving this because you commented.Message ID: ***@***.***>

peastman · 2024-02-14T04:48:12Z

For partial charges that should be a safe assumption. They're usually between -1 and 1. They can be a bit larger for some ions, but they shouldn't be anywhere close to 10. Total charge could reach 10 in some cases.

I notice that q is only passed to the interaction layers, not to the embedding, meaning if you set num_layers to 0 the charge is ignored. Should it also be passed to the embedding?

guillemsimeon · 2024-02-14T16:06:52Z

I am afraid you cannot deal with charges (not in this way) for num layers equal to 0. It would require thinking of another way of doing it. Perhaps it would be worth adding a warning when q is not None and num layers is 0.

…

On Wed, 14 Feb 2024 at 05:48, Peter Eastman ***@***.***> wrote: For partial charges that should be a safe assumption. They're usually between -1 and 1. They can be a bit larger for some ions, but they shouldn't be anywhere close to 10. Total charge could reach 10 in some cases. I notice that q is only passed to the interaction layers, not to the embedding, meaning if you set num_layers to 0 the charge is ignored. Should it also be passed to the embedding? — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ANJMOA4YUYESBAHJEV7L5RDYTQ6ZRAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJUGMYDONRZGA4Q> . You are receiving this because you commented.Message ID: ***@***.***>

peastman · 2024-02-14T21:48:50Z

Suppose we want to add in multiple values, for example both charge and spin. Can this method be generalized to handle that case? Or do we need to figure out a different way of incorporating them into the calculation?

guillemsimeon · 2024-02-14T22:10:29Z

We will be currently looking into that. From what I understand, for spin you either have singlet or triplet, which would be a binary variable? In any case, I see nothing preventing us from creating a rescaling coefficient for the interaction products and the residual update which is a suitable linear combination (perhaps even with learnable coefficients, though it might not be necessary) of a spin and a charge term, as the most simple case. If you explore something in this direction, let us know, please.

…

On Wed, 14 Feb 2024 at 22:49, Peter Eastman ***@***.***> wrote: Suppose we want to add in multiple values, for example both charge and spin. Can this method be generalized to handle that case? Or do we need to figure out a different way of incorporating them into the calculation? — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ANJMOA57ZTOKGTFPXUXHS63YTUWM5AVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJUGQ3TCNJSGE2Q> . You are receiving this because you commented.Message ID: ***@***.***>

peastman · 2024-02-14T22:15:27Z

Do you think it could work to just append the extra values to the embedding vector at

torchmd-net/torchmdnet/models/tensornet.py

Line 330 in c0edfed

Z = self.emb(z)

Or is that too simplistic?

guillemsimeon · 2024-02-14T22:21:40Z

It could work. However, I chose the option of just multiplying because including charges came after having the original TensorNet model, and I wanted to make sure the model with q=0 identically defaulted to the original one (the one for which I had run all benchmarks), plus I find it elegant and you don’t need to change any tensor or learnable layer shapes, or even introduce something extra learnable.

…

On Wed, 14 Feb 2024 at 23:15, Peter Eastman ***@***.***> wrote: Do you think it could work to just append the extra values to the embedding vector at https://github.com/torchmd/torchmd-net/blob/c0edfedfcdb841f0b52d571f3f788339ef5ff486/torchmdnet/models/tensornet.py#L330 Or is that too simplistic? — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ANJMOAY7OVXDLLIKGABOIALYTUZQZAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJUGQ3TSMZVGQYQ> . You are receiving this because you commented.Message ID: ***@***.***>

guillemsimeon · 2024-02-14T22:27:47Z

However, thinking it better, you need to change the output energy for different qs and spins with just two new neurons in the most initial part of the model, perhaps it gets too ‘diluted’ and it is difficult for the network. For me it makes more sense changing intermediate things, and things that you know have a large effect on predictions. Changing the interaction product strength can have a large effect (at least in my mind). On Wed, 14 Feb 2024 at 23:21, Guillem Simeon ***@***.***> wrote:

…

It could work. However, I chose the option of just multiplying because including charges came after having the original TensorNet model, and I wanted to make sure the model with q=0 identically defaulted to the original one (the one for which I had run all benchmarks), plus I find it elegant and you don’t need to change any tensor or learnable layer shapes, or even introduce something extra learnable. On Wed, 14 Feb 2024 at 23:15, Peter Eastman ***@***.***> wrote: > Do you think it could work to just append the extra values to the > embedding vector at > > > https://github.com/torchmd/torchmd-net/blob/c0edfedfcdb841f0b52d571f3f788339ef5ff486/torchmdnet/models/tensornet.py#L330 > > Or is that too simplistic? > > — > Reply to this email directly, view it on GitHub > <#26 (comment)>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/ANJMOAY7OVXDLLIKGABOIALYTUZQZAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJUGQ3TSMZVGQYQ> > . > You are receiving this because you commented.Message ID: > ***@***.***> >

peastman · 2024-02-14T23:01:09Z

I'll try both approaches and see if one works better. The embedding approach appeals to me because you can easily incorporate an arbitrary set of global or per-atom properties. I also have to admit that I don't really understand the logic behind the current approach. You increase the strength of all interactions for positive atoms/molecules and decrease the strength of all interactions for negative atoms/molecules. Why?

One could also take the embedding vector, append the extra values, and then pass it through a linear layer that mixes everything together and reduces it back down to the original length. Or you could do something similar to Cormorant. Instead of learning an embedding vector it learns a matrix that mixes together whatever input values you want for each atom and produces the embedding vector.

guillemsimeon · 2024-02-14T23:12:48Z

There is no logic in terms of physics, tbh. The only logic is that the magnitudes of the scalar, vector and tensor features that are fed to subsequent layers are different, and all subsequent layers learn to map these differences to different energies. It worked quite well in our experiments. If you ask me, the most simple approach for your case is: - count the number of extra args, d - intialize a linear from d to hidden channels, Lin - do Lin(extra_args) and elementwise multiply the output to the products and the residual update, similar to what is being done now (currently, all channels are weighted by the same number, in the previous setting, you modify strengths channelwise) But since this is deep learning, anything could be done. I think the right way to proceed is in terms of what is the minimal amount of modifications you need to make, and to make sure (at least in my opinion) that no extra_args defaults to original TensorNet. Otherwise I do not have performance guarantees for you in terms of current benchmark datasets, since we would be talking about a different model. I hope it helps.

…

On Thu, 15 Feb 2024 at 00:01, Peter Eastman ***@***.***> wrote: I'll try both approaches and see if one works better. The embedding approach appeals to me because you can easily incorporate an arbitrary set of global or per-atom properties. I also have to admit that I don't really understand the logic behind the current approach. You increase the strength of all interactions for positive atoms/molecules and decrease the strength of all interactions for negative atoms/molecules. Why? One could also take the embedding vector, append the extra values, and then pass it through a linear layer that mixes everything together and reduces it back down to the original length. Or you could do something similar to Cormorant. Instead of learning an embedding vector it learns a matrix that mixes together whatever input values you want for each atom and produces the embedding vector. — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ANJMOA53NL45UFYQCUPHHODYTU64DAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJUGQ4TGMBYHEYQ> . You are receiving this because you commented.Message ID: ***@***.***>

peastman · 2024-02-19T20:17:26Z

I tried using the current method to inject partial charges into the model. The result was not good: it roughly doubled the error. Then I tried the method I described above: append the partial charge to the embedding vector, and use a linear layer to mix it back down to the original length. That worked nicely and gave a good result.

This has the advantage that it easily generalizes to arbitrary numbers of global and per-atom scalar parameters. If there are no extra arguments, there's nothing to append and it skips the linear layer, so the model is unchanged. This same approach could be used for all the models if we want, not just TensorNet.

Would you be open to a PR implementing it?

giadefa · 2024-02-20T09:28:05Z

Hi Peter, we have results that show that partial charges work better. We are about to finalize a paper and will make available the final version within 30 days. Regarding your changes, I am not sure that they work in terms of maintaining the properties of TN. We can talk about this after we finalize the current work. g

…

On Mon, Feb 19, 2024 at 9:17 PM Peter Eastman ***@***.***> wrote: I tried using the current method to inject partial charges into the model. The result was not good: it roughly doubled the error. Then I tried the method I described above: append the partial charge to the embedding vector, and use a linear layer to mix it back down to the original length. That worked nicely and gave a good result. This has the advantage that it easily generalizes to arbitrary numbers of global and per-atom scalar parameters. If there are no extra arguments, there's nothing to append and it skips the linear layer, so the model is unchanged. This same approach could be used for all the models if we want, not just TensorNet. Would you be open to a PR implementing it? — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUOWWQ6E547NJMAU76UDYUOXOHAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJVGMYTCMJTG4YA> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

peastman · 2024-02-20T15:49:27Z

we have results that show that partial charges work better.

Work better than what?

Regarding your changes, I am not sure that they work in terms of maintaining the properties of TN.

They exactly maintain it. If you don't add charges, they don't modify the model in any way.

giadefa · 2024-02-20T15:52:08Z

Than the model without charges

…

On Tue, Feb 20, 2024 at 4:49 PM Peter Eastman ***@***.***> wrote: we have results that show that partial charges work better. Work better than what? Regarding your changes, I am not sure that they work in terms of maintaining the properties of TN. They exactly maintain it. If you don't add charges, they don't modify the model in any way. — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUORGMDQY2FE6A4MXL53YUTAZHAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJVGQ2TAOBYHEZA> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

peastman · 2024-02-20T15:54:26Z

Yes, that's what I'm finding as long as I inject the charges with the method I described. If I do it with the current code, it breaks the model.

giadefa · 2024-02-20T15:56:33Z

We are not using the method you described.

…

On Tue, Feb 20, 2024 at 4:54 PM Peter Eastman ***@***.***> wrote: Yes, that's what I'm finding as long as I inject the charges with the method I described. If I do it with the current code, it breaks the model. — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUOUXJKORYOTLV6MNML3YUTBMBAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJVGQ2TCOBYGU2Q> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

peastman · 2024-02-20T15:58:32Z

What method are you using? If it's what's in the code, it doesn't work. If it's something else, please provide the code so I can try it.

guillemsimeon · 2024-02-20T16:20:12Z

Hi Peter,

the thing is that trying partial charges in our case (which were Gasteiger ones, btw, yours are fix?) was something we tried out of curiosity after knowing that total charges worked. We believe total charge is the way to proceed, and in fact, to the best of my knowledge, is currently being used as it is in the code now, and it helps the model to work fine when there are simultaneously charged and neutral molecules in the dataset (which can be sometimes confused for degenerate inputs). What I think is that your use case is different than ours, meaning that we have never tried partial charges + Coulomb. We didn't want to rely on partial charges because of their 'arbitrary' nature, as opposed to the real physical observable that total charge is.

Guillem

guillemsimeon · 2024-02-20T16:25:13Z

I also forgot to mention that, on top of what I said, relying on partial charges did not seem optimal to us because you need always to use exactly the same method of computation of the partial charges, since otherwise any small change in their value would mean a different energy output. In contrast, total charge is an integer.

giadefa · 2024-02-20T16:26:08Z

Well, there are situations where partial charges are good and that's why we used them. As soon as we finish the document, we share it.

…

On Tue, Feb 20, 2024 at 5:20 PM Guillem Simeon ***@***.***> wrote: Hi Peter, the thing is that trying partial charges in our case (which were Gasteiger ones, btw, yours are fix?) was something we tried out of curiosity after knowing that total charges worked. We believe total charge is the way to proceed, and in fact, to the best of my knowledge, is currently being used as it is in the code now, and it helps the model to work fine when there are simultaneously charged and neutral molecules in the dataset (which can be sometimes confused for degenerate inputs). What I think is that your use case is different than ours, meaning that we have never tried partial charges + Coulomb. We didn't want to rely on partial charges because of their 'arbitrary' nature, as opposed to the real physical observable that total charge is. Guillem — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUOT7MDIF3ZON4LTFB7TYUTEMPAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJVGQ2TOMRZGE2Q> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

peastman · 2024-02-20T16:40:23Z

I'm using Gasteiger charges too. I'll make a PR and you can try it out.

We believe total charge is the way to proceed

I don't see how total charge can possibly work. It's a global property, but the model is based entirely on local computations. As it's doing computations for local regions of the system, knowing the total charge of the system provides no useful information. It has no idea how much of that charge is in the region it's looking at and how much is elsewhere, so there's nothing it can do with it. Instead it needs information about the local charge distribution, which partial charges give it. Perhaps it could work for tiny molecules where every atom is within the cutoff distance of every other atom, but not for anything larger than that.

For comparison, take a look at SpookyNet. It supplements the local computation with some global computation, which allows it to meaningfully use global information.

giadefa · 2024-02-20T16:44:57Z

Yes, I also think that total charge is going to be hard, and partial charges are useful, but for what we need, it has the advantage that it is a physical quantity, it does not depend on the version of RDkit.

…

On Tue, Feb 20, 2024 at 5:40 PM Peter Eastman ***@***.***> wrote: I'm using Gasteiger charges too. I'll make a PR and you can try it out. We believe total charge is the way to proceed I don't see how total charge can possibly work. It's a global property, but the model is based entirely on local computations. As it's doing computations for local regions of the system, knowing the total charge of the system provides no useful information. It has no idea how much of that charge is in the region it's looking at and how much is elsewhere, so there's nothing it can do with it. Instead it needs information about the local charge distribution, which partial charges give it. Perhaps it could work for tiny molecules where every atom is within the cutoff distance of every other atom, but not for anything larger than that. For comparison, take a look at SpookyNet. It supplements the local computation with some global computation, which allows it to meaningfully use global information. — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3KUOQ5DPFZDSYJLVFZXOLYUTGYJAVCNFSM47GPOV22U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJVGQ3DCMZWGY3Q> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

peastman · 2024-02-20T21:01:53Z

Gasteiger charges don't depend on the version of RDKit. The algorithm was published in 1978 and hasn't changed since.

giadefa closed this as completed Dec 1, 2021

peastman reopened this Oct 3, 2022

peastman mentioned this issue Oct 6, 2022

Implement ZBL potential #134

Merged

peastman mentioned this issue Feb 21, 2024

Allow appending extra values to embedding vector #289

Open

Physics based priors #26

Physics based priors #26

Comments

peastman commented Jun 23, 2021

PhilippThoelke commented Jun 24, 2021

peastman commented Jun 24, 2021

peastman commented Oct 3, 2022

peastman commented Oct 3, 2022

peastman commented Oct 3, 2022

giadefa commented Oct 4, 2022 via email

giadefa commented Oct 4, 2022 via email

peastman commented Oct 4, 2022

peastman commented Oct 4, 2022

giadefa commented Oct 4, 2022 via email

giadefa commented Oct 4, 2022 via email

peastman commented Oct 4, 2022

giadefa commented Oct 4, 2022 via email

peastman commented Oct 4, 2022

giadefa commented Oct 4, 2022 via email

PhilippThoelke commented Oct 4, 2022 • edited Loading

Multi-head implementation

Prior args

Defining units

PhilippThoelke commented Oct 4, 2022

peastman commented Oct 4, 2022

peastman commented Oct 5, 2022

PhilippThoelke commented Oct 5, 2022 • edited Loading

peastman commented Nov 1, 2022

PhilippThoelke commented Nov 1, 2022

peastman commented Nov 1, 2022

giadefa commented Nov 2, 2022 via email

peastman commented Nov 2, 2022

raimis commented Nov 2, 2022

peastman commented Dec 1, 2022

peastman commented Feb 13, 2024

giadefa commented Feb 13, 2024 via email

peastman commented Feb 13, 2024

guillemsimeon commented Feb 13, 2024 via email

guillemsimeon commented Feb 13, 2024 via email

peastman commented Feb 13, 2024

guillemsimeon commented Feb 13, 2024 via email

peastman commented Feb 14, 2024

guillemsimeon commented Feb 14, 2024 via email

peastman commented Feb 14, 2024

guillemsimeon commented Feb 14, 2024 via email

peastman commented Feb 14, 2024

guillemsimeon commented Feb 14, 2024 via email

guillemsimeon commented Feb 14, 2024 via email

peastman commented Feb 14, 2024

guillemsimeon commented Feb 14, 2024 via email

peastman commented Feb 19, 2024

giadefa commented Feb 20, 2024 via email

peastman commented Feb 20, 2024

giadefa commented Feb 20, 2024 via email

peastman commented Feb 20, 2024

giadefa commented Feb 20, 2024 via email

peastman commented Feb 20, 2024

guillemsimeon commented Feb 20, 2024

guillemsimeon commented Feb 20, 2024

giadefa commented Feb 20, 2024 via email

peastman commented Feb 20, 2024

giadefa commented Feb 20, 2024 via email

peastman commented Feb 20, 2024

PhilippThoelke commented Oct 4, 2022 •

edited

Loading

PhilippThoelke commented Oct 5, 2022 •

edited

Loading