
MD Workflow for Bulk Electrolytes #19

Open · wants to merge 130 commits into main

Conversation

muhammadhasyim (Collaborator):

This PR introduces the workflow to set up and run MD simulations of bulk electrolytes. Given a list of electrolyte components and their corresponding structures + force field files, the workflow generates the necessary inputs to run an OpenMM simulation. It optionally generates inputs to run the same simulations in LAMMPS.

This PR includes a number of Python scripts and modules, Bash scripts that run the workflow, and a directory (ff) containing all force field files curated from the literature.

There are still a few details that need to be worked out. For instance, the workflow always assumes that there is a salt in the electrolyte, but we also have molten salts and pure ionic liquids, which do not have a separate salt component. We also need to ensure that the workflow works for all model systems chosen in our full spreadsheet. All of these will be added in future PRs.

@espottesmith (Collaborator) left a comment:

I did not review the lammps2omm.py file, because I just don't know those formats well enough to provide meaningful feedback at this point.

Mostly this looks good, and I greatly appreciate your work on this. I had some small, nitpicky style points and a couple of questions, for instance making sure that electrolytes with mol ratios for salts rather than mass or volume concentrations were being handled properly.

Comment on lines 40 to 42
```python

```
Collaborator:

Missing code?

Collaborator Author:

No, I wanted to erase that. Thanks for noticing.

def get_indices(comments, keyword):
    """ Grab indices of labeled columns given a specific keyword.
    Args:
        comments (list): a list of strings, each of which is a label for a specific column
Collaborator:

Why is this called "comments" and not e.g. "labels"?

Collaborator Author:

Labels is a better name for this, I'll change that

    species = np.array(systems[i])[indices]
    return [name for name in species if name]

def run_packmol_moltemplate(species,boxsize,Nmols,filename,directory):
Collaborator:

Minor, but type annotations would be nice.
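
For what it's worth, a possible annotated signature; the concrete types below are my assumption from how the function is called, not taken from the actual implementation:

```python
# Sketch only: the parameter types are assumed, not taken from the real code.
def run_packmol_moltemplate(
    species: list[str],   # names of the electrolyte components to pack
    boxsize: float,       # edge length of the (cubic) simulation box
    Nmols: list[int],     # number of molecules of each species
    filename: str,        # base name for the generated Packmol/Moltemplate inputs
    directory: str,       # output directory for the generated files
) -> None:
    ...
```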

Comment on lines 165 to 168
try:
    os.mkdir(directory)
except Exception as e:
    print(e)
Collaborator:

Slightly cleaner to use Path.mkdir(exist_ok=True) to avoid the try/except. Also, avoid generic Exception.
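
For illustration, a minimal sketch of the pathlib version (the directory name below is a placeholder; in the script it is the existing `directory` variable):

```python
from pathlib import Path

directory = "system_0"  # placeholder; in the script this is the run directory name

# Create the directory if it does not exist; no try/except or broad Exception needed.
Path(directory).mkdir(parents=True, exist_ok=True)
```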

Collaborator Author:

Thanks! I never thought about using Path to check for an existing directory. I do tend to use generic Exception a lot.

Comment on lines 171 to 172
bashCommand = f'cp {general_ff} ./{directory}'
os.system(bashCommand)
Collaborator:

Also cleaner to use copy.copy()

Collaborator Author:

Do you mean shutil.copy()? I think copy.copy() is for objects/classes.

Collaborator:

Yes, I made the same comments below.
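
A minimal sketch of the shutil-based copy, matching the suggestion that appears later in this review (the paths are placeholders for illustration):

```python
import shutil

general_ff = "ff/general.lt"  # placeholder path to the general force field file
directory = "system_0"        # placeholder run directory

# Copy the force field file into the run directory without shelling out to `cp`.
shutil.copy(general_ff, directory)
```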

Comment on lines 57 to 59
# Calculate how many salt species to add to the system. If the salt concentration
# is given in molality (units == 'mass'), then we don't need the solvent density. But if it
# is given in molarity (units == 'volume'), then we need the solvent density.
Collaborator:

What about if it's neither mass nor volume (if salt concentration is given in terms of a mol fraction)? Seems like you aren't populating numsalt in that case?
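
For concreteness, one possible shape for that missing branch, assuming the mole-based concentration is a mole ratio relative to the solvent; `salt_molratio` is my placeholder name and the intended semantics would need to be confirmed:

```python
import numpy as np

# Placeholder inputs for illustration; in the script these come from elytes.csv.
salt_molratio = np.array([0.1, 0.1])  # assumed: moles of each salt species per mole of solvent
num_solv = 500                        # total number of solvent molecules in the box

# If the concentration is a mole ratio, numsalt can be derived from num_solv directly,
# with no solvent density needed.
numsalt = np.round(salt_molratio * num_solv).astype(int)
```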

Comment on lines 10 to 17
#Generate and run short simulation of the solvent
python generatesolvent.py $i
cd $i
python prepopenmmsim.py solvent
cp ../runsolvent.py ./
python runsolvent.py
cd ..

Collaborator:

This shouldn't need to be repeated every time, right? There's only a finite number of solvent systems that we're looking at, so we should just be able to run those once, storing the densities and whatever other numbers we need, right?

Collaborator Author:

Yeah, we talked about this, and you're right. I think it's just a matter of how we want to set up the workflow. At the end of the day, someone needs to run those solvent simulations first, so I put that step in preparesimulations.sh.
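
One lightweight way to avoid rerunning the solvent simulations, sketched with an assumed cache file (`solvent_densities.csv` is my placeholder, not part of the current workflow):

```python
import csv
import os

def get_cached_density(system_id, cache_file="solvent_densities.csv"):
    """Return a previously computed solvent density in g/ml, or None if not cached."""
    if not os.path.exists(cache_file):
        return None
    with open(cache_file, newline="") as f:
        for row in csv.DictReader(f):
            if row["system"] == str(system_id):
                return float(row["density_g_per_ml"])
    return None

def cache_density(system_id, density, cache_file="solvent_densities.csv"):
    """Record a solvent density so later runs can skip the short MD simulation."""
    write_header = not os.path.exists(cache_file)
    with open(cache_file, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["system", "density_g_per_ml"])
        writer.writerow([system_id, density])
```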

u.add_TopologyAttr('occupancies',[1.0]*Natoms)

# Open the corresponding PDB file (generated by packmol)
lmm.grab_pdbdata_attr(pdb_file)
Collaborator:

Question: are we inputting partial charge information? I know that was a question Sanjeev raised earlier. We need some way to grab the charges after the fact so that we know what charge to run the DFT at for a given cluster.

Collaborator Author:

Yes! The partial charge information is actually stored somewhere, and my suggestion is to generate a separate text file with the outputted partial charges and element names (Sanjeev mentioned that the naming of the elements output by OpenMM is not exactly correct either).
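
A sketch of what that dump could look like with the OpenMM API (the function name and output format are mine, and whether the topology's element assignments can be trusted is exactly the caveat about element naming raised above):

```python
from openmm import NonbondedForce
from openmm.unit import elementary_charge

def write_partial_charges(system, topology, outfile="charges.txt"):
    """Write per-atom element symbols and partial charges from an OpenMM System."""
    # Find the NonbondedForce, which stores the per-particle charge parameters.
    nonbonded = next(f for f in system.getForces() if isinstance(f, NonbondedForce))
    with open(outfile, "w") as out:
        for atom in topology.atoms():
            charge, _sigma, _epsilon = nonbonded.getParticleParameters(atom.index)
            symbol = atom.element.symbol if atom.element is not None else "X"
            out.write(f"{atom.index} {symbol} {charge.value_in_unit(elementary_charge):.4f}\n")
```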

cp ../runsystem.py ./
python runsystem.py
cd ..
done
Collaborator:

Do we want/need to run this in parallel?

Collaborator Author:

We can run them in parallel. Depending on how jobs are submitted on the cluster, we could go with GNU parallel as well.
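
For a simple scheduler-agnostic option, a Python sketch that launches the per-system jobs concurrently (directory discovery and the worker count are placeholders; it assumes runsystem.py has already been copied into each directory, as the Bash script does):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_system(directory):
    """Run one MD job inside its directory, mirroring what runsimulations.sh does serially."""
    return subprocess.run(["python", "runsystem.py"], cwd=directory, check=True)

if __name__ == "__main__":
    # Placeholder discovery: treat every subdirectory of the working directory as a system.
    systems = [str(p) for p in Path(".").iterdir() if p.is_dir()]
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(run_system, systems))
```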

from sys import stdout, exit


#TO-DO: Read temperature from CSV file
Collaborator:

Still need to be done? Seems like temp is fixed at 293 for now.

Collaborator Author:

Yeah. It's a simple matter of reading from the previous CSV file; I've just been putting off editing this part of the code.
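
A minimal sketch of that read, assuming elytes.csv carries a temperature column (the header name below is a guess):

```python
import csv

def read_temperature(csv_file, row_index, column="Temperature (K)"):
    """Read the simulation temperature for one system from the electrolyte CSV file.

    The column name is an assumption and should match the actual elytes.csv header.
    """
    with open(csv_file, newline="") as f:
        rows = list(csv.DictReader(f))
    return float(rows[row_index][column])

# Example usage (row index is illustrative only):
# temperature = read_temperature("elytes.csv", 0)
```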

@levineds (Collaborator) left a comment:

I didn't quite make it through everything, but here's what I got for now.

electrolytes/README.md (outdated)
## List of files and directories
Only the important ones
- README.md: this file
- `preparesimulations.sh`: a Bash script that will prepare the initial system configurations in the elytes.csv files for OpenMM simulations
Collaborator:

Suggested change
- `preparesimulations.sh`: a Bash script that will prepare the initial system configurations in the elytes.csv files for OpenMM simulations
- `preparesimulations.sh`: a Bash script that will prepare the initial system configurations in the elytes.csv files for OpenMM simulations

- `runsimulations.sh`: a Bash script that will run the simulations one by one.
- ff: a directory of force field files for all electrolyte components.
- elytes.csv: a CSV file listing all possible electrolyte systems we can simulate
- `data2lammps.py`: a Python module to generate LAMMPS DATA and run files to
Collaborator:

Suggested change
- `data2lammps.py`: a Python module to generate LAMMPS DATA and run files to
- `data2lammps.py`: a Python module to generate LAMMPS DATA and run files

What do you mean by "run files"?

Collaborator Author:

It's the file you use when running LAMMPS from the terminal, e.g., lmp -in run.lammps.


## How it works

The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible to LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files.
Collaborator:

Suggested change
The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible to LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files.
The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible with LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files.


The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible to LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files.

The input the workflow is the `ff` directory, which contains the PDB and LT files of all electrolyte components, and elytes.csv, which specifies the molar/molal concentrations of the salt and ratios for solvent mixtures.
Collaborator:

Suggested change
The input the workflow is the `ff` directory, which contains the PDB and LT files of all electrolyte components, and elytes.csv, which specifies the molar/molal concentrations of the salt and ratios for solvent mixtures.
The input to the workflow is the `ff` directory, which contains the PDB and LT files of all electrolyte components, and elytes.csv, which specifies the molar/molal concentrations of the salt and ratios for solvent mixtures.

print(units)
if 'volume' in units:
    # Solvent density in g/ml, obtained from averaging short MD run
    data = list(csv.reader(open(f'{i-1}/solventdata.txt', 'r')))
Collaborator:

Suggested change
data = list(csv.reader(open(f'{i-1}/solventdata.txt', 'r')))
with open(f'{row_idx-1}/solventdata.txt', 'r') as f:
    data = list(csv.reader(f))

# Solvent density in g/ml, obtained from averaging short MD run
data = list(csv.reader(open(f'{i-1}/solventdata.txt', 'r')))
rho = np.array([float(row[3]) for row in data[1:]])
rho = np.mean(rho[int(len(rho)/2):])
Collaborator:

Suggested change
rho = np.mean(rho[int(len(rho)/2):])
rho = np.mean(rho[len(rho) // 2:])

Comment on lines 87 to 90
for j in range(len(cat+an)):
    Nmols.append(int(numsalt[j]))
for j in range(len(neut)):
    Nmols.append(int(num_solv*solv_molfrac[j]))
Collaborator:

Suggested change
for j in range(len(cat+an)):
    Nmols.append(int(numsalt[j]))
for j in range(len(neut)):
    Nmols.append(int(num_solv*solv_molfrac[j]))
Nmols.extend(numsalt)
Nmols.extend(int(num_solv*solv_frac) for solv_frac in solv_molfrac)


Module to convert LAMMPS force field and DATA files to OpenMM
XML force field file and PDB file. The script is only tested with
The forcefield files contained in './ff' directory
Collaborator:

Suggested change
The forcefield files contained in './ff' directory
the forcefield files contained in './ff' directory


## Dictionary of (rounded) atomic masses and element names

atomic_masses_rounded = {
Collaborator:

This seems like a terrible way to map things. There are multiple collisions in this dictionary (granted they are for very heavy atoms). Is there some other way we could do this?

Collaborator Author:

I'm going to switch to the PT (Periodic Table) module to see if it can do this for me. I admit I did have a convoluted way of looking up element names based on atomic masses.
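
If the lookup ends up going through the third-party `periodictable` package (which may or may not be the module meant here), a nearest-mass search with a tolerance could look like this sketch; very heavy elements with near-identical masses would still need case-by-case handling:

```python
import periodictable

def element_from_mass(mass, tol=0.5):
    """Return the element symbol whose standard atomic mass is closest to `mass`.

    The tolerance guards against silently assigning the wrong element when the
    force field's rounded mass does not match anything closely enough.
    """
    best = min(
        (el for el in periodictable.elements if el.number > 0),
        key=lambda el: abs(el.mass - mass),
    )
    if abs(best.mass - mass) > tol:
        raise ValueError(f"No element within {tol} amu of mass {mass}")
    return best.symbol
```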

@levineds (Collaborator) left a comment:

Looks good. I think we're good to go once all the parameters are done.

bashCommand = f'cp {general_ff} ./{directory}'
os.system(bashCommand)

shutil.copy(f'{general_ff}', f'{directory}')
Collaborator:

Suggested change
shutil.copy(f'{general_ff}', f'{directory}')
shutil.copy(general_ff, directory)

Comment on lines 66 to 67
packmolstring = '\n'.join([f"tolerance 2.0",
f"filetype pdb",
Collaborator:

Suggested change
packmolstring = '\n'.join([f"tolerance 2.0",
f"filetype pdb",
packmolstring = '\n'.join(["tolerance 2.0",
"filetype pdb",

f.close()
#shutil.copy(f"./ff/{species[j]}.pdb", f'./{directory}')
#shutil.copy(f"./ff/{species[j]}.lt", f'./{directory}')
spec_name = f'{species[j]}'
Collaborator:

Suggested change
spec_name = f'{species[j]}'
spec_name = species[j]

for suffix in ('.pdb', '.lt'):
    shutil.copy(os.path.join('ff', spec_name + suffix), os.path.join(directory, spec_name + suffix))

packmolstring += '\n'.join([f"structure {species[j]}.pdb",
Collaborator:

Suggested change
packmolstring += '\n'.join([f"structure {species[j]}.pdb",
packmolstring += '\n'.join([f"structure {spec_name}.pdb",

# Given the system LT and PDB file, which is generated from Packmol, run Moltemplate
bashCommand = f"cd {directory}; moltemplate.sh -pdb {filename}.pdb {filename}.lt; cd ..;"
os.system(bashCommand)
systemlt += '\n'.join([f'import "{species[j]}.lt"',
Collaborator:

Suggested change
systemlt += '\n'.join([f'import "{species[j]}.lt"',
systemlt += '\n'.join([f'import "{spec_name}.lt"',

numsalt = np.round(salt_conc*mass*Avog).astype(int)
elif 'number' == units:
elif 'number' == units or 'Number' == units:
Collaborator:

Suggested change
elif 'number' == units or 'Number' == units:
elif 'number' == units.lower():

mshuaibii and others added 30 commits October 21, 2024 22:46
…tal (#10)

Also added Lowdin and Mulliken bond orders since they were the only population feature missing from the NormalPrint level.
* Point to updated basis for Ln

* Upload def2-tzvpd basis

* Add a function to determine the symmetry-breaking block in ORCA

* Update recipes.py to use symmetry-breaking in a vertical-specific manner

We only want to break spin-symmetry for metal organic examples that are singlets.

* f-string bug

* point at path of basis set file

* add symmetry breaking to write_orca_calc

* assume basis lives in same directory

* organize basis directories

* black

* copy_file support

---------

Co-authored-by: Muhammed Shuaibi <[email protected]>
Missing commit from PR #13
…an Orca job (i.e. %maxcore) (#21)

* Add memory estimate function

* docstrings
* Remove OOD systems from ID systems
* don't save structures that are just the unsolvated structure
* Freeze center molecule (and don't do xTB relaxation on the final structure)
* Update RKS memory scaling with Orca 6 and use it for non-S3 calcs

* linting
* add script to write ani2x XYZ files

* Update biomolecules/ani2x/write_ani2x_xyzs.py

Co-authored-by: Daniel Levine <[email protected]>

* Update biomolecules/ani2x/write_ani2x_xyzs.py

Co-authored-by: Daniel Levine <[email protected]>

---------

Co-authored-by: Samuel Blau <[email protected]>
Co-authored-by: Daniel Levine <[email protected]>