
MD Workflow for Bulk Electrolytes #19

Open · wants to merge 130 commits into main

Conversation

muhammadhasyim (Collaborator):

This PR introduces the workflow to set up and run MD simulations of bulk electrolytes. Given a list of electrolyte components and their corresponding structures + force field files, the workflow generates the necessary inputs to run an OpenMM simulation. It optionally generates inputs to run the same simulations in LAMMPS.

This PR includes a number of Python scripts and modules, Bash scripts that run the workflow, and a directory (ff) containing all force field files curated from the literature.

There are still a few details that need to be worked out. For instance, the workflow always assumes that there is a salt in the electrolyte, but we also have molten salts and pure ionic liquids, which do not have a separate salt component. We also need to ensure that the workflow works for all model systems chosen in our full spreadsheet. All of these will be added in future PRs.

@espottesmith (Collaborator) left a comment:

I did not review the lammps2omm.py file, because I just don't know those formats well enough to provide meaningful feedback at this point.

Mostly this looks good, and I greatly appreciate your work on this. I had some small, nitpicky style points and a couple of questions, for instance making sure that electrolytes with mol ratios for salts rather than mass or volume concentrations were being handled properly.

Comment on lines 40 to 42
```python

```
Collaborator:

Missing code?

Collaborator Author:

No, I wanted to erase that. Thanks for noticing.

def get_indices(comments, keyword):
    """ Grab indices of labeled columns given a specific keyword.
    Args:
        comments (list): a list of strings, each of which is a label for a specific column
Collaborator:

Why is this called "comments" and not e.g. "labels"?

Collaborator Author:

Labels is a better name for this, I'll change that

    species = np.array(systems[i])[indices]
    return [name for name in species if name]

def run_packmol_moltemplate(species,boxsize,Nmols,filename,directory):
Collaborator:

Minor, but type annotations would be nice.
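
For what it's worth, a possible annotated signature; the concrete types below are my assumption from how the function is called, not taken from the actual implementation:

```python
# Sketch only: the parameter types are assumed, not taken from the real code.
def run_packmol_moltemplate(
    species: list[str],   # names of the electrolyte components to pack
    boxsize: float,       # edge length of the (cubic) simulation box
    Nmols: list[int],     # number of molecules of each species
    filename: str,        # base name for the generated Packmol/Moltemplate inputs
    directory: str,       # output directory for the generated files
) -> None:
    ...
```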

Comment on lines 165 to 168
try:
    os.mkdir(directory)
except Exception as e:
    print(e)
Collaborator:

Slightly cleaner to use Path.mkdir(exist_ok=True) to avoid the try/except. Also, avoid generic Exception.
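
For illustration, a minimal sketch of the pathlib version (the directory name below is a placeholder; in the script it is the existing `directory` variable):

```python
from pathlib import Path

directory = "system_0"  # placeholder; in the script this is the run directory name

# Create the directory if it does not exist; no try/except or broad Exception needed.
Path(directory).mkdir(parents=True, exist_ok=True)
```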

Collaborator Author:

Thanks! I never thought about using Path to check for an existing directory. I do tend to use generic Exception a lot.

Comment on lines 171 to 172
bashCommand = f'cp {general_ff} ./{directory}'
os.system(bashCommand)
Collaborator:

Also cleaner to use copy.copy()

Collaborator Author:

Do you mean shutil.copy()? I think copy.copy() is for objects/classes.

Collaborator:

Yes, I made the same comments below.
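
A minimal sketch of the shutil-based copy, matching the suggestion that appears later in this review (the paths are placeholders for illustration):

```python
import shutil

general_ff = "ff/general.lt"  # placeholder path to the general force field file
directory = "system_0"        # placeholder run directory

# Copy the force field file into the run directory without shelling out to `cp`.
shutil.copy(general_ff, directory)
```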

Comment on lines 57 to 59
# Calculate how many salt species to add to the system. If the salt concentration
# is given in molality (units == 'mass'), then we don't need the solvent density. But if it
# is given in molarity (units == 'volume'), then we need the solvent density.
Collaborator:

What about if it's neither mass nor volume (if salt concentration is given in terms of a mol fraction)? Seems like you aren't populating numsalt in that case?
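
For concreteness, one possible shape for that missing branch, assuming the mole-based concentration is a mole ratio relative to the solvent; `salt_molratio` is my placeholder name and the intended semantics would need to be confirmed:

```python
import numpy as np

# Placeholder inputs for illustration; in the script these come from elytes.csv.
salt_molratio = np.array([0.1, 0.1])  # assumed: moles of each salt species per mole of solvent
num_solv = 500                        # total number of solvent molecules in the box

# If the concentration is a mole ratio, numsalt can be derived from num_solv directly,
# with no solvent density needed.
numsalt = np.round(salt_molratio * num_solv).astype(int)
```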

Comment on lines 10 to 17
#Generate and run short simulation of the solvent
python generatesolvent.py $i
cd $i
python prepopenmmsim.py solvent
cp ../runsolvent.py ./
python runsolvent.py
cd ..

Collaborator:

This shouldn't need to be repeated every time, right? There's only a finite number of solvent systems that we're looking at, so we should just be able to run those once, storing the densities and whatever other numbers we need, right?

Collaborator Author:

Yeah, we talked about this, and you're right. I think it's just a matter of how we want to set up the workflow. At the end of the day, someone needs to run those solvent simulations first, so I put that step in preparesimulations.sh.
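
One lightweight way to avoid rerunning the solvent simulations, sketched with an assumed cache file (`solvent_densities.csv` is my placeholder, not part of the current workflow):

```python
import csv
import os

def get_cached_density(system_id, cache_file="solvent_densities.csv"):
    """Return a previously computed solvent density in g/ml, or None if not cached."""
    if not os.path.exists(cache_file):
        return None
    with open(cache_file, newline="") as f:
        for row in csv.DictReader(f):
            if row["system"] == str(system_id):
                return float(row["density_g_per_ml"])
    return None

def cache_density(system_id, density, cache_file="solvent_densities.csv"):
    """Record a solvent density so later runs can skip the short MD simulation."""
    write_header = not os.path.exists(cache_file)
    with open(cache_file, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["system", "density_g_per_ml"])
        writer.writerow([system_id, density])
```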

u.add_TopologyAttr('occupancies',[1.0]*Natoms)

# Open the corresponding PDB file (generated by packmol)
lmm.grab_pdbdata_attr(pdb_file)
Collaborator:

Question: are we inputting partial charge information? I know that was a question Sanjeev raised earlier. We need some way to grab the charges after the fact so that we know what charge to run the DFT at for a given cluster.

Collaborator Author:

Yes! The partial charge information is actually stored somewhere, and my suggestion is to generate a separate text file with the outputted partial charges and element names (Sanjeev mentioned that the naming of the elements output by OpenMM is not exactly correct either).
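
A sketch of what that dump could look like with the OpenMM API (the function name and output format are mine, and whether the topology's element assignments can be trusted is exactly the caveat about element naming raised above):

```python
from openmm import NonbondedForce
from openmm.unit import elementary_charge

def write_partial_charges(system, topology, outfile="charges.txt"):
    """Write per-atom element symbols and partial charges from an OpenMM System."""
    # Find the NonbondedForce, which stores the per-particle charge parameters.
    nonbonded = next(f for f in system.getForces() if isinstance(f, NonbondedForce))
    with open(outfile, "w") as out:
        for atom in topology.atoms():
            charge, _sigma, _epsilon = nonbonded.getParticleParameters(atom.index)
            symbol = atom.element.symbol if atom.element is not None else "X"
            out.write(f"{atom.index} {symbol} {charge.value_in_unit(elementary_charge):.4f}\n")
```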

cp ../runsystem.py ./
python runsystem.py
cd ..
done
Collaborator:

Do we want/need to run this in parallel?

Collaborator Author:

We can run them in parallel. Depending on how jobs are submitted on the cluster, we could go with GNU parallel as well.
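
For a simple scheduler-agnostic option, a Python sketch that launches the per-system jobs concurrently (directory discovery and the worker count are placeholders; it assumes runsystem.py has already been copied into each directory, as the Bash script does):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_system(directory):
    """Run one MD job inside its directory, mirroring what runsimulations.sh does serially."""
    return subprocess.run(["python", "runsystem.py"], cwd=directory, check=True)

if __name__ == "__main__":
    # Placeholder discovery: treat every subdirectory of the working directory as a system.
    systems = [str(p) for p in Path(".").iterdir() if p.is_dir()]
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(run_system, systems))
```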

from sys import stdout, exit


#TO-DO: Read temperature from CSV file
Collaborator:

Still need to be done? Seems like temp is fixed at 293 for now.

Collaborator Author:

Yeah. It's a simple matter of reading from the previous CSV file; I've just been putting off editing this part of the code.
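
A minimal sketch of that read, assuming elytes.csv carries a temperature column (the header name below is a guess):

```python
import csv

def read_temperature(csv_file, row_index, column="Temperature (K)"):
    """Read the simulation temperature for one system from the electrolyte CSV file.

    The column name is an assumption and should match the actual elytes.csv header.
    """
    with open(csv_file, newline="") as f:
        rows = list(csv.DictReader(f))
    return float(rows[row_index][column])

# Example usage (row index is illustrative only):
# temperature = read_temperature("elytes.csv", 0)
```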

@levineds (Collaborator) left a comment:

I didn't quite make it through everything, but here's what I got for now.

electrolytes/README.md (outdated)
## List of files and directories
Only the important ones
- README.md: this file
- `preparesimulations.sh`: a Bash script that will prepare the initial system configurations in the elytes.csv files for OpenMM simulations
Collaborator:

Suggested change
- `preparesimulations.sh`: a Bash script that will prepare the initial system configurations in the elytes.csv files for OpenMM simulations
- `preparesimulations.sh`: a Bash script that will prepare the initial system configurations in the elytes.csv files for OpenMM simulations

- `runsimulations.sh`: a Bash script that will run the simulations one by one.
- ff: a directory of force field files for all electrolyte components.
- elytes.csv: a CSV file listing all possible electrolyte systems we can simulate
- `data2lammps.py`: a Python module to generate LAMMPS DATA and run files to
Collaborator:

Suggested change
- `data2lammps.py`: a Python module to generate LAMMPS DATA and run files to
- `data2lammps.py`: a Python module to generate LAMMPS DATA and run files

What do you mean by "run files"?

Collaborator Author:

It's the file you use when running LAMMPS from the terminal, e.g., lmp -in run.lammps.


## How it works

The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible to LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files.
Collaborator:

Suggested change
The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible to LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files.
The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible with LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files.


The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible to LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files.

The input the workflow is the `ff` directory, which contains the PDB and LT files of all electrolyte components, and elytes.csv, which specifies the molar/molal concentrations of the salt and ratios for solvent mixtures.
Collaborator:

Suggested change
The input the workflow is the `ff` directory, which contains the PDB and LT files of all electrolyte components, and elytes.csv, which specifies the molar/molal concentrations of the salt and ratios for solvent mixtures.
The input to the workflow is the `ff` directory, which contains the PDB and LT files of all electrolyte components, and elytes.csv, which specifies the molar/molal concentrations of the salt and ratios for solvent mixtures.

print(units)
if 'volume' in units:
    # Solvent density in g/ml, obtained from averaging short MD run
    data = list(csv.reader(open(f'{i-1}/solventdata.txt', 'r')))
Collaborator:

Suggested change
data = list(csv.reader(open(f'{i-1}/solventdata.txt', 'r')))
with open(f'{row_idx-1}/solventdata.txt', 'r') as f:
    data = list(csv.reader(f))

# Solvent density in g/ml, obtained from averaging short MD run
data = list(csv.reader(open(f'{i-1}/solventdata.txt', 'r')))
rho = np.array([float(row[3]) for row in data[1:]])
rho = np.mean(rho[int(len(rho)/2):])
Collaborator:

Suggested change
rho = np.mean(rho[int(len(rho)/2):])
rho = np.mean(rho[len(rho) // 2:])

Comment on lines 87 to 90
for j in range(len(cat+an)):
    Nmols.append(int(numsalt[j]))
for j in range(len(neut)):
    Nmols.append(int(num_solv*solv_molfrac[j]))
Collaborator:

Suggested change
for j in range(len(cat+an)):
    Nmols.append(int(numsalt[j]))
for j in range(len(neut)):
    Nmols.append(int(num_solv*solv_molfrac[j]))
Nmols.extend(numsalt)
Nmols.extend(int(num_solv*solv_frac) for solv_frac in solv_molfrac)


Module to convert LAMMPS force field and DATA files to OpenMM
XML force field file and PDB file. The script is only tested with
The forcefield files contained in './ff' directory
Collaborator:

Suggested change
The forcefield files contained in './ff' directory
the forcefield files contained in './ff' directory


## Dictionary of (rounded) atomic masses and element names

atomic_masses_rounded = {
Collaborator:

This seems like a terrible way to map things. There are multiple collisions in this dictionary (granted they are for very heavy atoms). Is there some other way we could do this?

Collaborator Author:

I'm going to switch to the PT (Periodic Table) module to see if it can do this for me. I admit I did have a convoluted way of looking up element names based on atomic masses.
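
If the lookup ends up going through the third-party `periodictable` package (which may or may not be the module meant here), a nearest-mass search with a tolerance could look like this sketch; very heavy elements with near-identical masses would still need case-by-case handling:

```python
import periodictable

def element_from_mass(mass, tol=0.5):
    """Return the element symbol whose standard atomic mass is closest to `mass`.

    The tolerance guards against silently assigning the wrong element when the
    force field's rounded mass does not match anything closely enough.
    """
    best = min(
        (el for el in periodictable.elements if el.number > 0),
        key=lambda el: abs(el.mass - mass),
    )
    if abs(best.mass - mass) > tol:
        raise ValueError(f"No element within {tol} amu of mass {mass}")
    return best.symbol
```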

@levineds (Collaborator) left a comment:

Looks good. I think we're good to go once all the parameters are done.

bashCommand = f'cp {general_ff} ./{directory}'
os.system(bashCommand)

shutil.copy(f'{general_ff}', f'{directory}')
Collaborator:

Suggested change
shutil.copy(f'{general_ff}', f'{directory}')
shutil.copy(general_ff, directory)

Comment on lines 66 to 67
packmolstring = '\n'.join([f"tolerance 2.0",
f"filetype pdb",
Collaborator:

Suggested change
packmolstring = '\n'.join([f"tolerance 2.0",
f"filetype pdb",
packmolstring = '\n'.join(["tolerance 2.0",
"filetype pdb",

f.close()
#shutil.copy(f"./ff/{species[j]}.pdb", f'./{directory}')
#shutil.copy(f"./ff/{species[j]}.lt", f'./{directory}')
spec_name = f'{species[j]}'
Collaborator:

Suggested change
spec_name = f'{species[j]}'
spec_name = species[j]

for suffix in ('.pdb', '.lt'):
    shutil.copy(os.path.join('ff', spec_name + suffix), os.path.join(directory, spec_name + suffix))

packmolstring += '\n'.join([f"structure {species[j]}.pdb",
Collaborator:

Suggested change
packmolstring += '\n'.join([f"structure {species[j]}.pdb",
packmolstring += '\n'.join([f"structure {spec_name}.pdb",

# Given the system LT and PDB file, which is generated from Packmol, run Moltemplate
bashCommand = f"cd {directory}; moltemplate.sh -pdb {filename}.pdb {filename}.lt; cd ..;"
os.system(bashCommand)
systemlt += '\n'.join([f'import "{species[j]}.lt"',
Collaborator:

Suggested change
systemlt += '\n'.join([f'import "{species[j]}.lt"',
systemlt += '\n'.join([f'import "{spec_name}.lt"',

numsalt = np.round(salt_conc*mass*Avog).astype(int)
elif 'number' == units:
elif 'number' == units or 'Number' == units:
Collaborator:

Suggested change
elif 'number' == units or 'Number' == units:
elif 'number' == units.lower():

mshuaibii and others added 30 commits October 21, 2024 22:46
…tal (#10)

Also added Lowdin and Mulliken bond orders since they were the only population feature missing from the NormalPrint level.
* Point to updated basis for Ln

* Upload def2-tzvpd basis

* Add a function to determine the symmetry-breaking block in ORCA

* Update recipes.py to use symmetry-breaking in a vertical-specific manner

We only want to break spin-symmetry for metal organic examples that are singlets.

* f-string bug

* point at path of basis set file

* add symmetry breaking to write_orca_calc

* assume basis lives in same directory

* organize basis directories

* black

* copy_file support

---------

Co-authored-by: Muhammed Shuaibi <[email protected]>
Missing commit from PR #13
…an Orca job (i.e. %maxcore) (#21)

* Add memory estimate function

* docstrings
* Remove OOD systems from ID systems
* don't save structures that are just the unsolvated structure
* Freeze center molecule (and don't do xTB relaxation on the final structure)
* Update RKS memory scaling with Orca 6 and use it for non-S3 calcs

* linting
* add script to write ani2x XYZ files

* Update biomolecules/ani2x/write_ani2x_xyzs.py

Co-authored-by: Daniel Levine <[email protected]>

* Update biomolecules/ani2x/write_ani2x_xyzs.py

Co-authored-by: Daniel Levine <[email protected]>

---------

Co-authored-by: Samuel Blau <[email protected]>
Co-authored-by: Daniel Levine <[email protected]>