-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MD Workflow for Bulk Electrolytes #19
base: main
Are you sure you want to change the base?
Conversation
… new format for the electrolytes CSV file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not review the lammps2omm.py file, because I just don't know those formats well enough to provide meaningful feedback at this point.
Mostly this looks good, and I greatly appreciate your work on this. I had some small, nitpicky style points and a couple of questions, for instance making sure that electrolytes with mol ratios for salts rather than mass or volume concentrations were being handled properly.
electrolytes/README.md
Outdated
```python | ||
|
||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I wanted to erase that. Thanks for noticing.
electrolytes/data2lammps.py
Outdated
def get_indices(comments, keyword): | ||
""" Grab indices of labeled columns given a specific keyword. | ||
Args: | ||
comments (list): a list of strings, each of which is a label for a specific column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this called "comments" and not e.g. "labels"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Labels is a better name for this, I'll change that
electrolytes/data2lammps.py
Outdated
species = np.array(systems[i])[indices] | ||
return [name for name in species if name] | ||
|
||
def run_packmol_moltemplate(species,boxsize,Nmols,filename,directory): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, but type annotations would be nice.
electrolytes/data2lammps.py
Outdated
try: | ||
os.mkdir(directory) | ||
except Exception as e: | ||
print(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly cleaner to use Path.mkdir(exist_ok=True) to avoid the try/except. Also, avoid generic Exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Never thought about using Path to check existing directory. I tend to use generic Exception a lot.
electrolytes/data2lammps.py
Outdated
bashCommand = f'cp {general_ff} ./{directory}' | ||
os.system(bashCommand) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also cleaner to use copy.copy()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean shutil.copy()? Because I think copy.copy() is for objects/classes, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I made the same comments below.
electrolytes/generatesystem.py
Outdated
# Calculate how many salt species to add in the system. If units of the salt concentration | ||
# is in molality (units == 'mass') then, we don't need solvent density. But if the units is | ||
# in molarity (units == 'volume'), then we need the solvent density |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about if it's neither mass nor volume (if salt concentration is given in terms of a mol fraction)? Seems like you aren't populating numsalt in that case?
electrolytes/preparesimulations.sh
Outdated
#Generate and run short simulation of the solvent | ||
python generatesolvent.py $i | ||
cd $i | ||
python prepopenmmsim.py solvent | ||
cp ../runsolvent.py ./ | ||
python runsolvent.py | ||
cd .. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't need to be repeated every time, right? There's only a finite number of solvent systems that we're looking at, so we should just be able to run those once, storing the densities and whatever other numbers we need, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we talked about this, and you're right. I think it's just a matter of how we want to set up the workflow. Because at the end of the day, someone needs to run those first so I put that as part of the preparesimulations.sh
electrolytes/prepopenmmsim.py
Outdated
u.add_TopologyAttr('occupancies',[1.0]*Natoms) | ||
|
||
# Open the corresponding PDB file (generated by packmol) | ||
lmm.grab_pdbdata_attr(pdb_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: are we inputting partial charge information? I know what was a question that Sanjeev raised earlier. We need some way to grab the charges after the fact so that we know what charge to run the DFT at for a given cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Partial charge information are actually stored somewhere and my suggestion is to generate a different text file with the ouputted partial charge and element names (Sanjeev mentions that the naming of the elements outputted from OpenMM is not exactly correct either).
electrolytes/runsimulations.sh
Outdated
cp ../runsystem.py ./ | ||
python runsystem.py | ||
cd .. | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want/need to run this in parallel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can run them in parallel. Depending on how jobs are being submitted in the cluster, we could go GNU parallel as well.
electrolytes/runsolvent.py
Outdated
from sys import stdout, exit | ||
|
||
|
||
#TO-DO: Read temperature from CSV file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to be done? Seems like temp is fixed at 293 for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. It's a simple matter of reading from the previous CSV file, I've just been putting off editing this part of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't quite make it through everything, but here's what I got for now.
electrolytes/README.md
Outdated
## List of files and directories | ||
Only the important ones | ||
- README.md: this file | ||
- `preparesimulations.sh`: a Bash script that will prepare the initial system configurations in the elytes.csv files for OpenMM simulations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- `preparesimulations.sh`: a Bash script that will prepare the initial system configurations in the elytes.csv files for OpenMM simulations | |
- `preparesimulations.sh`: a Bash script that will prepare the initial system configurations in the elytes.csv files for OpenMM simulations |
electrolytes/README.md
Outdated
- `runsimulations.sh`: a Bash script that will run the simulations one by one. | ||
- ff: a directory of force field files of all electroyte components. | ||
- elytes.csv: a CSV file listing all possible electrolyte systems we can simulate | ||
- `data2lammps.py`: a Python module to generate LAMMPS DATA and run files to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- `data2lammps.py`: a Python module to generate LAMMPS DATA and run files to | |
- `data2lammps.py`: a Python module to generate LAMMPS DATA and run files |
What do you mean by "run files"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the file you use when running on LAMMPS on the terminal, e.g., lmp -in run.lammps.
electrolytes/README.md
Outdated
|
||
## How it works | ||
|
||
The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible to LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible to LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files. | |
The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible with LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files. |
electrolytes/README.md
Outdated
|
||
The workflow uses Packmol to generate a system configuration and Moltemplate to generate force field files. However, the format generated is only compatible to LAMMPS. Thus, the next step is to convert the LAMMPS files to OpenMM-compatible files. | ||
|
||
The input the workflow is the `ff` directory, which contains the PDB and LT files of all electrolyte components, and elytes.csv, which specifies the molar/molal concentrations of the salt and ratios for solvent mixtures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input the workflow is the `ff` directory, which contains the PDB and LT files of all electrolyte components, and elytes.csv, which specifies the molar/molal concentrations of the salt and ratios for solvent mixtures. | |
The input to the workflow is the `ff` directory, which contains the PDB and LT files of all electrolyte components, and elytes.csv, which specifies the molar/molal concentrations of the salt and ratios for solvent mixtures. |
electrolytes/generatesystem.py
Outdated
print(units) | ||
if 'volume' in units: | ||
# Solvent density in g/ml, obtained from averaging short MD run | ||
data = list(csv.reader(open(f'{i-1}/solventdata.txt', 'r'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data = list(csv.reader(open(f'{i-1}/solventdata.txt', 'r'))) | |
with open(f'{row_idx-1}/solventdata.txt', 'r') as f: | |
data = list(csv.reader(f)) |
electrolytes/generatesystem.py
Outdated
# Solvent density in g/ml, obtained from averaging short MD run | ||
data = list(csv.reader(open(f'{i-1}/solventdata.txt', 'r'))) | ||
rho = np.array([float(row[3]) for row in data[1:]]) | ||
rho = np.mean(rho[int(len(rho)/2):]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rho = np.mean(rho[int(len(rho)/2):]) | |
rho = np.mean(rho[len(rho) // 2:]) |
electrolytes/generatesystem.py
Outdated
for j in range(len(cat+an)): | ||
Nmols.append(int(numsalt[j])) | ||
for j in range(len(neut)): | ||
Nmols.append(int(num_solv*solv_molfrac[j])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for j in range(len(cat+an)): | |
Nmols.append(int(numsalt[j])) | |
for j in range(len(neut)): | |
Nmols.append(int(num_solv*solv_molfrac[j])) | |
Nmols.extend(numsalt) | |
Nmols.extend(int(num_solv*solv_frac) for solv_frac in solv_molfrac) |
|
||
Module to convert LAMMPS force field and DATA files to OpenMM | ||
XML force field file and PDB file. The script is only tested with | ||
The forcefield files contained in './ff' directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The forcefield files contained in './ff' directory | |
the forcefield files contained in './ff' directory |
electrolytes/lammps2omm.py
Outdated
|
||
## Dictionary of (rounded) atomic masses and element names | ||
|
||
atomic_masses_rounded = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a terrible way to map things. There are multiple collisions in this dictionary (granted they are for very heavy atoms). Is there some other way we could do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna change the PT(Periodic Table) module to see if they can do this for me. I think I did have a convoluted way to look up element names based on atomic masses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think we're good to go once all the parameters are done.
electrolytes/data2lammps.py
Outdated
bashCommand = f'cp {general_ff} ./{directory}' | ||
os.system(bashCommand) | ||
|
||
shutil.copy(f'{general_ff}', f'{directory}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shutil.copy(f'{general_ff}', f'{directory}') | |
shutil.copy(general_ff, directory) |
electrolytes/data2lammps.py
Outdated
packmolstring = '\n'.join([f"tolerance 2.0", | ||
f"filetype pdb", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
packmolstring = '\n'.join([f"tolerance 2.0", | |
f"filetype pdb", | |
packmolstring = '\n'.join(["tolerance 2.0", | |
"filetype pdb", |
electrolytes/data2lammps.py
Outdated
f.close() | ||
#shutil.copy(f"./ff/{species[j]}.pdb", f'./{directory}') | ||
#shutil.copy(f"./ff/{species[j]}.lt", f'./{directory}') | ||
spec_name = f'{species[j]}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec_name = f'{species[j]}' | |
spec_name = species[j] |
electrolytes/data2lammps.py
Outdated
for suffix in ('.pdb', '.lt'): | ||
shutil.copy(os.path.join('ff', spec_name + suffix), os.path.join(directory, spec_name + suffix)) | ||
|
||
packmolstring += '\n'.join([f"structure {species[j]}.pdb", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
packmolstring += '\n'.join([f"structure {species[j]}.pdb", | |
packmolstring += '\n'.join([f"structure {spec_name}.pdb", |
electrolytes/data2lammps.py
Outdated
# Given the system LT and PDB file, which is generated from Packmol, run Moltemplate | ||
bashCommand = f"cd {directory}; moltemplate.sh -pdb {filename}.pdb {filename}.lt; cd ..;" | ||
os.system(bashCommand) | ||
systemlt += '\n'.join([f'import "{species[j]}.lt"', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
systemlt += '\n'.join([f'import "{species[j]}.lt"', | |
systemlt += '\n'.join([f'import "{spec_name}.lt"', |
electrolytes/generatesystem.py
Outdated
numsalt = np.round(salt_conc*mass*Avog).astype(int) | ||
elif 'number' == units: | ||
elif 'number' == units or 'Number' == units: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif 'number' == units or 'Number' == units: | |
elif 'number' == units.lower(): |
…tal (#10) Also added Lowdin and Mulliken bond orders since they were the only population feature missing from the NormalPrint level.
* Point to updated basis for Ln * Upload def2-tzvpd basis * Add a function to determine the symmetry-breaking block in ORCA * Update recipes.py to use symmetry-breaking in a vertical-specific manner We only want to break spin-symmetry for metal organic examples that are singlets. * f-string bug * point at path of basis set file * add symmetry breaking to write_orca_calc * assume basis lives in same directory * organize basis directories * black * copy_file support --------- Co-authored-by: Muhammed Shuaibi <[email protected]>
Missing commit from PR #13
…an Orca job (i.e. %maxcore) (#21) * Add memory estimate function * docstrings
…bital energies (#25)
* Remove OOD systems from ID systems * don't save structures that are just the unsolvated structure * Freeze center molecule (and don't do xTB relaxation on the final structure)
* Update RKS memory scaling with Orca 6 and use it for non-S3 calcs * linting
* add script to write ani2x XYZ files * Update biomolecules/ani2x/write_ani2x_xyzs.py Co-authored-by: Daniel Levine <[email protected]> * Update biomolecules/ani2x/write_ani2x_xyzs.py Co-authored-by: Daniel Levine <[email protected]> --------- Co-authored-by: Samuel Blau <[email protected]> Co-authored-by: Daniel Levine <[email protected]>
This PR introduces the workflow to setup and run MD simulations of bulk electrolytes. Given a list of electrolyte components and their correponding structures + force field files, the workflow generates the necessarry inputs to run an OpenMM simulation. It optionally generates inputs to run the same simulations on LAMMPS.
This PR includes a number of Python scripts and modules, Bash scripts that runs the workflow, and a directory (ff) containing all force field files curated from literature.
There is still a few more details that need to be worked out. For instance, the workflow always assumes that there is salt in the electrolyte but we also have molten salt and pure ionic liquids, which do not have salt component in them. And ensuring that the workflow for all model systems chosen in our full spreadsheet. All of these will be added in future PR.