Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cell file tool #50 #52

Merged
merged 18 commits into from
Feb 17, 2023
Merged

Add cell file tool #50 #52

merged 18 commits into from
Feb 17, 2023

Conversation

joelvdavies
Copy link
Contributor

@joelvdavies joelvdavies commented Feb 10, 2023

Adds methods to load in a structure file via ase (e.g. .cell) and use it to determine the closest atoms to the muon. There is then also a function to create input file text like I used for V3Si.

Still currently leaning towards keeping this out of the config file itself, due to its kind of sketchy nature as it would make it easy to make amendments to what it outputs in the case it doesn't do quite what you might expect. In this case I would think for galaxy we can use it as a library and hide the spin/interaction options when used. Then if the user wants to make modifications they can modify the resulting text file.

Current usage:

muspinsim-gen --help
muspinsim-gen V3Si_SC.cell 6 --dipolar --quadrupolar ./efg_nospin.out --ignore_symbol Si

Notes

  • ase logs warnings that castep isn't installed when loading .cell files - these can be safely ignored in most cases (it seems it does validation against the installed version when it is present)
  • Looking into GIPAW it doesn't look like there is a standard output file, only the data it prints to standard out - this implementation is therefore quite sketchy and may not always work - of course we may further expand on this for other tools that may be more standardised
  • Using a brute force method of finding closest atoms to the muon - there are methods in ase to compute terms in a given radius but this turned out to be extremely slow here (https://matsci.org/t/measuring-distance-with-ase/45623) - the main caveat is the user may need to be aware that the expansion may need to be increased to capture the whole structure (Currently expanding way larger than I think necessary for most cases though - so shouldn't be necessary)
  • Due to the above - it may also be sketchy in the case of oddly shaped unit cells/lattice vectors - in all the files I have worked with the lattice carts all appear like
   R1x   0.000000000000000   0.000000000000000
   0.000000000000000   R2y   0.000000000000000
   0.000000000000000   0.000000000000000   R3y

@joelvdavies joelvdavies self-assigned this Feb 10, 2023
Copy link
Contributor

@patrick-austin patrick-austin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I would think for galaxy we can use it as a library and hide the spin/interaction options when used. Then if the user wants to make modifications they can modify the resulting text file.

Just to make sure I'm understanding, this would involve creating a conditional section containing the interactions, one which accepts the .out and .cell and one which presents the current for for interactions? The other alternative would be to create a tool wrapper that wraps the cell file tool only, outputs a partial config file, then muspinsim_configure accepts the file as an input (or the current form options). Former keeps things "neater" (i.e. all config is in one tool so you don't need to chain it in a workflow), latter would make any sanity checking/modifications easier since it's an output in its own right. Either way the implementation here is the same since they'd both call it the same under the hood.

I didn't really understand the significance of the isotope stuff (i.e. what 1 means in that context) but I don't think that's an issue particularly since it's not fully supported yet anyway - I'm sure it would make more sense when/if fully implemented.

Regarding the technique to expand, rather than always considering a supercell containing (2n + 1)**3 unit cells (which in principle may contain too few NN, or atoms which if we expanded further might not be the NN, could we do something iterative?

  1. First calculate the NN in the unit cell, order by distance
  2. "Add" a layer by now considering cells within (+-1, +-1, +-1) (i.e. the next 26)
  3. Construct distances for this layer, and order by distance
  4. If any of the distances in new layer are small enough to be a nearest neighbour, add them to the list, re sort, and repeat from step 2
  5. If none of the atoms in the most recent layer are NN, then we can stop since all atoms in the next layer are even further away.

I'm not sure how likely it is to save us time, but I feel like it should in principle? As well as being more rigorous.

muspinsim/tools/generator.py Outdated Show resolved Hide resolved
muspinsim/tools/generator.py Outdated Show resolved Hide resolved
muspinsim/tools/generator.py Outdated Show resolved Hide resolved
muspinsim/tools/generator.py Outdated Show resolved Hide resolved
muspinsim/tools/generator.py Show resolved Hide resolved
muspinsim/input/structure.py Show resolved Hide resolved
muspinsim/input/structure.py Show resolved Hide resolved
muspinsim/input/structure.py Outdated Show resolved Hide resolved
muspinsim/input/structure.py Outdated Show resolved Hide resolved
muspinsim/input/structure.py Outdated Show resolved Hide resolved
@joelvdavies
Copy link
Contributor Author

joelvdavies commented Feb 14, 2023

multiplication factor for electrons

What does that mean exactly? My first instinct is that isotope shouldn't affect the electrons since there's the same number regardless of the mass, but I suppose this might be a spin thing - like are we denoting that this is an isotope with a particular nuclear spin (e.g. 3.5 V) as opposed to it's mass (e.g. 51 V - I think this is how that ASE data I linked is keyed) since the mass isn't important to us? This is just to satisfy my curiosity more than anything else.

I initially avoided the iterative approach just because to ensure everything was captured you always need to expand one more layer than you end up needing just to check in case the muon is at the edge of the cell as for the V3Si and this would just make it slower, but having worked with it I don't think it would be too bad and it would avoid one of the potential issues so I agree, I will look at doing that instead.

Ah so are you currently doing this with --expand_number 3 (as the default) because you know for the V example that's exactly how many you'd need to get the number of neighbours you want? I suppose the other thing to consider here (which I haven't done) is how long it takes in practice - if checking that final empty layer to make sure you've not missed any neighbours takes ~ms it's one thing, but because it's scaling like O(n**3) if that final layer ends up taking a minutes then that's another matter... I suppose keeping expand_number as an upper limit might help the search from exploding if the user misconfigures it somehow.

@joelvdavies
Copy link
Contributor Author

multiplication factor for electrons

What does that mean exactly? My first instinct is that isotope shouldn't affect the electrons since there's the same number regardless of the mass, but I suppose this might be a spin thing - like are we denoting that this is an isotope with a particular nuclear spin (e.g. 3.5 V) as opposed to it's mass (e.g. 51 V - I think this is how that ASE data I linked is keyed) since the mass isn't important to us? This is just to satisfy my curiosity more than anything else.

I initially avoided the iterative approach just because to ensure everything was captured you always need to expand one more layer than you end up needing just to check in case the muon is at the edge of the cell as for the V3Si and this would just make it slower, but having worked with it I don't think it would be too bad and it would avoid one of the potential issues so I agree, I will look at doing that instead.

Ah so are you currently doing this with --expand_number 3 (as the default) because you know for the V example that's exactly how many you'd need to get the number of neighbours you want? I suppose the other thing to consider here (which I haven't done) is how long it takes in practice - if checking that final empty layer to make sure you've not missed any neighbours takes ~ms it's one thing, but because it's scaling like O(n**3) if that final layer ends up taking a minutes then that's another matter... I suppose keeping expand_number as an upper limit might help the search from exploding if the user misconfigures it somehow.

I think you may have edited my comment above, I didn't realise you had responded 😄.

Yes so the way the spins are currently obtained, 51V in the input file would refer to the isotope, but something like 3e would be taken as an electron with spin 3/2. See the comment here

iso {int} -- Desired isotope. Ignored for 'mu'. If used for 'e', this
is interpreted as a number of strongly coupled electrons
acting as a single spin > 1/2. If not specified, the most
naturally abundant isotope is used.
(default: {None})

--expand_number 1 would have been enough for V3Si, so I just increased it because I thought it would be plenty, so really the iterative method will be faster anyway in this case. The only time I can see it not being plenty is if the cell file itself is very small or you are requesting many neighbours. The former case will be faster anyway as there is less to duplicate and so I still don't think it would be a problem unless the latter case applies which I would think is unlikely as muspinsim would not like hundreds of spins anyway. So overall I prefer the iterative solution, not sure if its worth assigning an upper limit? - if it takes too long you can just cancel it, otherwise it depends on the contents of the cell again.

@patrick-austin
Copy link
Contributor

Thanks for that link, that helped explain it.

My main concern would be "misuse" leading to really long execution time. E.G. if I ask for an element which doesn't actually appear in the cell, I don't want an interative method to keep searching forever (I don't think it would with the current set up but it's those edge cases that I think would be really painful if got wrong). Or typing 99 instead of 9 for number of nearest neighbours. Ultimately you can't protect the user from everything but it's worth considering. Manual cancellation is good for CL usage (assuming we log enough for it to be clear to a user that they should). Might be less intuitive in Galaxy (I think you permanently delete the output data set but you won't see the logging in the meantime I don't think).

@joelvdavies
Copy link
Contributor Author

joelvdavies commented Feb 15, 2023

Thanks for that link, that helped explain it.

My main concern would be "misuse" leading to really long execution time. E.G. if I ask for an element which doesn't actually appear in the cell, I don't want an interative method to keep searching forever (I don't think it would with the current set up but it's those edge cases that I think would be really painful if got wrong). Or typing 99 instead of 9 for number of nearest neighbours. Ultimately you can't protect the user from everything but it's worth considering. Manual cancellation is good for CL usage (assuming we log enough for it to be clear to a user that they should). Might be less intuitive in Galaxy (I think you permanently delete the output data set but you won't see the logging in the meantime I don't think).

Yes good points, I will add a modifiable limit in and double check how it performs to see if I can raise it a bit further. Also definitely worth adding some logging to see what is actually done.

Update: 2 seconds and ~250Mb for 6 - this had 474553 atoms in it as a result. Not sure how big/small we might expect the cells to go but V3Si has 216 in each and even if there were only 2 atoms in the cell that's still 2* 2197 to choose from so that should fine.

@joelvdavies joelvdavies force-pushed the add-cell-file-tool-#50 branch from fa9342c to 8a8263b Compare February 15, 2023 12:42
@joelvdavies joelvdavies force-pushed the add-cell-file-tool-#50 branch from 8a8263b to 07eac11 Compare February 15, 2023 12:44
@joelvdavies joelvdavies force-pushed the add-cell-file-tool-#50 branch from 4085902 to aea99a3 Compare February 15, 2023 14:17
@joelvdavies joelvdavies force-pushed the add-cell-file-tool-#50 branch from e6db2dd to 3618bde Compare February 15, 2023 16:03
@joelvdavies joelvdavies marked this pull request as ready for review February 16, 2023 08:42
Copy link
Contributor

@patrick-austin patrick-austin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code itself looks solid, just two places where I think there are minor typos in the comments/docstrings.

muspinsim/input/structure.py Outdated Show resolved Hide resolved
muspinsim/input/structure.py Outdated Show resolved Hide resolved
@joelvdavies joelvdavies merged commit 6f56929 into v2.1 Feb 17, 2023
@joelvdavies joelvdavies deleted the add-cell-file-tool-#50 branch February 17, 2023 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants