Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cleaned finder and removed pbc (getting it from Lattice) #706

Merged
merged 1 commit into from
Mar 19, 2024

Conversation

zerothi
Copy link
Owner

@zerothi zerothi commented Mar 12, 2024

The neighbor finder got an overhaul to check for PBC and friends.

  1. Deleted pbc arguments to routines, they now get it from the geometry object contained.
  2. The find_all_unique_pairs changed named to find_unique_pairs (all was superfluous)
  3. I tried to make a test (see final test in the test_finder.py where one is located outside the unit cell. This problem is also inherent to the current way of doing things, but I wanted to give it a try here as well. And I found the same problem.
    I.e. if one runs this small snippet one also gets:
import sisl as si

geom = si.Geometry([[0, 0, 0], [3, 0, 0]], lattice=[2, 10, 10])
n = si.geom.NeighborFinder(geom, R=1.1)
print(n.find_neighbors())

output

[array([[ 1, -1,  0,  0]]), array([], shape=(0, 4), dtype=int64)]

@pfebrer you have any clue was goes on here?

Also, we need to discuss what should be done. Because the current way things are handled (in both this and the old part of the code) needs to be fixed in one way. I don't know if one should always do a translation to the UC for PB directions, and then special care should be done along non PB directions.

geom = Geometry([[0, 0, 0], [3, 0, 0]], lattice=[2, 10, 10])

# We should have fewer bins along the first lattice vector
n = NeighborFinder(geom, R=1.1)

Check notice

Code scanning / CodeQL

Unused local variable Note test

Variable n is not used.
Copy link

codecov bot commented Mar 12, 2024

Codecov Report

Attention: Patch coverage is 6.45161% with 29 lines in your changes are missing coverage. Please review.

Project coverage is 86.64%. Comparing base (e5f622a) to head (345b418).
Report is 1 commits behind head on main.

Files Patch % Lines
src/sisl/geom/_neighbors/tests/test_finder.py 0.00% 28 Missing ⚠️
src/sisl/geom/_neighbors/_finder.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #706      +/-   ##
==========================================
- Coverage   86.67%   86.64%   -0.03%     
==========================================
  Files         399      399              
  Lines       50727    50739      +12     
==========================================
- Hits        43966    43964       -2     
- Misses       6761     6775      +14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pfebrer
Copy link
Contributor

pfebrer commented Mar 13, 2024

The problem with that output is that the second atom should have [0,1,0,0] as a neighbor right?

If you can't pass pbc to the neighbor computing routines there should be a way of changing the boundary conditions without having to rebuild the finder. Because all the arrays are exactly the same regardless of the boundary conditions (as far as I remember).

@zerothi
Copy link
Owner Author

zerothi commented Mar 13, 2024

You just change the bc in the contained geometry. I don't think we should complicate things here. What we really want is that the pbc of the lattice should govern everything. Having a geometry without pbc, then asking for pbc neighbors is also confusing, hence the rationale. I can't (immediately) see a workflow where changing between pbc and not would be valuable.

As for the neighbor return, we need to figure out what out of cell coordinates mean, because it should essentially be the same as translate2uc coordinates.

@zerothi
Copy link
Owner Author

zerothi commented Mar 13, 2024

To expand on the neighbor locating stuff from outside the unit-cell.

Consider these two systems:

xyz1 = [
   [0, 0, 0],
   [3, 0, 0]
]
cell1 = cell2 = [2, 10, 10]

xyz1 = [
   [0, 0, 0],
   [1, 0, 0]
]
pbc1 = pbc2 = True

they are effectively the same system. So we should treat it as such.

However, it won't be as simple as translating to the primary unit cell. Consider this system:

xyz1 = [
   [0, 0, 0]
   [1, 0, 0.5],
   [1, 1, -0.5]
]
xyz2 = xyz1 + [0, 0, 2]
pbc = True, True, False

here a translation into the UC wouldn't work.
So how to handle these corner cases becomes problematic...

@pfebrer
Copy link
Contributor

pfebrer commented Mar 13, 2024

On your first example, I don't think both should be treated the same. There is a supercell shift in the second atom and that shift (but with opposite sign) should also be present on the neighbors.

For that, we can translate to the unit cell and store the original supercell index.

I don't understand the problem with the second example.

@zerothi
Copy link
Owner Author

zerothi commented Mar 14, 2024

On your first example, I don't think both should be treated the same. There is a supercell shift in the second atom and that shift (but with opposite sign) should also be present on the neighbors.
But the system is periodic, so essentially it does not matter if an atom is located at r or r+int * lattice, they are equivalent. This exact problem is also what caused my routines to fail for weird cases, see #102 (and I know there were other issues as well).

So the example above, one should find that the first atom has 2 neighbors, at [-1, 0, 0] and one in the unit-cell. And equivalently with the 2nd atom, it should have at [1, 0, 0] and [0, 0, 0].

For that, we can translate to the unit cell and store the original supercell index.

I don't understand the problem with the second example.

The problem of the 2nd example is that one cannot simply translate all coordinates, and in most cases, a lattice vector of finite size without PBC would still wrap around in most systems. So in this case the user has all atoms lying at the edge of the lattice box. Then an MD would push the atom outside of the lattice box. So a translation without PBC would mean that the coordinates are not neighbors (whereas in fact they are).

@pfebrer
Copy link
Contributor

pfebrer commented Mar 14, 2024

I don't agree with what you say from the first example. I think for xyz1 the neighbors of the first atom should be:

[
    [1, 0, 0, 0],
    [1, -1, 0, 0]
]

And in xyz2 they should be:

[
    [1, -1, 0, 0],
    [1, -2, 0, 0]
]

(each neighbor is [ia, isc1, isc2, isc3]). This is what SIESTA does when setting up the sparse matrices, isn't it?

In the second example, I hadn't seen the pbc=False. If the conditions are not periodic, a cell doesn't make any sense, no? It is just a construct that we create to simulate the system in a program with finite memory and usually pbc. So in that case we could just ignore the cell, translate all the coordinates to positive values and set the limits of the builder's grid as min(xyz), max(xyz).

@zerothi
Copy link
Owner Author

zerothi commented Mar 14, 2024

I don't agree with what you say from the first example. I think for xyz1 the neighbors of the first atom should be:

[
    [1, 0, 0, 0],
    [1, -1, 0, 0]
]

Thats also what I wrote. :)
The second atom should have:

 [
     [0, 0, 0, 0],
     [0, 1, 0, 0]
 ]

And in xyz2 they should be:

[
    [1, -1, 0, 0],
    [1, -2, 0, 0]
]

(each neighbor is [ia, isc1, isc2, isc3]). This is what SIESTA does when setting up the sparse matrices, isn't it?

In the second example, I hadn't seen the pbc=False. If the conditions are not periodic, a cell doesn't make any sense, no? It is just a construct that we create to simulate the system in a program with finite memory and usually pbc. So in that case we could just ignore the cell, translate all the coordinates to positive values and set the limits of the builder's grid as min(xyz), max(xyz).

Yes, but the problem is that the coordinates typically come from a code, and those codes have a lattice vector. E.g. in Siesta for a slab, you could see the above (2nd case), but you could also have a case where:

xyz2 = [
   [0, 0, 0.5]
   [1, 0, lattice.cell[2, 2] - 0.5],
]

and for Siesta, this would still amount to them being neighbors. So it is not a matter of pure translation, nor about UC-translations.

@pfebrer
Copy link
Contributor

pfebrer commented Mar 14, 2024

In the first example what I didn't agree with is what you say about xyz2, with xyz1 it was fine.

The example that you give about SIESTA, then pbc should be True, because these are the boundary conditions being used 😅 I think it doesn't make any sense to try to account for periodicity when pbc is False. It would be a nightmare to keep track (and explain to users consistently) of the places were we respect the non-periodic boundaries and the places where we don't.

@zerothi
Copy link
Owner Author

zerothi commented Mar 14, 2024

And note, Siesta also has some problems with this, see https://gitlab.com/siesta-project/siesta/-/issues/129

@pfebrer
Copy link
Contributor

pfebrer commented Mar 14, 2024

And note, Siesta also has some problems with this, see https://gitlab.com/siesta-project/siesta/-/issues/129

I don't see it necessarily as a problem. These are two different situations, and the supercell shifts are different. The atom that has been displaced outside the unit cell is not the same atom as its image in the unit cell, so I think it's more consistent to keep this differentiation. The auxiliary supercell gets bigger, but the number of entries in the matrix stays the same.

@zerothi
Copy link
Owner Author

zerothi commented Mar 14, 2024

In the first example what I didn't agree with is what you say about xyz2, with xyz1 it was fine.

The example that you give about SIESTA, then pbc should be True, because these are the boundary conditions being used 😅 I think it doesn't make any sense to try to account for periodicity when pbc is False. It would be a nightmare to keep track (and explain to users consistently) of the places were we respect the non-periodic boundaries and the places where we don't.

Ok, but then we need some way to alert users when they change bc in the lattice object from a geometry coming from a code. E.g. if the above situation happens, and how to do that. :)
Could your Nodes flow be used to issue code when a variable changes (on-the-fly)?

And note, Siesta also has some problems with this, see https://gitlab.com/siesta-project/siesta/-/issues/129

I don't see it necessarily as a problem. These are two different situations, and the supercell shifts are different. The atom that has been displaced outside the unit cell is not the same atom as its image in the unit cell, so I think it's more consistent to keep this differentiation. The auxiliary supercell gets bigger, but the number of entries in the matrix stays the same.

Well, the problem is that the integers used to hold the indices grows, and you'll much faster reach the limits (currently int32 can be hit quite fast with supercells). Also it gets difficult when users wants to cut couplings (through set_nsc(a|b|c=1|3) etc. That won't work if they the max/min aux cells are offset.

@pfebrer
Copy link
Contributor

pfebrer commented Mar 14, 2024

Could your Nodes flow be used to issue code when a variable changes (on-the-fly)?

Hmm I guess, but it would probably be overkill, we could just introduce the check on the method that sets boundary conditions.

Although I think there's no need to worry that much about this. It is perfectly logic that removing periodicity will mean that atoms that interacted through periodic images will no longer interact. Sure some users will be surprised the first time their code fails if they don't take this into account, but then they'll learn and will understand that the behavior is perfectly consistent. Maybe later we can provide a method that detects vacuum and moves its center to the edge of the cell, if that is useful.

Also_ it gets difficult when users wants to cut couplings (through set_nsc(a|b|c=1|3) etc. That won't work if they the max/min aux cells are offset.

Well if they want to do that they can first translate to the unit cell, that's easy to solve.

Well, the problem is that the integers used to hold the indices grows, and you'll much faster reach the limits (currently int32 can be hit quite fast with supercells).

How fast is "quite fast"? I just did a quick calculation and we are talking about 150k atoms (20 orbitals each) and a 9×9x9 auxiliary cell. And the more atoms, the more unlikely it is that you need an extra supercell, because the cell will be bigger so an atom will need a really long simulation to go through it. And if it becomes a problem, you can always translate everything into the unit cell or go with int64, whatever you prefer.

I think that translating to the unit cell for computing neighbors but keeping the coordinates outside of the unit cell could very easily lead to inconsistencies in different parts of the code.

Also I don't think it necessarily makes things cleaner and easier to understand. Imagine a molecule which half of its atoms have moved outside the unit cell. If you want to move everything to the unit cell for computing neighbors, some intramolecular interactions will be supercell interactions. Isn't that weird? My point is that not everything is made clearer by translating to the unit cell and therefore I don't think it is worth it to risk losing consistency. You can always translate to the unit cell explicitly if you need/want it, and then there won't be any risk of inconsistencies.

@zerothi
Copy link
Owner Author

zerothi commented Mar 15, 2024

Could your Nodes flow be used to issue code when a variable changes (on-the-fly)?

Hmm I guess, but it would probably be overkill, we could just introduce the check on the method that sets boundary conditions.

Although I think there's no need to worry that much about this. It is perfectly logic that removing periodicity will mean that atoms that interacted through periodic images will no longer interact. Sure some users will be surprised the first time their code fails if they don't take this into account, but then they'll learn and will understand that the behavior is perfectly consistent. Maybe later we can provide a method that detects vacuum and moves its center to the edge of the cell, if that is useful.

Yeah. That might be better.

Also_ it gets difficult when users wants to cut couplings (through set_nsc(a|b|c=1|3) etc. That won't work if they the max/min aux cells are offset.

Well if they want to do that they can first translate to the unit cell, that's easy to solve.
I am not sure it is so easy to solve.
The problem is that users coming from DFT codes are not used to this. So they expect it to work (seamlessly). And if sisl converts coordinates on-the-fly we should do so extremely carefully.

Well, the problem is that the integers used to hold the indices grows, and you'll much faster reach the limits (currently int32 can be hit quite fast with supercells).

How fast is "quite fast"? I just did a quick calculation and we are talking about 150k atoms (20 orbitals each) and a 9×9x9 auxiliary cell. And the more atoms, the more unlikely it is that you need an extra supercell, because the cell will be bigger so an atom will need a really long simulation to go through it. And if it becomes a problem, you can always translate everything into the unit cell or go with int64, whatever you prefer.

Ok, I was too quick, it happens first at ~80 million orbitals.
However, we can't go to int64, scipy.sparse uses int32 (internally).

I think that translating to the unit cell for computing neighbors but keeping the coordinates outside of the unit cell could very easily lead to inconsistencies in different parts of the code.
Agreed, and this is my main concern. That there will be an unexpected usage when people reads DFT output (say an MD that throws an atom outside the UC). What should happen in sisl there? As of now, sisl is already having problems with this (#698 e.g.)

Also I don't think it necessarily makes things cleaner and easier to understand. Imagine a molecule which half of its atoms have moved outside the unit cell. If you want to move everything to the unit cell for computing neighbors, some intramolecular interactions will be supercell interactions. Isn't that weird? My point is that not everything is made clearer by translating to the unit cell and therefore I don't think it is worth it to risk losing consistency. You can always translate to the unit cell explicitly if you need/want it, and then there won't be any risk of inconsistencies.

But even your code (and mine as well) still thinks these are supercell connections, right?

@pfebrer
Copy link
Contributor

pfebrer commented Mar 15, 2024

The problem is that users coming from DFT codes are not used to this. So they expect it to work (seamlessly). And if sisl converts coordinates on-the-fly we should do so extremely carefully.

What I'm saying precisely is that we shouldn't convert anything. Internally for practical reasons of the algorithm we should use the periodic images that are inside the unit cell, but we should take that into account when we return the neighbors.

Agreed, and this is my main concern. That there will be an unexpected usage when people reads DFT output (say an MD that throws an atom outside the UC). What should happen in sisl there? As of now, sisl is already having problems with this (#698 e.g.)

I would keep the atom outside the unit cell and take that into account when computing neighbors.

But even your code (and mine as well) still thinks these are supercell connections, right?

Hmm no, unless I have a bug, I try to adhere to SIESTA's way of describing connections, which I think is the most consistent one (although I admit that at the beginning it was not easy to understand). For example in the neighbor finder I didn't take this into account, so it is wrong 😅

There might be cases in which having all atoms in the unit cell is easier to manage and leads to the same result (e.g. to compute the density on the grid), but then what you can do is translate2uc before computing it (internally).

@zerothi
Copy link
Owner Author

zerothi commented Mar 19, 2024

I will merge this, but it would be great if you could solve the neighbor problem in the NeighborFinder :-D

The neighbor finder got an overhaul to check for PBC
and friends.

Signed-off-by: Nick Papior <[email protected]>
@pfebrer
Copy link
Contributor

pfebrer commented Mar 19, 2024

But we still didn't agree on the convention to follow 😅

@zerothi
Copy link
Owner Author

zerothi commented Mar 19, 2024

But we still didn't agree on the convention to follow 😅

True, but first step is to make it actually find the neighbors.
So

geom = si.Geometry([[0, 0, 0], [3, 0, 0]], lattice=[2, 10, 10])
n = si.geom.NeighborFinder(geom, R=1.1)
print(n.find_neighbors())
> [array([[ 1, -1,  0,  0]]), array([], shape=(0, 4), dtype=int64)]

should return neighbors for both atoms, regardless of convention. I.e. it has PBC.

I am also thinking about whether we should change the return value to be a tuple of indices, having everything in one array is nice. But I think it would be more easily understood by the end-user if it is returned in a tuple:

uc_idx, isc

or perhaps a namedtuple so we can extend the return values?

Then we should take up the discussion of the convention later (it shouldn't matter here, no?)

@zerothi zerothi merged commit d11dc6a into main Mar 19, 2024
6 of 8 checks passed
@zerothi zerothi deleted the neighbor-test branch March 19, 2024 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants