cleaned finder and removed pbc (getting it from Lattice) #706

zerothi · 2024-03-12T19:20:32Z

The neighbor finder got an overhaul to check for PBC and friends.

Deleted pbc arguments to routines, they now get it from the geometry object contained.
The find_all_unique_pairs changed named to find_unique_pairs (all was superfluous)
I tried to make a test (see final test in the test_finder.py where one is located outside the unit cell. This problem is also inherent to the current way of doing things, but I wanted to give it a try here as well. And I found the same problem.
I.e. if one runs this small snippet one also gets:

import sisl as si

geom = si.Geometry([[0, 0, 0], [3, 0, 0]], lattice=[2, 10, 10])
n = si.geom.NeighborFinder(geom, R=1.1)
print(n.find_neighbors())

output

[array([[ 1, -1,  0,  0]]), array([], shape=(0, 4), dtype=int64)]

@pfebrer you have any clue was goes on here?

Also, we need to discuss what should be done. Because the current way things are handled (in both this and the old part of the code) needs to be fixed in one way. I don't know if one should always do a translation to the UC for PB directions, and then special care should be done along non PB directions.

src/sisl/geom/_neighbors/tests/test_finder.py

+    geom = Geometry([[0, 0, 0], [3, 0, 0]], lattice=[2, 10, 10])
+
+    # We should have fewer bins along the first lattice vector
+    n = NeighborFinder(geom, R=1.1)


codecov · 2024-03-12T19:28:53Z

Codecov Report

Attention: Patch coverage is 6.45161% with 29 lines in your changes are missing coverage. Please review.

Project coverage is 86.64%. Comparing base (e5f622a) to head (345b418).
Report is 1 commits behind head on main.

Files	Patch %	Lines
src/sisl/geom/_neighbors/tests/test_finder.py	0.00%	28 Missing ⚠️
src/sisl/geom/_neighbors/_finder.py	66.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #706      +/-   ##
==========================================
- Coverage   86.67%   86.64%   -0.03%     
==========================================
  Files         399      399              
  Lines       50727    50739      +12     
==========================================
- Hits        43966    43964       -2     
- Misses       6761     6775      +14

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pfebrer · 2024-03-13T15:31:38Z

The problem with that output is that the second atom should have [0,1,0,0] as a neighbor right?

If you can't pass pbc to the neighbor computing routines there should be a way of changing the boundary conditions without having to rebuild the finder. Because all the arrays are exactly the same regardless of the boundary conditions (as far as I remember).

zerothi · 2024-03-13T15:59:59Z

You just change the bc in the contained geometry. I don't think we should complicate things here. What we really want is that the pbc of the lattice should govern everything. Having a geometry without pbc, then asking for pbc neighbors is also confusing, hence the rationale. I can't (immediately) see a workflow where changing between pbc and not would be valuable.

As for the neighbor return, we need to figure out what out of cell coordinates mean, because it should essentially be the same as translate2uc coordinates.

zerothi · 2024-03-13T20:04:24Z

To expand on the neighbor locating stuff from outside the unit-cell.

Consider these two systems:

xyz1 = [
   [0, 0, 0],
   [3, 0, 0]
]
cell1 = cell2 = [2, 10, 10]

xyz1 = [
   [0, 0, 0],
   [1, 0, 0]
]
pbc1 = pbc2 = True

they are effectively the same system. So we should treat it as such.

However, it won't be as simple as translating to the primary unit cell. Consider this system:

xyz1 = [
   [0, 0, 0]
   [1, 0, 0.5],
   [1, 1, -0.5]
]
xyz2 = xyz1 + [0, 0, 2]
pbc = True, True, False

here a translation into the UC wouldn't work.
So how to handle these corner cases becomes problematic...

pfebrer · 2024-03-13T21:20:53Z

On your first example, I don't think both should be treated the same. There is a supercell shift in the second atom and that shift (but with opposite sign) should also be present on the neighbors.

For that, we can translate to the unit cell and store the original supercell index.

I don't understand the problem with the second example.

zerothi · 2024-03-14T07:45:59Z

On your first example, I don't think both should be treated the same. There is a supercell shift in the second atom and that shift (but with opposite sign) should also be present on the neighbors.
But the system is periodic, so essentially it does not matter if an atom is located at r or r+int * lattice, they are equivalent. This exact problem is also what caused my routines to fail for weird cases, see #102 (and I know there were other issues as well).

So the example above, one should find that the first atom has 2 neighbors, at [-1, 0, 0] and one in the unit-cell. And equivalently with the 2nd atom, it should have at [1, 0, 0] and [0, 0, 0].

For that, we can translate to the unit cell and store the original supercell index.

I don't understand the problem with the second example.

The problem of the 2nd example is that one cannot simply translate all coordinates, and in most cases, a lattice vector of finite size without PBC would still wrap around in most systems. So in this case the user has all atoms lying at the edge of the lattice box. Then an MD would push the atom outside of the lattice box. So a translation without PBC would mean that the coordinates are not neighbors (whereas in fact they are).

pfebrer · 2024-03-14T10:27:15Z

I don't agree with what you say from the first example. I think for xyz1 the neighbors of the first atom should be:

[
    [1, 0, 0, 0],
    [1, -1, 0, 0]
]

And in xyz2 they should be:

[
    [1, -1, 0, 0],
    [1, -2, 0, 0]
]

(each neighbor is [ia, isc1, isc2, isc3]). This is what SIESTA does when setting up the sparse matrices, isn't it?

In the second example, I hadn't seen the pbc=False. If the conditions are not periodic, a cell doesn't make any sense, no? It is just a construct that we create to simulate the system in a program with finite memory and usually pbc. So in that case we could just ignore the cell, translate all the coordinates to positive values and set the limits of the builder's grid as min(xyz), max(xyz).

zerothi · 2024-03-14T10:56:44Z

I don't agree with what you say from the first example. I think for xyz1 the neighbors of the first atom should be:
[
    [1, 0, 0, 0],
    [1, -1, 0, 0]
]

Thats also what I wrote. :)
The second atom should have:

 [
     [0, 0, 0, 0],
     [0, 1, 0, 0]
 ]

And in xyz2 they should be:
[
    [1, -1, 0, 0],
    [1, -2, 0, 0]
]
(each neighbor is [ia, isc1, isc2, isc3]). This is what SIESTA does when setting up the sparse matrices, isn't it?

In the second example, I hadn't seen the pbc=False. If the conditions are not periodic, a cell doesn't make any sense, no? It is just a construct that we create to simulate the system in a program with finite memory and usually pbc. So in that case we could just ignore the cell, translate all the coordinates to positive values and set the limits of the builder's grid as min(xyz), max(xyz).

Yes, but the problem is that the coordinates typically come from a code, and those codes have a lattice vector. E.g. in Siesta for a slab, you could see the above (2nd case), but you could also have a case where:

xyz2 = [
   [0, 0, 0.5]
   [1, 0, lattice.cell[2, 2] - 0.5],
]

and for Siesta, this would still amount to them being neighbors. So it is not a matter of pure translation, nor about UC-translations.

pfebrer · 2024-03-14T11:14:20Z

In the first example what I didn't agree with is what you say about xyz2, with xyz1 it was fine.

The example that you give about SIESTA, then pbc should be True, because these are the boundary conditions being used 😅 I think it doesn't make any sense to try to account for periodicity when pbc is False. It would be a nightmare to keep track (and explain to users consistently) of the places were we respect the non-periodic boundaries and the places where we don't.

zerothi · 2024-03-14T11:14:36Z

And note, Siesta also has some problems with this, see https://gitlab.com/siesta-project/siesta/-/issues/129

pfebrer · 2024-03-14T11:20:53Z

And note, Siesta also has some problems with this, see https://gitlab.com/siesta-project/siesta/-/issues/129

I don't see it necessarily as a problem. These are two different situations, and the supercell shifts are different. The atom that has been displaced outside the unit cell is not the same atom as its image in the unit cell, so I think it's more consistent to keep this differentiation. The auxiliary supercell gets bigger, but the number of entries in the matrix stays the same.

zerothi · 2024-03-14T11:27:50Z

In the first example what I didn't agree with is what you say about xyz2, with xyz1 it was fine.

The example that you give about SIESTA, then pbc should be True, because these are the boundary conditions being used 😅 I think it doesn't make any sense to try to account for periodicity when pbc is False. It would be a nightmare to keep track (and explain to users consistently) of the places were we respect the non-periodic boundaries and the places where we don't.

Ok, but then we need some way to alert users when they change bc in the lattice object from a geometry coming from a code. E.g. if the above situation happens, and how to do that. :)
Could your Nodes flow be used to issue code when a variable changes (on-the-fly)?

And note, Siesta also has some problems with this, see https://gitlab.com/siesta-project/siesta/-/issues/129

I don't see it necessarily as a problem. These are two different situations, and the supercell shifts are different. The atom that has been displaced outside the unit cell is not the same atom as its image in the unit cell, so I think it's more consistent to keep this differentiation. The auxiliary supercell gets bigger, but the number of entries in the matrix stays the same.

Well, the problem is that the integers used to hold the indices grows, and you'll much faster reach the limits (currently int32 can be hit quite fast with supercells). Also it gets difficult when users wants to cut couplings (through set_nsc(a|b|c=1|3) etc. That won't work if they the max/min aux cells are offset.

pfebrer · 2024-03-14T18:28:41Z

Could your Nodes flow be used to issue code when a variable changes (on-the-fly)?

Hmm I guess, but it would probably be overkill, we could just introduce the check on the method that sets boundary conditions.

Although I think there's no need to worry that much about this. It is perfectly logic that removing periodicity will mean that atoms that interacted through periodic images will no longer interact. Sure some users will be surprised the first time their code fails if they don't take this into account, but then they'll learn and will understand that the behavior is perfectly consistent. Maybe later we can provide a method that detects vacuum and moves its center to the edge of the cell, if that is useful.

Also_ it gets difficult when users wants to cut couplings (through set_nsc(a|b|c=1|3) etc. That won't work if they the max/min aux cells are offset.

Well if they want to do that they can first translate to the unit cell, that's easy to solve.

Well, the problem is that the integers used to hold the indices grows, and you'll much faster reach the limits (currently int32 can be hit quite fast with supercells).

How fast is "quite fast"? I just did a quick calculation and we are talking about 150k atoms (20 orbitals each) and a 9×9x9 auxiliary cell. And the more atoms, the more unlikely it is that you need an extra supercell, because the cell will be bigger so an atom will need a really long simulation to go through it. And if it becomes a problem, you can always translate everything into the unit cell or go with int64, whatever you prefer.

I think that translating to the unit cell for computing neighbors but keeping the coordinates outside of the unit cell could very easily lead to inconsistencies in different parts of the code.

Also I don't think it necessarily makes things cleaner and easier to understand. Imagine a molecule which half of its atoms have moved outside the unit cell. If you want to move everything to the unit cell for computing neighbors, some intramolecular interactions will be supercell interactions. Isn't that weird? My point is that not everything is made clearer by translating to the unit cell and therefore I don't think it is worth it to risk losing consistency. You can always translate to the unit cell explicitly if you need/want it, and then there won't be any risk of inconsistencies.

zerothi · 2024-03-15T12:35:59Z

Could your Nodes flow be used to issue code when a variable changes (on-the-fly)?

Hmm I guess, but it would probably be overkill, we could just introduce the check on the method that sets boundary conditions.

Although I think there's no need to worry that much about this. It is perfectly logic that removing periodicity will mean that atoms that interacted through periodic images will no longer interact. Sure some users will be surprised the first time their code fails if they don't take this into account, but then they'll learn and will understand that the behavior is perfectly consistent. Maybe later we can provide a method that detects vacuum and moves its center to the edge of the cell, if that is useful.

Yeah. That might be better.

Also_ it gets difficult when users wants to cut couplings (through set_nsc(a|b|c=1|3) etc. That won't work if they the max/min aux cells are offset.

Well if they want to do that they can first translate to the unit cell, that's easy to solve.
I am not sure it is so easy to solve.
The problem is that users coming from DFT codes are not used to this. So they expect it to work (seamlessly). And if sisl converts coordinates on-the-fly we should do so extremely carefully.

Well, the problem is that the integers used to hold the indices grows, and you'll much faster reach the limits (currently int32 can be hit quite fast with supercells).

How fast is "quite fast"? I just did a quick calculation and we are talking about 150k atoms (20 orbitals each) and a 9×9x9 auxiliary cell. And the more atoms, the more unlikely it is that you need an extra supercell, because the cell will be bigger so an atom will need a really long simulation to go through it. And if it becomes a problem, you can always translate everything into the unit cell or go with int64, whatever you prefer.

Ok, I was too quick, it happens first at ~80 million orbitals.
However, we can't go to int64, scipy.sparse uses int32 (internally).

I think that translating to the unit cell for computing neighbors but keeping the coordinates outside of the unit cell could very easily lead to inconsistencies in different parts of the code.
Agreed, and this is my main concern. That there will be an unexpected usage when people reads DFT output (say an MD that throws an atom outside the UC). What should happen in sisl there? As of now, sisl is already having problems with this (#698 e.g.)

Also I don't think it necessarily makes things cleaner and easier to understand. Imagine a molecule which half of its atoms have moved outside the unit cell. If you want to move everything to the unit cell for computing neighbors, some intramolecular interactions will be supercell interactions. Isn't that weird? My point is that not everything is made clearer by translating to the unit cell and therefore I don't think it is worth it to risk losing consistency. You can always translate to the unit cell explicitly if you need/want it, and then there won't be any risk of inconsistencies.

But even your code (and mine as well) still thinks these are supercell connections, right?

pfebrer · 2024-03-15T12:58:06Z

The problem is that users coming from DFT codes are not used to this. So they expect it to work (seamlessly). And if sisl converts coordinates on-the-fly we should do so extremely carefully.

What I'm saying precisely is that we shouldn't convert anything. Internally for practical reasons of the algorithm we should use the periodic images that are inside the unit cell, but we should take that into account when we return the neighbors.

Agreed, and this is my main concern. That there will be an unexpected usage when people reads DFT output (say an MD that throws an atom outside the UC). What should happen in sisl there? As of now, sisl is already having problems with this (#698 e.g.)

I would keep the atom outside the unit cell and take that into account when computing neighbors.

But even your code (and mine as well) still thinks these are supercell connections, right?

Hmm no, unless I have a bug, I try to adhere to SIESTA's way of describing connections, which I think is the most consistent one (although I admit that at the beginning it was not easy to understand). For example in the neighbor finder I didn't take this into account, so it is wrong 😅

There might be cases in which having all atoms in the unit cell is easier to manage and leads to the same result (e.g. to compute the density on the grid), but then what you can do is translate2uc before computing it (internally).

zerothi · 2024-03-19T09:32:09Z

I will merge this, but it would be great if you could solve the neighbor problem in the NeighborFinder :-D

The neighbor finder got an overhaul to check for PBC and friends. Signed-off-by: Nick Papior <[email protected]>

pfebrer · 2024-03-19T09:45:34Z

But we still didn't agree on the convention to follow 😅

zerothi · 2024-03-19T09:53:48Z

But we still didn't agree on the convention to follow 😅

True, but first step is to make it actually find the neighbors.
So

geom = si.Geometry([[0, 0, 0], [3, 0, 0]], lattice=[2, 10, 10])
n = si.geom.NeighborFinder(geom, R=1.1)
print(n.find_neighbors())
> [array([[ 1, -1,  0,  0]]), array([], shape=(0, 4), dtype=int64)]

should return neighbors for both atoms, regardless of convention. I.e. it has PBC.

I am also thinking about whether we should change the return value to be a tuple of indices, having everything in one array is nice. But I think it would be more easily understood by the end-user if it is returned in a tuple:

uc_idx, isc

or perhaps a namedtuple so we can extend the return values?

Then we should take up the discussion of the convention later (it shouldn't matter here, no?)

github-advanced-security bot found potential problems Mar 12, 2024

View reviewed changes

src/sisl/geom/_neighbors/tests/test_finder.py

geom = Geometry([[0, 0, 0], [3, 0, 0]], lattice=[2, 10, 10])

# We should have fewer bins along the first lattice vector

n = NeighborFinder(geom, R=1.1)

Check notice

Code scanning / CodeQL

Unused local variable Note test

Variable n is not used.

cleaned finder and removed pbc (getting it from Lattice)

345b418

The neighbor finder got an overhaul to check for PBC and friends. Signed-off-by: Nick Papior <[email protected]>

zerothi force-pushed the neighbor-test branch from a69f44c to 345b418 Compare March 19, 2024 09:35

zerothi merged commit d11dc6a into main Mar 19, 2024
6 of 8 checks passed

zerothi deleted the neighbor-test branch March 19, 2024 10:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cleaned finder and removed pbc (getting it from Lattice) #706

cleaned finder and removed pbc (getting it from Lattice) #706

zerothi commented Mar 12, 2024

codecov bot commented Mar 12, 2024 •

edited

Loading

pfebrer commented Mar 13, 2024

zerothi commented Mar 13, 2024

zerothi commented Mar 13, 2024

pfebrer commented Mar 13, 2024 •

edited

Loading

zerothi commented Mar 14, 2024

pfebrer commented Mar 14, 2024

zerothi commented Mar 14, 2024

pfebrer commented Mar 14, 2024

zerothi commented Mar 14, 2024

pfebrer commented Mar 14, 2024

zerothi commented Mar 14, 2024

pfebrer commented Mar 14, 2024

zerothi commented Mar 15, 2024

pfebrer commented Mar 15, 2024

zerothi commented Mar 19, 2024 •

edited

Loading

pfebrer commented Mar 19, 2024

zerothi commented Mar 19, 2024 •

edited

Loading

cleaned finder and removed pbc (getting it from Lattice) #706

cleaned finder and removed pbc (getting it from Lattice) #706

Conversation

zerothi commented Mar 12, 2024

codecov bot commented Mar 12, 2024 • edited Loading

Codecov Report

pfebrer commented Mar 13, 2024

zerothi commented Mar 13, 2024

zerothi commented Mar 13, 2024

pfebrer commented Mar 13, 2024 • edited Loading

zerothi commented Mar 14, 2024

pfebrer commented Mar 14, 2024

zerothi commented Mar 14, 2024

pfebrer commented Mar 14, 2024

zerothi commented Mar 14, 2024

pfebrer commented Mar 14, 2024

zerothi commented Mar 14, 2024

pfebrer commented Mar 14, 2024

zerothi commented Mar 15, 2024

pfebrer commented Mar 15, 2024

zerothi commented Mar 19, 2024 • edited Loading

pfebrer commented Mar 19, 2024

zerothi commented Mar 19, 2024 • edited Loading

codecov bot commented Mar 12, 2024 •

edited

Loading

pfebrer commented Mar 13, 2024 •

edited

Loading

zerothi commented Mar 19, 2024 •

edited

Loading

zerothi commented Mar 19, 2024 •

edited

Loading