Consider use of fingerprint distance instead of StructureMatcher for comparison between generated and test #39

Closed
sgbaird opened this issue Aug 2, 2022 · 7 comments


@sgbaird
Member

sgbaird commented Aug 2, 2022

From the CDVAE paper:

We use fingerprint distance, rather than RMSE from StructureMatcher (Ong et al., 2013), because the material space is too large for the models to generate enough materials to exactly match the ground truth materials. StructureMatcher first requires the compositions of two materials to exactly match, which will cause all models to have close-to-zero coverage.

@sgbaird
Member Author

sgbaird commented Aug 2, 2022

#38

@sgbaird
Member Author

sgbaird commented Aug 3, 2022

Compare matminer featurizers with hash approach

@kjappelbaum

Ok, how do we design the experiment?

Potential fingerprints:

  • PXRD pattern
  • SOAP
  • JarvisCFID

Then look at what the distribution of pairwise distances looks like, e.g., how many pairs fall within the 1st-percentile distance?
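
A minimal sketch of that experiment, assuming each structure has already been reduced to a fixed-length numeric fingerprint. The random vectors below are placeholders rather than real PXRD, SOAP, or JarvisCFID features, and `pairwise_distances` and `percentile` are hypothetical helper names, not functions from any library mentioned in this thread:

```python
import itertools
import math
import random


def pairwise_distances(fingerprints):
    """Euclidean distance for every unordered pair of fingerprint vectors."""
    return [math.dist(a, b) for a, b in itertools.combinations(fingerprints, 2)]


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


# Placeholder data (assumption): 50 random 8-dimensional "fingerprints".
rng = random.Random(0)
fingerprints = [[rng.random() for _ in range(8)] for _ in range(50)]

distances = pairwise_distances(fingerprints)
threshold = percentile(distances, 1)
n_close = sum(d <= threshold for d in distances)
print(f"{len(distances)} pairs, 1st-percentile distance = {threshold:.3f}, "
      f"{n_close} pairs at or below it")
```

Running the same summary for each candidate fingerprint would show how sharply each one separates near-duplicates from the rest of the distribution.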

@sgbaird
Member Author

sgbaird commented Aug 4, 2022

Found this in the CDVAE manuscript, which is right in line with what you mentioned previously:

Figure 8: Change of COV-R and COV-P by varying δ_struc and δ_comp for MP-20. Dashed line denotes the current chosen thresholds.

They used Euclidean distances between Magpie feature vectors for the compositional distance and between CrystalNN fingerprints for the structural distance; a "match" meant that both the compositional and structural (Euclidean) distances were lower than (somewhat) arbitrarily chosen thresholds. I lean towards using ElMD for the compositional distance via chem_wasserstein, and maybe Earth Mover's Distance for the CrystalNN fingerprint as well via dist-matrix. For simplicity, maybe stick with CDVAE's implementation for now?

If we do eventually go with chem_wasserstein and dist_matrix, then it would probably make sense for me to revisit integrating ElMD into matminer (hackingmaterials/matminer#726).
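
To make the matching rule concrete, here is a rough sketch of a threshold-based coverage metric. It assumes COV-R is the fraction of test materials matched by at least one generated material and COV-P is the fraction of generated materials matched by at least one test material. It is a simplified reading of the description above, not CDVAE's actual code; the `coverage` function, the one-dimensional fingerprints, and the cutoff values are all made up for illustration:

```python
import math


def coverage(generated, test, comp_cutoff, struc_cutoff):
    """Return (cov_recall, cov_precision) for lists of (comp_fp, struc_fp) pairs.

    Two materials "match" when both their compositional and structural
    Euclidean distances are below the given cutoffs.
    """
    def is_match(a, b):
        return (math.dist(a[0], b[0]) < comp_cutoff
                and math.dist(a[1], b[1]) < struc_cutoff)

    # Recall: share of test materials with at least one matching generated one.
    recall = sum(any(is_match(g, t) for g in generated) for t in test) / len(test)
    # Precision: share of generated materials with at least one matching test one.
    precision = sum(any(is_match(g, t) for t in test) for g in generated) / len(generated)
    return recall, precision


# Toy fingerprints (assumption): each material is (composition_fp, structure_fp).
generated = [((0.0,), (0.0,)), ((10.0,), (10.0,))]
test = [((0.1,), (0.1,)), ((5.0,), (5.0,)), ((0.0,), (9.0,))]

cov_r, cov_p = coverage(generated, test, comp_cutoff=0.5, struc_cutoff=0.5)
print(f"COV-R = {cov_r:.2f}, COV-P = {cov_p:.2f}")
```

Swapping `math.dist` for another metric (for example an Earth Mover's Distance) would only change `is_match`; the coverage bookkeeping stays the same.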

@sgbaird
Member Author

sgbaird commented Aug 4, 2022

cdvae_coverage as default now

@sgbaird
Member Author

sgbaird commented Aug 5, 2022

Planning to implement loading of precomputed compositional and structural fingerprints from FigShare (still need to calculate and upload) to save time when computing the metric, since structural fingerprinting can take a while. The fingerprints for generated structures will still need to be computed by the user, but that should take only a few minutes for 1000 structures.

sparks-baird/mp-time-split#42
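
The compute-once, load-afterwards idea could look roughly like the sketch below. It uses a local JSON file instead of a FigShare download, and `load_or_compute` and `slow_fingerprint` are hypothetical names standing in for whatever the real loader and featurizer end up being:

```python
import json
import tempfile
from pathlib import Path


def load_or_compute(cache_path, keys, compute_fn):
    """Return {key: fingerprint}, reading from cache_path when it exists."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    fingerprints = {key: compute_fn(key) for key in keys}
    cache_path.write_text(json.dumps(fingerprints))
    return fingerprints


calls = []


def slow_fingerprint(material_id):
    # Hypothetical stand-in for an expensive structural featurizer.
    calls.append(material_id)
    return [float(len(material_id)), 0.0]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test_fingerprints.json"
    first = load_or_compute(path, ["mp-1", "mp-22"], slow_fingerprint)
    second = load_or_compute(path, ["mp-1", "mp-22"], slow_fingerprint)

print(first == second, calls)
```

The second call reads the cached file, so the featurizer runs only once per material.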

Signing off for now, though.

@sgbaird
Member Author

sgbaird commented Aug 7, 2022

Can always circle back to this or create a new issue, but a CDVAE-style implementation of a coverage metric seems to be functional now.

sgbaird closed this as completed Aug 7, 2022