consistent asu diffmap #49

Open · wants to merge 3 commits into main
Conversation

@DHekstra (Contributor) commented Jul 29, 2024

Summary:
(1) For some non-canonical spacegroups (in my case, going from P21 21 21 to P21 1 1), Phenix and rs do not appear to create consistent reciprocal-space ASUs. By calling .hkl_to_asu() on each input MTZ, this PR ensures that the ASUs are consistent. This affects the calculation of both ordinary and internal isomorphous difference maps.
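A minimal sketch of the idea, assuming two hypothetical input files; DataSet.hkl_to_asu() maps Miller indices into the reciprocal ASU, so calling it on every input gives a common indexing convention before the subtraction:

```python
import reciprocalspaceship as rs

# Hypothetical input files; the real script reads whichever MTZs the user supplies.
on = rs.read_mtz("on.mtz")
off = rs.read_mtz("off.mtz")

# Map both datasets into the reciprocal-space ASU of their spacegroup so that
# symmetry-equivalent reflections carry identical Miller indices in both inputs.
on = on.hkl_to_asu()
off = off.hkl_to_asu()
```

Depending on the rs version, hkl_to_asu() may also add an M/ISYM column recording the applied operator, which can be dropped before writing out the result.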

(2) For internal difference maps, I added a flag, -u or --union, which makes the script return a diffmap MTZ file created with an outer join (how="outer") in the pandas merge call. The columns wDF, DF, and SigDF are set to 0 for entries where either contributing reflection is unavailable.
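A rough sketch of what the --union behaviour could look like under the hood, in plain pandas; the column names F/SIGF and the weighting expression below are placeholders rather than the exact ones used by the script:

```python
import numpy as np
import pandas as pd

# half1/half2: reflection tables with columns H, K, L, F, SIGF for the two
# datasets being differenced.
merged = pd.merge(half1, half2, on=["H", "K", "L"], how="outer", suffixes=("_1", "_2"))

# Rows where either contributing reflection is missing.
incomplete = merged[["F_1", "F_2"]].isna().any(axis=1)

merged["DF"] = merged["F_1"] - merged["F_2"]
merged["SigDF"] = np.sqrt(merged["SIGF_1"] ** 2 + merged["SIGF_2"] ** 2)
# Placeholder weighting scheme -- the script's actual weights may differ.
merged["wDF"] = merged["DF"] / (1.0 + (merged["SigDF"] / merged["SigDF"].mean()) ** 2)

# With --union, incomplete rows are kept but their difference columns are zeroed,
# instead of those rows being dropped by the default inner join.
merged.loc[incomplete, ["wDF", "DF", "SigDF"]] = 0.0
```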

(3) I added print statements:

  • to specify which symop (as a string) is applied in the internal difference map calculation, since we have been making incorrect assumptions about how these operators are numbered (see the sketch after this list)
  • in utils/io.py, to alert the user when subset_to_F_SigF applies French-Wilson scaling
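For the first bullet, the report could look roughly like the following sketch using gemmi; the 1-based index and variable names are illustrative assumptions, not the script's actual code:

```python
import gemmi

sg = gemmi.SpaceGroup("P 21 21 21")
isym = 2  # operator index selected for the internal difference map (1-based here)
op = list(sg.operations())[isym - 1]

# Print the operator as a human-readable triplet rather than a bare index,
# so there is no ambiguity about the numbering convention.
print(f"Applying symmetry operation #{isym}: {op.triplet()}")
```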

Update:
(4) When splitting MTZs by symop and scaling them jointly, Miller indices that map to themselves under the relevant symop will, in effect, be inferred twice. When recombining the output MTZs, one can accidentally include these duplicate entries in the final MTZ (as I did). Although this is an upstream issue, I added a call to a new function, remove_HKL_duplicates, in diffmap.py and internaldiffmap.py, which checks for rows with the same Miller indices, removes duplicate entries (keeping only the first occurrence), and sets merged=True on each MTZ.

@JBGreisman (Member)

I triggered a re-run of these tests due to the recent rs v1.0.2 release, which included a fix for the numpy/pandas compatibility issues that were causing failures.

@JBGreisman (Member)

Before doing a code review, I want to better understand the issues/context here:
(1) Do you mind explaining a bit more what's happening here regarding ASUs? We explicitly test the rs ASUs against sgtbx for all spacegroup settings (including non-canonical ones), so I'd be surprised to find that there's a difference between phenix/rs.
(2) I'm fine with this if you find it useful. Do you mind explaining its use case, and why the parallel change wouldn't also be useful for ordinary difference maps?
(3) Generally, I prefer these sorts of Python libraries to be "silent" except in failure modes. However, since this is predominantly intended to be used as a commandline application, I'm ok with including these print statements if you find them useful. In the future, I may advocate for some sort of verbosity setting, especially if we move towards a more consistent Python API rather than commandline usage.
(4) In general, I think of these protocols as only being applicable to "merged" datasets, which should not have duplicated Miller indices. I would prefer to detect such issues and raise some sort of input error, rather than sanitize the input. Please let me know if you disagree with this.

My general philosophy for (1) and (4) is that each of these commandline tools should have a clear interface with well-defined input, in this case "merged" reflection data in the correct ASU. I agree we should check that these input assumptions hold and, in some cases, sanitize the input if that is easy to do. From what I can tell, at least (4) seems intended to make this code robust to an upstream error. I worry such changes could have unintended consequences for other sorts of input data (such as unmerged data), but I'm also open to discussion on this.

@DHekstra (Contributor, Author) commented Aug 5, 2024

Thanks for your thoughtful comments, Jack! I will review and respond in the next few days.

@DHekstra (Contributor, Author)

So, the issue seems to be: I used phenix.fmodel to generate an MTZ with the phases needed to phase my difference map. Dumping the phenix.fmodel output MTZ produces:

rs.mtzdump small_tasks_refine_748_fmodel.mtz
Traceback (most recent call last):
  File "/n/hekstra_lab/people/dhekstra/conda_envs/careless-12/bin/rs.mtzdump", line 8, in <module>
    sys.exit(main())
  File "/n/hekstra_lab/people/dhekstra/conda_envs/careless-12/lib/python3.9/site-packages/reciprocalspaceship/commandline/mtzdump.py", line 91, in main
    summarize(mtz, args.precision)
  File "/n/hekstra_lab/people/dhekstra/conda_envs/careless-12/lib/python3.9/site-packages/reciprocalspaceship/commandline/mtzdump.py", line 69, in summarize
    print(f"Spacegroup: {mtz.spacegroup.short_name()}")
AttributeError: 'NoneType' object has no attribute 'short_name'

I see the same with the output from phenix.refine directly. When I open these MTZs in rs by hand, set the spacegroup, and write them back out to MTZ, the resulting electron density looks fine.
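For reference, that manual workaround looks roughly like this (the spacegroup symbol and output filename are placeholders):

```python
import gemmi
import reciprocalspaceship as rs

ds = rs.read_mtz("small_tasks_refine_748_fmodel.mtz")
print(ds.spacegroup)  # None for the problematic phenix output, which breaks rs.mtzdump

# Assign the spacegroup explicitly and rewrite the file.
ds.spacegroup = gemmi.SpaceGroup("P 21 21 21")  # placeholder symbol
ds.write_mtz("fmodel_with_spacegroup.mtz")
```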

@DHekstra (Contributor, Author) commented Aug 12, 2024

(2) -- I do need the corresponding change for the ordinary difference map calculation as well.
(3) -- I can add a --verbose flag.
(4) -- I will revisit this in the next few weeks (vacation in between). I agree there should be no silent sanitization. I propose converting this into a check that throws an error if duplicates are detected, combined with an optional flag to purge duplicates; see the sketch below.
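Something along the lines of the following sketch, where the function and flag names are hypothetical:

```python
import reciprocalspaceship as rs

def check_hkl_duplicates(ds: rs.DataSet, purge: bool = False) -> rs.DataSet:
    """Raise on duplicated Miller indices, or drop them when purge=True."""
    dup = ds.reset_index().duplicated(subset=["H", "K", "L"], keep="first")
    if not dup.any():
        return ds
    if not purge:
        raise ValueError(
            f"{dup.sum()} reflections share Miller indices with earlier rows; "
            "rerun with --purge-duplicates to keep only the first occurrence"
        )
    # Boolean mask from the reset-index Series; keep only first occurrences.
    return ds.loc[~dup.to_numpy()]
```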

@JBGreisman (Member)

(1) I'm still a bit confused about whether this is an issue in rs or somewhere else in the toolchain. If you want me to look in more detail, would it be possible to email me the MTZ file? I would generally rather avoid extraneous calls to hkl_to_asu() if possible. At the end of the day, if the reflections are outside the reciprocal ASU, I think of it as a user input error. Perhaps the best solution would be to check for this condition and raise a ValueError.
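A hedged sketch of that validation, assuming rs.utils.in_asu is available for the check; the helper name below is hypothetical:

```python
import reciprocalspaceship as rs

def require_reciprocal_asu(ds: rs.DataSet) -> None:
    """Raise a ValueError if any reflection lies outside the reciprocal ASU."""
    in_asu = rs.utils.in_asu(ds.get_hkls(), ds.spacegroup)
    if not in_asu.all():
        raise ValueError(
            f"{(~in_asu).sum()} reflections are outside the reciprocal ASU of "
            f"spacegroup {ds.spacegroup.short_name()}; map them with hkl_to_asu() first"
        )
```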

(2) I'm ok with this being an added mode in both cases, though I'm still a bit unsure of its use. I would think that having "incorrect" 0 entries would be more confusing than dropping them in the first place.

(4) As with (1), this seems to me like an issue with user input that I don't think the library should support. I think it would be ok to detect such cases and raise an appropriate error.
