consistent asu diffmap #49

Open · wants to merge 3 commits into main
Conversation

@DHekstra (Contributor) commented Jul 29, 2024

Summary:
(1) For some non-canonical spacegroups (in my case, going from P21 21 21 to P21 1 1), Phenix and rs do not appear to create consistent reciprocal-space ASUs. By calling .hkl_to_asu() on each input MTZ, this PR ensures that the ASUs are consistent. This affects the calculation of both ordinary and internal isomorphous difference maps.
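A minimal sketch of the idea, assuming two hypothetical input files; DataSet.hkl_to_asu() maps Miller indices into the reciprocal ASU, so calling it on every input gives a common indexing convention before the subtraction:

```python
import reciprocalspaceship as rs

# Hypothetical input files; the real script reads whichever MTZs the user supplies.
on = rs.read_mtz("on.mtz")
off = rs.read_mtz("off.mtz")

# Map both datasets into the reciprocal-space ASU of their spacegroup so that
# symmetry-equivalent reflections carry identical Miller indices in both inputs.
on = on.hkl_to_asu()
off = off.hkl_to_asu()
```

Depending on the rs version, hkl_to_asu() may also add an M/ISYM column recording the applied operator, which can be dropped before writing out the result.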

(2) For internal difference maps, I added a flag, -u or --union, which makes the script return a diffmap MTZ file created with an outer join (how="outer") in the pandas merge call. The columns wDF, DF, and SigDF are set to 0 for entries where either contributing reflection is unavailable.
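A rough sketch of what the --union behaviour could look like under the hood, in plain pandas; the column names F/SIGF and the weighting expression below are placeholders rather than the exact ones used by the script:

```python
import numpy as np
import pandas as pd

# half1/half2: reflection tables with columns H, K, L, F, SIGF for the two
# datasets being differenced.
merged = pd.merge(half1, half2, on=["H", "K", "L"], how="outer", suffixes=("_1", "_2"))

# Rows where either contributing reflection is missing.
incomplete = merged[["F_1", "F_2"]].isna().any(axis=1)

merged["DF"] = merged["F_1"] - merged["F_2"]
merged["SigDF"] = np.sqrt(merged["SIGF_1"] ** 2 + merged["SIGF_2"] ** 2)
# Placeholder weighting scheme -- the script's actual weights may differ.
merged["wDF"] = merged["DF"] / (1.0 + (merged["SigDF"] / merged["SigDF"].mean()) ** 2)

# With --union, incomplete rows are kept but their difference columns are zeroed,
# instead of those rows being dropped by the default inner join.
merged.loc[incomplete, ["wDF", "DF", "SigDF"]] = 0.0
```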

(3) I added print statements:

  • to specify which symop (as a string) is applied in the internal difference map calculation, since we have been making incorrect assumptions about how these operators are numbered (see the sketch after this list)
  • in utils/io.py, to alert the user when subset_to_F_SigF applies French-Wilson scaling
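For the first bullet, the report could look roughly like the following sketch using gemmi; the 1-based index and variable names are illustrative assumptions, not the script's actual code:

```python
import gemmi

sg = gemmi.SpaceGroup("P 21 21 21")
isym = 2  # operator index selected for the internal difference map (1-based here)
op = list(sg.operations())[isym - 1]

# Print the operator as a human-readable triplet rather than a bare index,
# so there is no ambiguity about the numbering convention.
print(f"Applying symmetry operation #{isym}: {op.triplet()}")
```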

Update:
(4) When splitting MTZs by symop and scaling them jointly, Miller indices that map to themselves under the relevant symop will, in effect, be inferred twice. When recombining the output MTZs, one can accidentally include these duplicate entries in the final MTZ (as I did). Although this is an upstream issue, I added a call to a new function, remove_HKL_duplicates, in diffmap.py and internaldiffmap.py, which checks for rows with the same Miller indices, removes duplicate entries (keeping only the first occurrence), and sets merged=True on each MTZ.

@JBGreisman (Member)

I triggered a re-run of these tests due to the recent rs v1.0.2 release, which included a fix for the numpy/pandas compatibility issues that were causing failures.

@JBGreisman (Member)

Before doing a code review, I want to better understand the issues/context here:
(1) Do you mind explaining a bit more what's happening here regarding ASUs? We explicitly test the rs ASUs against sgtbx for all spacegroup settings (including non-canonical ones), so I'd be surprised to find that there's a difference between phenix/rs.
(2) I'm fine with this if you find it useful. Do you mind explaining its use case, and why the parallel change wouldn't also be useful for ordinary difference maps?
(3) Generally, I prefer these sorts of Python libraries to be "silent" except in failure modes. However, since this is predominantly intended to be used as a commandline application, I'm ok with including these print statements if you find them useful. In the future, I may advocate for some sort of verbosity setting, especially if we move towards a more consistent Python API rather than commandline usage.
(4) In general, I think of these protocols as only being applicable to "merged" datasets, which should not have duplicated Miller indices. I would prefer to detect such issues and raise some sort of input error, rather than sanitize the input. Please let me know if you disagree with this.

My general philosophy for (1) and (4) is that each of these commandline tools should have a clear interface with well-defined input, in this case "merged" reflection data in the correct ASU. I agree we should check that these input assumptions hold and, in some cases, sanitize the input if that is easy to do. From what I can tell, at least (4) seems intended to make this code robust to an upstream error. I worry such changes could have unintended consequences for other sorts of input data (such as unmerged data), but I'm also open to discussion on this.

@DHekstra (Contributor, Author) commented Aug 5, 2024

Thanks for your thoughtful comments, Jack! I will review and respond in the next few days.

@DHekstra (Contributor, Author)

So, the issue seems to be: I used phenix.fmodel to generate an MTZ with the phases needed to phase my difference map. Dumping the phenix.fmodel output MTZ produces:

rs.mtzdump small_tasks_refine_748_fmodel.mtz
Traceback (most recent call last):
  File "/n/hekstra_lab/people/dhekstra/conda_envs/careless-12/bin/rs.mtzdump", line 8, in <module>
    sys.exit(main())
  File "/n/hekstra_lab/people/dhekstra/conda_envs/careless-12/lib/python3.9/site-packages/reciprocalspaceship/commandline/mtzdump.py", line 91, in main
    summarize(mtz, args.precision)
  File "/n/hekstra_lab/people/dhekstra/conda_envs/careless-12/lib/python3.9/site-packages/reciprocalspaceship/commandline/mtzdump.py", line 69, in summarize
    print(f"Spacegroup: {mtz.spacegroup.short_name()}")
AttributeError: 'NoneType' object has no attribute 'short_name'

I see the same with the output from phenix.refine directly. When I open these MTZs in rs by hand, set the spacegroup, and write them back out to MTZ, the resulting electron density looks fine.
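For reference, that manual workaround looks roughly like this (the spacegroup symbol and output filename are placeholders):

```python
import gemmi
import reciprocalspaceship as rs

ds = rs.read_mtz("small_tasks_refine_748_fmodel.mtz")
print(ds.spacegroup)  # None for the problematic phenix output, which breaks rs.mtzdump

# Assign the spacegroup explicitly and rewrite the file.
ds.spacegroup = gemmi.SpaceGroup("P 21 21 21")  # placeholder symbol
ds.write_mtz("fmodel_with_spacegroup.mtz")
```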

@DHekstra (Contributor, Author) commented Aug 12, 2024

(2) -- I do need the corresponding change for the ordinary difference map calculation as well.
(3) -- I can add a --verbose flag.
(4) -- I will revisit this in the next few weeks (vacation in between). I agree there should be no silent sanitization. I propose converting this into a check that throws an error if duplicates are detected, combined with an optional flag to purge duplicates; see the sketch below.
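Something along the lines of the following sketch, where the function and flag names are hypothetical:

```python
import reciprocalspaceship as rs

def check_hkl_duplicates(ds: rs.DataSet, purge: bool = False) -> rs.DataSet:
    """Raise on duplicated Miller indices, or drop them when purge=True."""
    dup = ds.reset_index().duplicated(subset=["H", "K", "L"], keep="first")
    if not dup.any():
        return ds
    if not purge:
        raise ValueError(
            f"{dup.sum()} reflections share Miller indices with earlier rows; "
            "rerun with --purge-duplicates to keep only the first occurrence"
        )
    # Boolean mask from the reset-index Series; keep only first occurrences.
    return ds.loc[~dup.to_numpy()]
```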

@JBGreisman (Member)

(1) I'm still a bit confused about whether this is an issue in rs or somewhere else in the toolchain. If you want me to look in more detail, would it be possible to email me the MTZ file? I would generally rather avoid extraneous calls to hkl_to_asu() if possible. At the end of the day, if the reflections are outside the reciprocal ASU, I think of it as a user input error. Perhaps the best solution would be to check for this condition and raise a ValueError.
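A hedged sketch of that validation, assuming rs.utils.in_asu is available for the check; the helper name below is hypothetical:

```python
import reciprocalspaceship as rs

def require_reciprocal_asu(ds: rs.DataSet) -> None:
    """Raise a ValueError if any reflection lies outside the reciprocal ASU."""
    in_asu = rs.utils.in_asu(ds.get_hkls(), ds.spacegroup)
    if not in_asu.all():
        raise ValueError(
            f"{(~in_asu).sum()} reflections are outside the reciprocal ASU of "
            f"spacegroup {ds.spacegroup.short_name()}; map them with hkl_to_asu() first"
        )
```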

(2) I'm ok with this being an added mode in both cases, though I'm still a bit unsure of its use. I would think that having "incorrect" 0 entries would be more confusing than dropping them in the first place.

(4) As with (1), this seems to me like an issue with user input that I don't think the library should support. I think it would be ok to detect such cases and raise an appropriate error.
