Improve metadata tutorial #12931

Draft: wants to merge 12 commits into base: main

@cbrnr
Contributor

cbrnr commented Oct 31, 2024

I might have found some things that could be improved in the excellent metadata tutorial:

  1. The text did not match what the code was doing in the stimulus-locked example (all correct epochs vs. slow correct epochs).
  2. I am not sure what the "final sanity check" is supposed to show - the output only shows an empty table, but the text says "Bummer! It seems the very first two responses were recorded before the first stimulus appeared: the values in the stimulus column are None." I don't really see this in the output?
  3. Maybe this is related to 2, but the ERN example (response-locked) shows a metadata table with 402 rows, whereas the stimulus-locked metadata has only 400 rows. This should not be the case, right?
  4. The second-to-last plot shows two images side by side. First, I don't know how these were created (the code shows two independent plot commands). Second, the title of the right plot is cut off.
  5. Mention actual authors at the top.

@hoechenberger
Member

Thanks @cbrnr, I might be able to take a look later today or tomorrow (I think I initially wrote a larger part of this very tutorial)

@hoechenberger
Member

  • I am not sure what the "final sanity check" is supposed to show - the output only shows an empty table, but the text says "Bummer! It seems the very first two responses were recorded before the first stimulus appeared: the values in the stimulus column are None." I don't really see this in the output?

I checked and it shows the correct output in the rendered documentation up until MNE-Python 1.6:

https://mne.tools/1.6/auto_tutorials/epochs/40_autogenerate_metadata.html#applying-the-knowledge-visualizing-the-ern-component

For newer versions, it generates an empty table, like you observed. I'll try to look into this.

@hoechenberger
Member

hoechenberger commented Nov 3, 2024

  • Maybe this is related to 2, but the ERN example (response-locked) shows a metadata table with 402 rows, whereas the stimulus-locked metadata has only 400 rows. This should not be the case, right?

Yes, this is to be expected: there were 400 stimuli but 402 button presses. In our analysis, we only want to consider the first button press following a stimulus.

This is directly related to your previous question / observation.
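
To make the counting concrete, here is a minimal sketch of how the number of button presses can exceed the number of stimuli, and how keeping only the first press after each stimulus drops the extras. The event list, its names, and the loop are invented for illustration; this is not how `make_metadata` is implemented.

```python
# Hypothetical event stream: (time in seconds, event name). All values are invented.
events = [
    (0.2, "response/left"),   # button press before any stimulus appeared
    (1.0, "stimulus/a"),
    (1.4, "response/left"),   # first press after the stimulus: kept
    (1.6, "response/right"),  # second press after the same stimulus: dropped
    (3.0, "stimulus/b"),
    (3.5, "response/right"),
]

first_responses = []
awaiting_response = False
for time, name in events:
    if name.startswith("stimulus"):
        # A new stimulus opens a window for exactly one response.
        awaiting_response = True
    elif name.startswith("response") and awaiting_response:
        first_responses.append((time, name))
        awaiting_response = False

n_stimuli = sum(name.startswith("stimulus") for _, name in events)
n_responses = sum(name.startswith("response") for _, name in events)
print(f"{n_stimuli} stimuli, {n_responses} responses, {len(first_responses)} kept")
```

In this toy stream there are more responses than stimuli, but only one response per stimulus survives the selection.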

@hoechenberger
Member

hoechenberger commented Nov 3, 2024

Ok I found the problem.

MWE:

# %%
import mne

data_dir = mne.datasets.erp_core.data_path()
infile = data_dir / "ERP-CORE_Subject-001_Task-Flankers_eeg.fif"

raw = mne.io.read_raw(infile, preload=True)
raw.filter(l_freq=0.1, h_freq=40)
all_events, all_event_id = mne.events_from_annotations(raw)

metadata_tmin, metadata_tmax = -1.5, 0
row_events = ["response/left", "response/right"]
keep_last = ["stimulus", "response"]

metadata, events, event_id = mne.epochs.make_metadata(
    events=all_events,
    event_id=all_event_id,
    tmin=metadata_tmin,
    tmax=metadata_tmax,
    sfreq=raw.info["sfreq"],
    row_events=row_events,
    keep_last=keep_last,
)

# %%
metadata.loc[metadata["last_stimulus"] == "", :]

The cells with missing values are populated with empty strings. But there should be n/a in these instead.

@larsoner Any idea about this? It worked back with MNE-Python 1.6, and was broken in our rendered docs starting with 1.7. I'm not sure if it's anything we have changed, or if it's related to Pandas.

@cbrnr
Contributor Author

cbrnr commented Nov 4, 2024

Also, didn't we agree to keep the actual authors in all tutorials and examples? Currently, this document just contains

# Authors: The MNE-Python contributors.

@hoechenberger
Member

I just tried with MNE 1.6 and I get the expected result. Switching to main while keeping all other installed packages the same yields the issue observed above. So it's likely a bug in MNE.

@cbrnr
Contributor Author

cbrnr commented Nov 4, 2024

Just to clarify the MWE: it returns rows on current versions (after the change) because it searches for metadata["last_stimulus"] == "", whereas MNE does metadata["last_stimulus"].isna().
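
A small pandas-only sketch of the difference between the two checks. The column name mirrors the MWE, but the table contents are invented and no MNE call is involved.

```python
import pandas as pd

# Toy stand-in for the metadata table; the values are invented for illustration.
metadata = pd.DataFrame(
    {
        "event_name": ["response/left", "response/right", "response/left"],
        "last_stimulus": ["", None, "stimulus/compatible/target_left"],
    }
)

# An empty string is an ordinary value, so isna() does not flag it;
# only the None entry counts as missing.
n_missing = int(metadata["last_stimulus"].isna().sum())
n_empty = int((metadata["last_stimulus"] == "").sum())
print(f"isna(): {n_missing} row(s), == '': {n_empty} row(s)")
```

A check written with `.isna()` therefore comes back empty when the missing cells hold empty strings, which matches the empty table seen in the rendered tutorial.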

@larsoner
Member

larsoner commented Nov 4, 2024

Also, didn't we agree to keep the actual authors in all tutorials and examples? Currently, this document just contains

I think we decided that people can opt-in to adding their names in tutorials and examples if they are so inclined. Sounds like the author(s) of that tutorial haven't done that yet!

@hoechenberger hoechenberger marked this pull request as draft November 4, 2024 21:41
Comment on lines +1 to +4
Fix a bug in :func:`mne.epochs.make_metadata`, where missing values in the columns
generated for ``keep_first`` and ``keep_last`` events were depicted by empty strings,
while it should have been ``NA`` values. This issue existed since MNE-Python 1.7,
by `Richard Höchenberger`_.
Contributor Author

I would not mention that "This issue existed since MNE-Python 1.7". Also, "depicted" → "represented".

ax[0].xaxis.set_visible(False)
ax[1].xaxis.set_visible(False)

fig
Contributor Author

Can we not show the figure using plt.show()? Depending on where people execute this code, just typing fig might not actually show the figure.


# %%
# Aside from the fact that the data for the (much fewer) slow responses looks
# noisier – which is entirely to be expected – not much of an ERP difference
# noisier – which is entirely to be expected – not much of an ERP difference
Contributor Author

Suggested change
# noisier – which is entirely to be expected – not much of an ERP difference
# noisier – which is entirely to be expected – not much of an ERP difference

@@ -396,7 +426,7 @@
# period close to the response event should not be used for baseline
# correction. But at the same time, we don't want to use a baseline
# period that extends too far away from the button event. The following values
-# seem to work quite well.
+# seem to work quite well. Remember: Time point zero is the response event.
Contributor Author

Suggested change
-# seem to work quite well. Remember: Time point zero is the response event.
+# seem to work quite well. Remember: time point zero is the response event.

@@ -3232,7 +3232,7 @@ def _diff_input_strings_vs_event_id(input_strings, input_name, event_id):

# keep_first and keep_last names
start_idx = stop_idx
-metadata[columns[start_idx:]] = ""
+metadata[columns[start_idx:]] = None
Contributor Author

cbrnr commented Nov 5, 2024

Pandas has many ways to represent missing data; this choice uses None, whereas other columns use NaN. Although pandas correctly treats all of these values as missing, we could take advantage of nullable extension data types, which add proper support for missing values, most notably to create various nullable integer types (Int8, Int16, ..., UInt8, UInt16, ...) and a string type.
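
As a rough illustration of that point (a sketch with invented values, not a proposal for how `make_metadata` should build its columns), the following compares the classic missing-value markers with the nullable extension dtypes:

```python
import pandas as pd

# Classic markers: None in an object column, NaN in a float column.
as_object = pd.Series(["stimulus", None], dtype="object")
as_float = pd.Series([0.35, float("nan")])

# Nullable extension dtypes: the missing entry is stored as pd.NA,
# and the integer column keeps an integer dtype instead of becoming float.
as_string = pd.Series(["stimulus", None], dtype="string")
as_int = pd.Series([1, None], dtype="Int64")

# isna() treats all four representations as missing.
for series in (as_object, as_float, as_string, as_int):
    print(series.dtype, series.isna().tolist())
```

All four columns report the second entry as missing, so code relying on `.isna()` would behave the same either way; the extension dtypes mainly give a uniform marker and preserve integer and string types.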
