Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitosheet default df renderer #1330

Open
wants to merge 25 commits into
base: jupyterlab-4-manually
Choose a base branch
from

Conversation

aarondr77
Copy link
Member

@aarondr77 aarondr77 commented Sep 11, 2024

Description

Makes Mito the default dataframe renderer in Jupyter. When a dataframe is hanging at the end of the a code cell, it is automatically displayed in a mitosheet instead of the static, pandas view that only shows a subset of the data.

Testing

Use the Test Notebook.ipynb to try various approaches. Some things to look for:

  1. This implementation should never overwrite existing code in the notebook
  2. If the mitosheet is unable to render, it should fallback to the default pandas dataframe viewer.

In addition, see the new frontend tests that ensure this behavior.

Documentation

yes, we need to update the mito docs.

Copy link

vercel bot commented Sep 11, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
monorepo ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 3, 2024 7:33pm

@aarondr77 aarondr77 changed the base branch from dev to jupyterlab-4-manually September 11, 2024 21:44
mitosheet/src/DataFrameMimeRenderer.tsx Outdated Show resolved Hide resolved
mitosheet/src/DataFrameMimeRenderer.tsx Outdated Show resolved Hide resolved
let dataframeVariableName = undefined;
if (activeCellIndex) {
const previousCell = getCellAtIndex(cells, activeCellIndex - 1)
dataframeVariableName = getLastNonEmptyLine(getCellText(previousCell))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test more cases: things like having two dataframes as the output, non dataframes as the output, etc.

Add UI tests for these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add tests for:

  • Tuple of dataframes
  • Series
  • plotly graph

mitosheet/src/DataFrameMimeRenderer.tsx Outdated Show resolved Hide resolved
mitosheet/src/DataFrameMimeRenderer.tsx Outdated Show resolved Hide resolved
mitosheet/src/plugin.tsx Outdated Show resolved Hide resolved
mitosheet/src/plugin.tsx Outdated Show resolved Hide resolved
mitosheet/src/plugin.tsx Outdated Show resolved Hide resolved
mitosheet/src/plugin.tsx Outdated Show resolved Hide resolved
mitosheet/src/plugin.tsx Outdated Show resolved Hide resolved
@aarondr77
Copy link
Member Author

aarondr77 commented Sep 12, 2024

Make code gen work. Need to get the correct dataframe name.

@aarondr77
Copy link
Member Author

aarondr77 commented Sep 12, 2024

Using Enter to submit column renames does not work until you click somewhere else in the widget.

This might just be on Chrome btw. But we need to fix it since Chrome is most popular

@aarondr77
Copy link
Member Author

Don't register analysis.

@aarondr77
Copy link
Member Author

Don't have email popup

@aarondr77
Copy link
Member Author

Question: Each time we add a mitosheet.sheet() call the notebook size grows by about 2MB, but adding Mito as a default dataframe output only increases the size of the saved notebook by 2KB.

I think this is because the mime renderer doesn't actually get saved to the notebook (at least right now). But if I were to look at the size of the notebook while being used it would still grow 2MB per mito spreadsheet default dataframe ouput

@aarondr77
Copy link
Member Author

aarondr77 commented Oct 1, 2024

Approaches tried using the mime render, documenting for future reference

The challenging part is figuring out which dataframe to display in the mitosheet. To figure this out, we need to find the code cell that triggered this dataframe render and get the dataframe on its last line.

Finding the code cell is challenging however. Below describes a few options we tried and why they don't work.

  1. Using the activeCellIndex: We cannot use the activeCellIndex to identify the cell ID because when running a bunch of cells in a row (for example, using run all cells) or when the code cell takes a few seconds to execute, the active cell in the notebook tracker updates before we're able to save it. As a result, we end up thinking the code cell that triggered the dataframe render is at the bottom of the notebook.

  2. Using the execution count: We cannot use the execution count because the execution count will not update until the mime render is created. When we run the code cell df, that cell is responsible for creating the mime renderer. As a result, when we search the cells for the execution count, of ie: 3, the closest execution count that we get is 2.

  3. Use the dom to find the corresponding code cell ID. This works as follows: 1) Render the default renderer so that the we have a DOM element to start with. 2) Traverse up to find the Code Cell that triggered the dataframe render (the first code cell we find) 3) Get the code cell ID from the code cell's model. 4) Use the cell ID to find the input cell and read the dataframe name from it.

However, this didn't work for the reasons below.

Why using Mime Renderes did not work

There is still a race condition bug where if code cell 1 creates a dataframe renderer, and code cell 2 edits the dataframe, the mitosheet output will show the dataframe state after code cell 2 has run, instead of the state of the dataframe at code cell 1.

This occurs because in order to create the mitosheet, we need to execute the mitosheet.sheet() function. To do so, we had to send a new kernel message from the mimerender with the code mitosheet.sheet(df). However, becasue the kernel message queue might have had additional messages already queued that edited the df, by the time the mitosheet was rendered, it might have displayed a dataframe that reflected future code cell edits instead of the current state of the dataframe at the time the code cell with the hanging df was executed. This is not what we want.

Copy link
Member Author

@aarondr77 aarondr77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another review

mitosheet/Test Notebook.ipynb Show resolved Hide resolved
mitosheet/src/jupyter/extensionUtils.tsx Outdated Show resolved Hide resolved
mitosheet/src/jupyter/extensionUtils.tsx Outdated Show resolved Hide resolved
mitosheet/src/jupyter/extensionUtils.tsx Outdated Show resolved Hide resolved
mitosheet/src/jupyter/extensionUtils.tsx Outdated Show resolved Hide resolved
mitosheet/src/plugin.tsx Outdated Show resolved Hide resolved
let codeCell = getCellAtIndex(cells, mimeRenderInputCellIndex + 1)
const codeCellText = getCellText(codeCell);

if (codeCell === undefined) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turn this logic into a helper function and share it with the other write generated code

mitosheet/src/plugin.tsx Outdated Show resolved Hide resolved
mitosheet/src/plugin.tsx Outdated Show resolved Hide resolved
mitosheet/src/plugin.tsx Outdated Show resolved Hide resolved
@aarondr77
Copy link
Member Author

Make sure print(df) still returns the correct result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant