DOC Add datasets section to website #720

Closed

jovan-stojanovic
Member

Fix #621

Member

@Vincent-Maladiere Vincent-Maladiere left a comment

Nice

doc/datasets.rst Outdated (four resolved threads)
doc/datasets.rst Outdated

It consists of generated, embeddings and real world data.

Real world datasets
Member

It looks to me like we are duplicating the API page. I would prefer that we point to it instead of establishing a second list, which will be hard to keep up to date.

Member Author

I managed to link the dataset and the API page, so now the API content is generated from the datasets.rst file.

Member

But why do that? It adds complexity to the API page.

I would rather we not change the API page, and instead add a very small page on the datasets in the narrative docs that points to the API page. The datasets are not a core user-facing aspect of skrub. They are there to enable us to write good examples. They don't need to be prominent in the docs.

Member Author

The idea was to list all functions in both the API page and the datasets file without duplicating them explicitly.

I understand the complexity it adds, and that duplicating is not the best option.
Here is a much simpler alternative in which I only provide a link to the API page.

@@ -24,6 +24,6 @@ typically with `scikit-learn <http://scikit-learn.org>`_ with its
assembling
encoding
cleaning
datasets
Member

I don't really like that we are dedicating a top-level header to something which is actually a fairly minor feature of skrub.

I think that it is fine for now, but in the long run, as we add other somewhat "misc" sections, we should fold this section into a subsection.

doc/datasets.rst Outdated (resolved thread)
Co-authored-by: Vincent M <[email protected]>
@jovan-stojanovic
Member Author

This was resolved in the meantime by another PR, so I am closing this one.

Development

Successfully merging this pull request may close these issues.

Create "datasets" section in user guide
4 participants