Suggestion: Skip the Data Cleaning by using Artifacts #271

Open
ParadaCarleton opened this issue Aug 20, 2021 · 4 comments

@ParadaCarleton
Member

The big block of data-cleaning operations at the start of the tutorials kind of breaks up the flow, and IMO makes the tutorials seem more confusing. Maybe we should pre-clean these datasets, then include these cleaned datasets as (lazily installed, since most users won't want them) artifacts in Turing, which would let us skip the cleaning steps?

@cpfiffer
Member

I kind of like including them just because they show the whole workflow -- and in some cases the cleaning matters a great deal. Plus, adding artifacts complicates an already complex workflow.

@ParadaCarleton
Member Author

ParadaCarleton commented Aug 21, 2021

> I kind of like including them just because they show the whole workflow -- and in some cases the cleaning matters a great deal. Plus, adding artifacts complicates an already complex workflow.

I don't disagree that it's an important part of the workflow; I just think it's probably best to keep tutorials on cleaning data separate from tutorials on things like, say, Gaussian processes. We can include links at the top of each introduction to tutorials on things like MLDataUtils and DrWatson. Ideally, every tutorial should focus on one topic and do it well, so that users can find tutorials that quickly cover what they don't know, without mixing in subjects they've already learned. For instance, the Stan manual rarely includes data cleaning; its examples are usually narrowly focused on a single topic. As for artifacts, I don't believe loading them should be especially difficult. From the user's end, the code should just look something like:

using Pkg.Artifacts
dataset_path = artifact"dataset"
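
To make the maintainer side of that concrete, here is a minimal sketch of how a pre-cleaned dataset could be bound as an artifact and then resolved by name. It is only an illustration under stated assumptions: the temporary Artifacts.toml, the file name cleaned.csv, and the toy CSV contents are all hypothetical, and nothing here is Turing's actual setup. A real package would also add download information and mark the entry lazy; on Julia 1.6 and later, a package that uses lazy artifacts additionally needs `using LazyArtifacts` for the `artifact"..."` macro to fetch them on first use.

```julia
using Pkg.Artifacts

# Hypothetical setup: a throwaway Artifacts.toml in a temporary directory,
# standing in for the one that would live in the package repository.
artifacts_toml = joinpath(mktempdir(), "Artifacts.toml")

# Stand-in for the one-time cleaning step: write a tiny "cleaned" CSV
# into a new content-addressed artifact directory.
tree_hash = create_artifact() do dir
    write(joinpath(dir, "cleaned.csv"), "x,y\n1.0,2.0\n3.0,4.0\n")
end

# Record the artifact under the (illustrative) name "dataset".
bind_artifact!(artifacts_toml, "dataset", tree_hash)

# What a tutorial would then do: resolve the name to a directory and
# read the already-cleaned file from it.
dataset_path = artifact_path(artifact_hash("dataset", artifacts_toml))
csv_file = joinpath(dataset_path, "cleaned.csv")
println(read(csv_file, String))
```

Inside a package, the last three lines collapse to the `artifact"dataset"` string macro shown above, since the macro looks up the package's own Artifacts.toml.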

@JasonPekos
Member

A good compromise could be putting that setup code in collapsible code chunks (maybe collapsed by default)? Same for, e.g., the full manifest at the bottom of the tutorial pages.

@yebai
Member

yebai commented May 25, 2024

> A good compromise could be putting that setup code in collapsible code chunks (maybe collapsed by default)? Same for, e.g., the full manifest at the bottom of the tutorial pages.

@shravanngoswamii, can you give this suggestion a try, too?
