Suggestion: Skip the Data Cleaning by using Artifacts #271

ParadaCarleton · 2021-08-20T22:47:07Z

The big block of data-cleaning operations at the start of the tutorials kind of breaks up the flow, and IMO makes the tutorials seem more confusing. Maybe we should pre-clean these datasets, then include these cleaned datasets as (lazily installed, since most users won't want them) artifacts in Turing, which would let us skip the cleaning steps?

cpfiffer · 2021-08-20T23:09:58Z

I kind of like including them just because they show the whole workflow -- and in some cases the cleaning matters a great deal. Plus, adding artifacts complicates an already complex workflow.

ParadaCarleton · 2021-08-21T02:05:47Z

> I kind of like including them just because they show the whole workflow -- and in some cases the cleaning matters a great deal. Plus, adding artifacts complicates an already complex workflow.

I don't disagree that it's an important part of the workflow; I just think it's best to keep tutorials on cleaning data separate from tutorials on things like, say, Gaussian processes. Ideally, every tutorial should focus on one topic and do it well, so that users can find tutorials that quickly cover what they don't know, instead of mixing it with subjects they've already learned. For instance, the Stan manual rarely includes data cleaning; it usually stays narrowly focused on a single specific topic. We can link to tutorials on things like MLDataUtils and DrWatson at the top of each introduction.

As for artifacts, I don't believe loading them should be especially difficult. From the user's end, the code should just look something like:

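using Pkg.Artifacts  # on Julia 1.6+, a lazy artifact also needs `using LazyArtifacts`
dataset_path = artifact"dataset"  # "dataset" must match an entry name in Artifacts.toml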
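
For that `artifact"dataset"` lookup to resolve, the package would also need a matching entry in its `Artifacts.toml`. A rough sketch of what that entry might look like is below. Everything in angle brackets is a placeholder, not a real hash or URL, and would have to be filled in for the actual cleaned dataset. Setting `lazy = true` is what makes the download happen on first use rather than at install time.

```toml
# Artifacts.toml (sketch; angle-bracket values are placeholders, not real data)
[dataset]
git-tree-sha1 = "<tree hash of the unpacked dataset directory>"
lazy = true   # download on first use instead of at package install

    [[dataset.download]]
    url = "<URL of the tarball containing the cleaned dataset>"
    sha256 = "<SHA-256 checksum of that tarball>"
```

The entry name `dataset` is only illustrative; it has to match whatever name is passed to the `artifact"..."` string macro.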

JasonPekos · 2024-04-06T03:31:18Z

A good compromise could be to put that setup code in collapsible code chunks (maybe collapsed by default)? The same goes for, e.g., the full manifest at the bottom of the tutorial pages.

yebai · 2024-05-25T10:10:21Z

> A good compromise could be to put that setup code in collapsible code chunks (maybe collapsed by default)? The same goes for, e.g., the full manifest at the bottom of the tutorial pages.

@shravanngoswamii, can you give this suggestion a try, too?

yebai assigned shravanngoswamii May 25, 2024
