Added Kaggle example - Beginner classification #128

vanshhhhh · 2022-09-02T18:58:35Z

Kaggle dataset used - Titanic - Machine Learning from Disaster
Category - Beginner
Type - Classification

Note -

In the link section (image attached below), the links will work only once this PR is merged.
Alternatively, you can check the same example on my personal repository in which the "Run in Google Colab" link is working. I'd suggest viewing this example on Colab since some cell outputs are handled by limiting the cell size.

Mentor - @mirhyman
Reviewers - @random-forests, @markmcd

review-notebook-app · 2022-09-02T18:58:40Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

markmcd · 2022-09-06T05:46:43Z

documentation/tutorials/kaggle_beginner_example_classification.ipynb

@@ -0,0 +1,2105 @@
+{


Word the title such that it explains what is in this doc, e.g. what task is being done (style guide).

The title will be used to list this doc in site navigation, so it needs to be descriptive and something a user wants to click on.

markmcd · 2022-09-06T05:46:43Z

documentation/tutorials/kaggle_beginner_example_classification.ipynb

@@ -0,0 +1,2105 @@
+{


Remove this cell. You introduce the dataset further down, no need to do it twice.

markmcd · 2022-09-06T05:46:44Z

documentation/tutorials/kaggle_beginner_example_classification.ipynb

@@ -0,0 +1,2105 @@
+{


For a beginner's guide to TensorFlow Decision Forests, please refer to this tutorial.

Is this tutorial a beginner's guide? If so, there's no need to link somewhere else. If not, then we can remove a bunch of introductory concepts from this guide and focus it more specifically. (Alternatively, reword this link so that it explains the difference, e.g. "For an introduction to [TFDF] without Kaggle, please refer to this tutorial".)

markmcd · 2022-09-06T05:46:44Z

documentation/tutorials/kaggle_beginner_example_classification.ipynb

@@ -0,0 +1,2105 @@
+{


before you begin experimenting with neural networks.

Try not to down-play DFs. They are a production-ready option that provides a number of benefits over DNN-style architectures.

Perhaps here just write "...will often outperform neural networks."

Also, I think the first paragraph under "random forest" repeats much of the introduction paragraph. Can you remove the "Random Forest" heading and flow this section on from the introduction? I think we can keep the intro short & clear.

markmcd · 2022-09-06T05:46:44Z

documentation/tutorials/kaggle_beginner_example_classification.ipynb

@@ -0,0 +1,2105 @@
+{


Try not to recommend libraries that are "competition". If there's a good reason to recommend SKL or one of the others, do so with context (e.g. "SKL supports $x algo, go check it out if you are interested.")

I think the second para can go too. We don't talk about how old things are (documentation should be timeless), and we've already introduced TFDF.

markmcd · 2022-09-06T05:46:46Z

documentation/tutorials/kaggle_beginner_example_classification.ipynb

@@ -0,0 +1,2105 @@
+{


One of the nice features about this particular hyperparameter is that larger values are usually better, and come with little risk aside from slowing down training.

Can you provide a citation for this?
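
One way to see why the claim is plausible, offered as a rough illustration rather than the citation requested: if each tree were an independent voter that is right with probability p > 0.5, the accuracy of a majority vote would rise with the number of voters. Real trees in a forest are correlated, so the independence assumption and the 0.6 per-tree accuracy below are illustrative assumptions only, and `majority_vote_accuracy` is a hypothetical helper, not part of TFDF.

```python
from math import comb


def majority_vote_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes every voter is right with the same probability and that votes
    are independent, which real trees in a forest are not.
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


# Hypothetical per-voter accuracy of 0.6; the ensemble sizes are arbitrary.
for n in (1, 11, 101, 301):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under these assumptions, adding voters (keeping the count odd) raises accuracy and costs only extra compute, which is the behaviour the notebook text describes. A proper reference is still needed to support the statement for real forests.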

markmcd · 2022-09-06T05:46:46Z

documentation/tutorials/kaggle_beginner_example_classification.ipynb

@@ -0,0 +1,2105 @@
+{


Line #3. plt.plot([log.num_trees for log in logs], [log.evaluation.accuracy for log in logs])

You have stated that this is OOB / test data, but the only dataset I can see used so far is train_ds (from model.fit(x=train_ds)). Is this test data or training data?
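
For context on the distinction: an out-of-bag (OOB) estimate is computed from the training dataset itself. Each tree is fit on a bootstrap sample, and the examples that tree never saw act as a held-out set for it. The sketch below shows only that sampling step, using the standard library rather than TFDF; `bootstrap_split` and the 1,000-example size are hypothetical.

```python
import random


def bootstrap_split(n_examples: int, rng: random.Random):
    """Return (in_bag, out_of_bag) index lists for a single tree.

    in_bag is drawn with replacement, so it can contain duplicates;
    out_of_bag holds every index that was never drawn.
    """
    in_bag = [rng.randrange(n_examples) for _ in range(n_examples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_examples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 1000  # hypothetical training-set size
in_bag, oob = bootstrap_split(n, rng)
print(f"{len(oob) / n:.1%} of examples are out-of-bag for this tree")
```

Roughly a third of the examples end up out-of-bag for any one tree, so if the plotted accuracy is an OOB estimate, it can come from `train_ds` and still not be plain training accuracy. The notebook text should say which of the two it is.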

markmcd · 2022-09-06T05:46:46Z

documentation/tutorials/kaggle_beginner_example_classification.ipynb

@@ -0,0 +1,2105 @@
+{


Similarly, is this evaluation data or training data?

If it's from training data, we need to communicate that. If it's from validation data, what's the difference between this and the next cell?

markmcd · 2022-09-06T05:46:46Z

documentation/tutorials/kaggle_beginner_example_classification.ipynb

@@ -0,0 +1,2105 @@
+{


Our docs need to be structured with a single H1 at the top - the title. Other headings start from H2.

Also I think it'd be clearer to have this section be "Use your model to make predictions". Using the test set makes sense, but pedagogically the user really just needs to know: 1) load data, 2) train model, 3) understand model fitness and now 4) use the model.
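
A quick way to check the heading rule before pushing is to scan the notebook's markdown cells for H1 lines. This is a minimal sketch that assumes the standard `.ipynb` JSON layout (`cells`, `cell_type`, `source`); `h1_headings` and the demo notebook are hypothetical, and it is not a substitute for the project's own lint tooling.

```python
FENCE = "`" * 3  # markdown code-fence marker


def h1_headings(notebook: dict) -> list[str]:
    """Collect the text of every H1 ('# ...') line in markdown cells."""
    headings = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", [])
        # 'source' may be one string or a list of line strings.
        text = source if isinstance(source, str) else "".join(source)
        in_fence = False
        for line in text.splitlines():
            if line.lstrip().startswith(FENCE):
                in_fence = not in_fence  # skip fenced code inside markdown
            elif not in_fence and line.startswith("# "):
                headings.append(line[2:].strip())
    return headings


# Hypothetical notebook, shaped like the dict json.load() returns for .ipynb.
demo = {
    "cells": [
        {"cell_type": "markdown",
         "source": ["# Example title\n", "\n", "## Load data\n"]},
        {"cell_type": "code", "source": ["# a comment, not a heading\n"]},
        {"cell_type": "markdown",
         "source": FENCE + "\n# inside a fence\n" + FENCE + "\n# Second top-level heading\n"},
    ]
}

found = h1_headings(demo)
print(found)
if len(found) != 1:
    print("expected exactly one H1 (the title); demote the others to H2")
```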

markmcd · 2022-09-06T05:46:46Z

documentation/tutorials/kaggle_beginner_example_classification.ipynb

@@ -0,0 +1,2105 @@
+{


Notebooks need to run start-to-finish. If you want a user to do some work with incomplete or un-runnable code, consider using a code block in a markdown cell.

Some other options:
Rewrite this to show the user each step
Provide a short list of ideas (e.g. "For further exercise, try creating a model to do ...")

Keep in mind that these notebooks are published to tensorflow.org in a read-only format, so it probably doesn't make sense to say "Try it yourself, ... your code here ... etc" when there's no way a user can interact.
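
One cheap signal that a notebook was last run start-to-finish is that its saved code cells carry execution counts 1, 2, 3 and so on, in document order. The sketch below checks only that, assuming the standard `.ipynb` JSON fields `cell_type` and `execution_count`. It does not execute anything, `runs_top_to_bottom` is a hypothetical helper, and a notebook saved with cleared outputs will fail the check even if it runs fine.

```python
def runs_top_to_bottom(notebook: dict) -> bool:
    """True if the saved code cells are numbered 1..n in document order."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    return counts == list(range(1, len(counts) + 1))


# Hypothetical notebooks, shaped like the dict json.load() returns for .ipynb.
clean = {"cells": [
    {"cell_type": "markdown", "source": ["## Train the model\n"]},
    {"cell_type": "code", "execution_count": 1, "source": ["x = 1\n"]},
    {"cell_type": "code", "execution_count": 2, "source": ["print(x)\n"]},
]}
placeholder = {"cells": [
    {"cell_type": "code", "execution_count": 1, "source": ["x = 1\n"]},
    # A "your code here" cell that was never run has no execution count.
    {"cell_type": "code", "execution_count": None, "source": ["# your code here\n"]},
]}

print(runs_top_to_bottom(clean))        # True
print(runs_top_to_bottom(placeholder))  # False
```

A "try it yourself" cell that was left unrun shows up as a gap in the numbering, which is one reason to move such exercises into a markdown code block instead.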

mirhyman · 2022-09-09

Apologies for this; it came from my own suggestion, which was based on some Colabs I had seen previously.

When giving feedback to Vansh, I wasn't considering that the notebooks would be published in a read-only format, unlike what is viewable on Colab. Apologies for the confusion I caused!

markmcd · 2022-09-09

No problem at all - if you want to make an interactive notebook instead, that's totally fine. We'll just need to exclude it from publication on tensorflow.org.

I do recommend getting it into a format we can use on the site though; you'll get a lot more visibility there.

markmcd · 2022-09-06T05:47:11Z

Thanks for the guide - looks like a good start. I've made some comments that apply in several places, please be sure to fix them everywhere (e.g. style related comments).

Can you also run nbfmt and nblint on your notebook to ensure stable diffs and that the right links, licenses, etc. are present?

Added Kaggle example - Beginner classification

d9c35de

Minor change

b10b8f0

rstz requested a review from random-forests September 5, 2022 14:00

markmcd reviewed Sep 6, 2022

vanshhhhh added 2 commits September 16, 2022 01:19

Update kaggle_beginner_example_classification.ipynb

ec26f12

Minor Changes

d2b2433
