Added Kaggle example - Beginner classification #128
@@ -0,0 +1,2105 @@ | |||
{ |
Word the title such that it explains what is in this doc, e.g. what task is being done (style guide).
The title will be used to list this doc in site navigation, so it needs to be descriptive and something a user wants to click on.
Remove this cell. You introduce the dataset further down, no need to do it twice.
"For a beginner's guide to TensorFlow Decision Forests, please refer to this tutorial."
Is this tutorial a beginner's guide? If so, there's no need to link somewhere else. If not, then we can remove a bunch of introductory concepts from this guide and focus it more specifically. (Alternatively, reword this link such that it explains the difference, e.g. "For an introduction to [TFDF] without Kaggle, please refer to this tutorial".)
"...before you begin experimenting with neural networks."
Try not to down-play DFs. They are a production-ready option that provides a number of benefits over DNN-style architectures.
Perhaps here just write "...will often outperform neural networks."
Also, I think the first paragraph under "random forest" repeats much of the introduction paragraph. Can you remove the "Random Forest" heading and flow this section on from the introduction? I think we can keep the intro short & clear.
Try not to recommend libraries that are "competition". If there's a good reason to recommend SKL or one of the others, do so with context (e.g. "SKL supports $x algo, go check it out if you are interested.")
I think the second para can go too. We don't talk about how old things are (documentation should be timeless), and we've already introduced TFDF.
"One of the nice features about this particular hyperparameter is that larger values are usually better, and come with little risk aside from slowing down training."
Can you provide a citation for this?
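This is not a citation, but a toy calculation can show the intuition behind the claim. The sketch below assumes each tree votes correctly and independently with a fixed probability (`tree_accuracy`, a hypothetical parameter). Trees in a real forest are correlated, so the numbers are optimistic and say nothing about TF-DF specifically.

```python
from math import comb


def majority_vote_accuracy(num_trees, tree_accuracy=0.6):
    """Exact accuracy of a majority vote over independent voters.

    Assumption (not from the notebook): every tree is right with the same
    probability and errors are independent. Ties (even num_trees) count as
    wrong, so odd values are used below.
    """
    p = tree_accuracy
    votes_needed = num_trees // 2 + 1
    return sum(
        comb(num_trees, k) * p**k * (1 - p) ** (num_trees - k)
        for k in range(votes_needed, num_trees + 1)
    )


# Accuracy of the vote for a growing number of trees.
for n in (1, 11, 101, 301):
    print(n, round(majority_vote_accuracy(n), 4))
```

Under these assumptions the accuracy rises quickly and then flattens, which matches the shape of the claim. The text would still benefit from a proper reference.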
Line #3. plt.plot([log.num_trees for log in logs], [log.evaluation.accuracy for log in logs])
You have stated that this is OOB / test data, but the only dataset I can see used so far is train_ds (from model.fit(x=train_ds)). Is this test data or training data?
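One way to frame the question: out-of-bag (OOB) evaluation is computed from the training data, but each example is scored only by trees whose bootstrap sample did not include it. The sketch below is plain Python, not the TF-DF implementation. It shows how a single bootstrap draw splits a dataset into in-bag and out-of-bag examples; `bootstrap_split` is a hypothetical helper.

```python
import random


def bootstrap_split(num_examples, rng):
    """Return (in_bag, out_of_bag) index lists for one tree.

    in_bag is sampled with replacement, so it contains duplicates;
    out_of_bag holds every index that was never drawn.
    """
    in_bag = [rng.randrange(num_examples) for _ in range(num_examples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(num_examples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
num_examples = 1000
in_bag, out_of_bag = bootstrap_split(num_examples, rng)

# Fraction of training examples this tree never saw.
print(len(out_of_bag) / num_examples)
```

Roughly a third of the examples end up out of bag for each tree, so a forest trained only on train_ds can still report an estimate that behaves like held-out accuracy. If that is what the plot shows, the text should probably say "out-of-bag" rather than "test".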
Similarly, is this evaluation data or training data?
If it's from training data, we need to communicate that. If it's from validation data, what's the difference between this and the next cell?
Our docs need to be structured with a single H1 at the top - the title. Other headings start from H2.
Also I think it'd be clearer to have this section be "Use your model to make predictions". Using the test set makes sense, but pedagogically the user really just needs to know: 1) load data, 2) train model, 3) understand model fitness and now 4) use the model.
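The single-H1 rule can be checked mechanically. This is a minimal sketch, assuming the notebook is standard .ipynb JSON with a `cells` list. `count_h1_headings` is a hypothetical helper, and it does not skip lines that sit inside fenced code within markdown cells.

```python
import json


def count_h1_headings(notebook_json):
    """Count lines starting with '# ' in the markdown cells of a notebook."""
    notebook = json.loads(notebook_json)
    count = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # '#' in code cells is a Python comment, not a heading
        source = cell.get("source", [])
        # 'source' may be a list of lines or a single string.
        text = "".join(source) if isinstance(source, list) else source
        count += sum(1 for line in text.splitlines() if line.startswith("# "))
    return count


# Made-up notebook for illustration only.
example = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Title\n", "Intro text"]},
    {"cell_type": "code", "source": ["# a Python comment, not a heading\n"]},
    {"cell_type": "markdown", "source": ["# Predictions on the test set"]},
]})
print(count_h1_headings(example))  # prints 2: the second H1 should become an H2
```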
Notebooks need to run start-to-finish. If you want a user to do some work with incomplete or un-runnable code, consider using a code block in a markdown cell.
Some other options:
- Rewrite this to show the user each step
- Provide a short list of ideas (e.g. "For further exercise, try creating a model to do ...")
Keep in mind that these notebooks are published to tensorflow.org in a read-only format, so it probably doesn't make sense to say "Try it yourself, ... your code here ... etc" when there's no way a user can interact.
Apologies, this one came from my own suggestion, which I based on some Colabs I had seen previously!
When giving feedback to Vansh, I didn't consider that the notebooks would be published in a read-only format, unlike what is viewable on Colab. Apologies for the confusion I caused!
No problem at all - if you want to make an interactive notebook instead, that's totally fine. We'll just need to exclude it from publication on tensorflow.org.
I do recommend getting it into a format we can use on the site though, you'll get a lot more visibility there.
Thanks for the guide - looks like a good start. I've made some comments that apply in several places, please be sure to fix them everywhere (e.g. style related comments).
Can you also ensure that you run |
Kaggle dataset used - Titanic - Machine Learning from Disaster
Category - Beginner
Type - Classification
Note -
Mentor - @mirhyman
Reviewers - @random-forests, @markmcd