Either way, yes: using a subsample is probably the way to go.
I agree. I think one idea was to motivate why we sometimes need to opt for a hashing vectorizer and/or an out-of-core learning algorithm when the dataset doesn't fit into memory. However, using a smaller subsample (after shuffling) would be fine.
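To make the trade-off concrete, here is a minimal sketch of the two options mentioned above: taking a shuffled subsample, versus streaming the full data through a `HashingVectorizer` with an out-of-core learner (`SGDClassifier.partial_fit`). The toy corpus and all sizes here are made up for illustration; the tutorial's actual dataset would be swapped in.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Toy stand-in corpus (hypothetical; replace with the real dataset).
docs = ["good movie", "bad film", "great plot", "terrible acting"] * 250
labels = np.array([1, 0, 1, 0] * 250)

# Option 1: shuffle first, then take a smaller subsample to stay on schedule.
rng = np.random.RandomState(0)
idx = rng.permutation(len(docs))[:200]
sub_docs = [docs[i] for i in idx]
sub_labels = labels[idx]

# Option 2: HashingVectorizer is stateless and uses fixed memory, so the
# corpus can be streamed in mini-batches and never has to fit in RAM at once.
vec = HashingVectorizer(n_features=2**18, alternate_sign=False)
clf = SGDClassifier(random_state=0)
for start in range(0, len(docs), 100):
    batch = docs[start:start + 100]
    X = vec.transform(batch)  # no fitting step needed: hashing is stateless
    clf.partial_fit(X, labels[start:start + 100], classes=np.array([0, 1]))

print(clf.score(vec.transform(docs), labels))
```

Option 1 keeps the familiar in-memory workflow but sacrifices accuracy on a small slice; option 2 keeps all the data at the cost of introducing the hashing/streaming machinery, which is exactly the motivation being discussed.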
Coincidentally, I've used the dataset in my book as well :P And yeah, people were complaining that it takes too long (~5-10 minutes), and when they chose a subsample, the performance was really bad -- in other words, people sometimes want the best of both worlds ... However, for the tutorial, I agree that a subsample is really necessary to keep on schedule ;)
Per the TODO file. Maybe @amueller can elaborate on this issue.