I would appreciate clarification on a few points in Chapter 2.
Why do we need to set a random seed? If the goal is to get consistent train/test sets across multiple runs, why do we need multiple runs in the first place?
If using the hash function keeps the test set consistent, can new instances still be added to the test set whenever the hash of their id satisfies the condition crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32?
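To make the condition concrete, here is a minimal sketch of a hash-based membership check. The function name `is_in_test_set`, the integer ids, and the 20% ratio are assumptions chosen for illustration; only the comparison itself comes from the condition quoted above.

```python
from zlib import crc32

import numpy as np


def is_in_test_set(identifier, test_ratio):
    """Return True if this id falls in the test set (hypothetical helper)."""
    # crc32 hashes the 8 bytes of the int64 id; the mask keeps the result
    # an unsigned 32-bit value, which is then compared against the
    # fraction test_ratio of the 32-bit range.
    return crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32


# Assumed example data: 1000 existing ids, later extended with 500 new ones.
old_ids = np.arange(1000)
all_ids = np.arange(1500)

old_assignment = np.array([is_in_test_set(i, 0.2) for i in old_ids])
new_assignment = np.array([is_in_test_set(i, 0.2) for i in all_ids])

# The assignment of the existing ids does not change when new ids arrive.
unchanged = bool((new_assignment[:1000] == old_assignment).all())
print("existing assignments unchanged:", unchanged)
print("new ids placed in the test set:", int(new_assignment[1000:].sum()))
```

Because the decision depends only on the id and the ratio, each instance is assigned independently of the others, so a new instance lands in the test set exactly when its own hash passes the check.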
What is the point of using stratified sampling in the first place?
Why can't we just use the regular train_test_split function instead of StratifiedShuffleSplit?
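As a point of comparison for the last two questions, the sketch below runs both splitters on the same data and reports how often a rare category appears in each test set. The synthetic data (1000 rows, 5% in the rare category) and all variable names are assumptions made up for illustration, not the dataset from the chapter.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

# Assumed synthetic data: 50 rows in a rare category (1), 950 in a common one (0).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = np.array([1] * 50 + [0] * 950)

# Purely random split: the rare category's share in the test set can drift
# away from its 5% share in the full data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
random_share = y_test.mean()

# Stratified split: each category keeps roughly its overall proportion.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y))
stratified_share = y[test_idx].mean()

print("rare share overall:        ", y.mean())
print("rare share, random split:  ", random_share)
print("rare share, stratified:    ", stratified_share)
```

The fixed `random_state` only makes both splits repeatable between runs; it does not by itself make the random split representative of the category proportions.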
Thank you for your kindness and your time.