Chapter 2: Creating a test set, Stratify #689

Open
hady42 opened this issue Jun 6, 2024 · 0 comments

hady42 commented Jun 6, 2024

I would like some clarification on a few points in Chapter 2.

  1. Why do we need to set a random seed? If the goal is to get the same train/test split across multiple runs, why would we run the split more than once in the first place?

  2. If the hash-based split keeps the test set consistent, does a new instance go into the test set whenever the hash of its id satisfies the condition crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32?

  3. What is the point of using stratified sampling in the first place?

  4. Why can't we just use the regular train_test_split function instead of StratifiedShuffleSplit?
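
For context, the check in question 2 can be sketched roughly as follows. This is only a minimal sketch built around the expression quoted above; the function name `is_id_in_test_set`, the 20% test ratio, and the integer ids are assumptions for illustration.

```python
from zlib import crc32

import numpy as np


def is_id_in_test_set(identifier, test_ratio):
    # Hash the id to an unsigned 32-bit value and compare it with the cutoff.
    # The result depends only on the id itself, not on the other instances.
    return crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32


test_ratio = 0.2  # assumed ratio, for illustration only

# Assignment for an initial batch of (hypothetical) integer ids.
old_ids = np.arange(10_000)
old_in_test = np.array([is_id_in_test_set(i, test_ratio) for i in old_ids])

# Later, new ids arrive; recompute the assignment for all ids.
all_ids = np.arange(12_000)
all_in_test = np.array([is_id_in_test_set(i, test_ratio) for i in all_ids])

# The old instances keep their assignment; only the new ids are decided now.
old_assignment_unchanged = bool(np.array_equal(old_in_test, all_in_test[:10_000]))
test_fraction = float(all_in_test.mean())
```

In this sketch each id is assigned independently of the rest, so adding new ids leaves the earlier assignments untouched, and each new id is tested against the same condition.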

Thank you for your kindness and your time.
