
Maximum features used for Random forest #11

Open

dhwani2410 opened this issue Sep 15, 2020 · 3 comments
@dhwani2410

I have a 4000×1400 matrix. Can I use it for a classification problem with a random forest?

@xuyxu

xuyxu commented Sep 16, 2020

Given the shape of the matrix, I suppose the number of samples is 4000 and the number of features is 1400. A dataset of that size is still small and can easily be handled by existing RF implementations, such as sklearn.ensemble.RandomForestClassifier.
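A minimal sketch of that suggestion, assuming the 4000×1400 interpretation above (the synthetic data and all parameter values here are illustrative, not from the thread):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 4000 x 1400 matrix described in the question.
X, y = make_classification(n_samples=4000, n_features=1400,
                           n_informative=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A plain random forest handles this size without any special tricks.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```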

@dhwani2410
Author

@AaronX121 Thanks a lot for your reply. Also, can you suggest how to overcome class imbalance in such cases?

@xuyxu

xuyxu commented Sep 16, 2020

sklearn.ensemble.RandomForestClassifier can naturally handle class imbalance via the class_weight argument (e.g., putting large weights on classes with very few samples). If this approach does not meet your requirements, I suggest addressing the problem from the outside through over-sampling / under-sampling of the original dataset. This paper may be helpful: "Exploratory Undersampling for Class-Imbalance Learning".
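A sketch of both options mentioned above on a small imbalanced toy dataset (the class ratio and parameters are illustrative assumptions): option 1 uses the built-in class_weight re-weighting, option 2 randomly under-samples the majority class with plain NumPy before fitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Imbalanced toy data: roughly a 9:1 class ratio.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

# Option 1: re-weight classes inversely to their frequency.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X, y)

# Option 2: under-sample the majority class to match the minority size.
rng = np.random.default_rng(0)
idx_minor = np.flatnonzero(y == 1)
idx_major = rng.choice(np.flatnonzero(y == 0),
                       size=idx_minor.size, replace=False)
idx = np.concatenate([idx_major, idx_minor])
clf_us = RandomForestClassifier(random_state=0).fit(X[idx], y[idx])
```

For more elaborate resampling schemes (e.g., the EasyEnsemble method from the paper cited above), the imbalanced-learn package offers ready-made implementations.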
