A quick (maybe naive) question about feature importance. #783
-
Hello, thanks for sharing this awesome work! I have a question about how to "read" the results of permutation feature importance for a classification task. According to the documentation: "The permutation feature or step importance is defined as the decrease in a model score when a single feature or step value is randomly shuffled. So if you are using accuracy (higher is better), the most important features or steps will be those with a lower value on the chart (as randomly shuffling them reduces performance)." I'm using accuracy as the metric. Therefore, if a feature's value on the chart is smaller, that feature is more important to the model. For instance, var_4, var_6, var_7, var_23, var_28, and var_35 might be very relevant to the model because of their small values. Is that right? However, what happens when we have negative/red values? I have attached a figure showing where this happens. Thank you so much for your help and awesome work!
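For reference, the mechanics behind the chart can be reproduced in a few lines. This is a minimal, framework-agnostic sketch, not the library's actual implementation; it assumes a 2-D feature matrix, and `model`, `X_valid`, and `y_valid` are placeholder names. scikit-learn also ships a ready-made version as `sklearn.inspection.permutation_importance`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def perm_importance(model, X_valid, y_valid, n_repeats=5, seed=0):
    """Drop in accuracy after shuffling each feature column, averaged over repeats."""
    rng = np.random.default_rng(seed)
    baseline = accuracy_score(y_valid, model.predict(X_valid))
    importances = []
    for col in range(X_valid.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X_valid.copy()
            rng.shuffle(X_perm[:, col])  # break the link between this feature and y
            scores.append(accuracy_score(y_valid, model.predict(X_perm)))
        # positive = accuracy fell when the feature was shuffled (important feature);
        # negative = accuracy rose after shuffling (a "red bar" in the chart)
        importances.append(baseline - np.mean(scores))
    return np.array(importances)
```

A negative value simply means the model scored slightly *better* once that feature's values were scrambled, which is why such bars show up below zero.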
-
I might be wrong, but the way I see it, large green bars represent important features (because shuffling them caused a large accuracy decrease), while red bars represent misleading features (shuffling them actually made the accuracy higher).
-
Hi @jorgpg5, Victor's answer is correct. Features in red or close to 0 are good candidates to be removed. You can drop them and retrain the model.
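As a hypothetical continuation of the sketch above, dropping the red/near-zero features and retraining could look like this; `importances`, `X_train`, and `X_valid` are the assumed names from that sketch, and the threshold is something you would tune yourself.

```python
import numpy as np

threshold = 0.0                       # a small positive value is stricter
keep = np.where(importances > threshold)[0]
X_train_red = X_train[:, keep]        # reduced training features
X_valid_red = X_valid[:, keep]        # reduced validation features
# refit the model on X_train_red, then check that validation accuracy
# holds or improves compared to the full feature set
```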