Notebook 01: pd.get_dummies() resulting in True/False values instead of 1/0 - Causing issues with creating model #559
What error do you get when creating the model? I believe Python implements bool as a subclass of int, so if you, for example, pass your insurance_one_hot through a Normalization layer, the output will be in [0, 1]. This example shows the integer subclass. Applying normalization then just takes the bool values and gives you float32 values in [0, 1] back. |
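The subclass relationship mentioned above can be checked in plain Python. This is a minimal sketch with made-up values (`flags` is an illustrative name, not something from the notebook):

```python
# bool is a subclass of int, so True/False behave like 1/0 in arithmetic.
print(issubclass(bool, int))  # True
print(True + True)            # 2

# Casting booleans to float gives 1.0 / 0.0, which is what a numeric
# pipeline ends up working with.
flags = [True, False, True]
as_floats = [float(flag) for flag in flags]
print(as_floats)              # [1.0, 0.0, 1.0]
```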
Facing same issue |
Hi @ralversity, @cwestergren and @uKnowKlaus, There has been an update to pd.get_dummies(). You can get the behaviour of the first screenshot by setting the dtype parameter. For example:
import pandas as pd
df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})
df_one_hot = pd.get_dummies(df, dtype=bool) # bool is default
df_one_hot
Output:
Change to dtype=int:
import pandas as pd
df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})
df_one_hot = pd.get_dummies(df, dtype=int)
df_one_hot
Output:
See the docs here: https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html |
Hey @mrdbourke, |
Do you get an error when applying normalisation though? bool is still a subclass of int, as seen at https://docs.python.org/3/c-api/bool.html. See my previous reply. |
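To see what normalisation does with boolean dummy columns without needing TensorFlow, here is a rough sketch that standardizes the columns by hand with NumPy. The hand-written standardization is a stand-in chosen for this example, not the Normalization layer itself, and the small DataFrame is made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "C": [1, 2, 3]})
one_hot = pd.get_dummies(df, dtype=bool)

# Casting the whole frame to float32 turns True/False into 1.0/0.0.
features = one_hot.astype("float32").to_numpy()

# Stand-in for a normalization layer: subtract the column mean and
# divide by the column standard deviation (guarding against zero).
mean = features.mean(axis=0)
std = features.std(axis=0)
normalized = (features - mean) / np.where(std == 0, 1.0, std)

print(features.dtype)
print(normalized)
```

If this runs cleanly on your data, the bool dtype on its own is unlikely to be what breaks the numeric step.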
@cwestergren I did use normalization as well, but it didn't work. IDK what the issue with get_dummies is. |
Understood. If you want to share your code here please do, but label encoding would work too. |
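For reference, one way to label-encode a column using only pandas is `pd.factorize`. This is a sketch under assumptions: the thread does not say which encoder was actually used, and the column name `A_encoded` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "C": [1, 2, 3]})

# factorize assigns an integer code to each distinct value,
# in order of first appearance.
codes, uniques = pd.factorize(df["A"])
df["A_encoded"] = codes

print(df)
print(list(uniques))
```

Unlike one-hot encoding, the integer codes imply an ordering between categories, which may or may not suit the model.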
Thanks. I'm after the point of error. It will still be a bool type, but internally it's Can you share the error you get? |
Sorry, I didn't save the errors. I moved on with LabelEncoding so.. |
All good, happy coding :) |
Hey @uKnowKlaus I had the same issue but then I tried with 'int64' instead of 'int' and it worked! |
Thx everyone, I had this issue too |
@samuelperezh Hi, would you mind sharing the code you used with 'int64' ? |
I used np.int64 and then 'Run all cells', and it worked for me.
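A minimal sketch of what the int64 variant might look like, reusing the small example DataFrame from earlier in the thread (the exact code the commenters ran is not shown, so this is an assumption):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"],
                   "B": ["b", "a", "c"],
                   "C": [1, 2, 3]})

# Both spellings request 64-bit integer dummy columns.
one_hot_np = pd.get_dummies(df, dtype=np.int64)
one_hot_str = pd.get_dummies(df, dtype="int64")

print(one_hot_np.dtypes)
```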
I had the same issue even after adding dtype=int. However, after adding df = df.astype(int), it worked perfectly well.
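A variation on the astype approach is to convert only the boolean columns, so that float features are not truncated by a blanket `astype(int)`. This is a sketch with a made-up DataFrame; `bool_cols` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "C": [1.5, 2.5, 3.5]})
one_hot = pd.get_dummies(df)  # dummy columns are bool on recent pandas

# Find the boolean columns and cast just those to int.
# A blanket one_hot.astype(int) would also truncate C (1.5 -> 1).
bool_cols = one_hot.select_dtypes(include="bool").columns
one_hot = one_hot.astype({col: int for col in bool_cols})

print(one_hot.dtypes)
```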
Just use the built-in dtype parameter of pd.get_dummies(), like: It works perfectly fine.
Not sure if I may have just done something wrong here, or if something has changed. But I noticed that when going through this I was having trouble creating the model. I discovered that the reason is that when I did this part:
It resulted in this:
I wound up changing the function to this and it fixed it for me, although not sure if this was the right thing to do or not: