Model ensembling with Keras Functional API #75
Hi. By default, a classification model outputs the probability of the positive class in the case of binary classification, and the probabilities of the individual classes in the case of multi-class classification. Instead, if the model is created with the advanced argument `predict_single_probability_for_binary_classification=False`, a binary classification model also outputs one probability per class. The model's output is (or at least should be) the same with `predict()` and with a direct call. Here is an illustration of the possible configurations:

```python
import pandas as pd
import tensorflow_decision_forests as tfdf

# A toy binary classification dataset.
binary_classification_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(
    pd.DataFrame(
        {"feature": [0, 1, 2, 3] * 5,
         "label": [0, 1, 0, 1] * 5}),
    label="label")

# A toy multi-class classification dataset.
multi_class_classification_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(
    pd.DataFrame(
        {"feature": [0, 1, 2, 3] * 5,
         "label": [0, 1, 2, 3] * 5}),
    label="label")

def first_dataset_batch_to_tensor(dataset):
    return next(dataset.as_numpy_iterator())[0]

print("Output shapes:")

print("\tpredict_single_probability_for_binary_classification=True (default)")

model = tfdf.keras.GradientBoostedTreesModel(verbose=0)
model.fit(binary_classification_dataset)
print("\t\tPredict; Binary classification:", model.predict(binary_classification_dataset).shape)
print("\t\tCall; Binary classification:", model(first_dataset_batch_to_tensor(binary_classification_dataset)).shape)

model = tfdf.keras.GradientBoostedTreesModel(verbose=0)
model.fit(multi_class_classification_dataset)
print("\t\tPredict; Multi-class classification:", model.predict(multi_class_classification_dataset).shape)
print("\t\tCall; Multi-class classification:", model(first_dataset_batch_to_tensor(multi_class_classification_dataset)).shape)

print("\tpredict_single_probability_for_binary_classification=False")

adv_args = tfdf.keras.AdvancedArguments(
    predict_single_probability_for_binary_classification=False)

model = tfdf.keras.GradientBoostedTreesModel(verbose=0, advanced_arguments=adv_args)
model.fit(binary_classification_dataset)
print("\t\tPredict; Binary classification:", model.predict(binary_classification_dataset).shape)
print("\t\tCall; Binary classification; default:", model(first_dataset_batch_to_tensor(binary_classification_dataset)).shape)

model = tfdf.keras.GradientBoostedTreesModel(verbose=0, advanced_arguments=adv_args)
model.fit(multi_class_classification_dataset)
print("\t\tPredict; Multi-class classification:", model.predict(multi_class_classification_dataset).shape)
print("\t\tCall; Multi-class classification; default:", model(first_dataset_batch_to_tensor(multi_class_classification_dataset)).shape)
```

Running this prints the output shape of `predict()` and of a direct call for each configuration.
Thanks for the help. I see that it's possible to get multi-class probability outputs, but I'm not yet sure how to modify this example to make it work. I am going off of this (with modifications to make it a multi-class problem): I assumed I could follow the same general program flow as that example, but I may be wrong.
The above is enough to show the issue I was facing initially. I only changed the dataset creation to be multi-class and changed the number of output units of the two NN models to num_classes. You'll get the error:
The warning says you need to fit the DF model before trying to use its output, because the output size may change. So I moved it to after the fitting of the model(s), like below, because I'm not sure what else to do:
The above will fit the NNs and the RFs, but errors out when trying to stack the outputs.
Is there something wrong with how I'm trying to adapt this example to multi-class?
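The shape mismatch described above can be reproduced outside of TF-DF. Here is a minimal NumPy sketch (the shapes and random values are illustrative assumptions, not taken from the original code) of why stacking a `(batch, 1)` output with a `(batch, num_classes)` output fails, and how the averaged ensemble works once both branches emit per-class probabilities:

```python
import numpy as np

batch, num_classes = 8, 4

# Hypothetical outputs: the RF's call() returns (batch, 1) here, while the
# NN emits per-class probabilities of shape (batch, num_classes).
rf_call_out = np.random.rand(batch, 1)
nn_out = np.random.rand(batch, num_classes)

# Stacking mismatched shapes fails, just like tf.stack in the ensemble.
try:
    np.stack([rf_call_out, nn_out])
except ValueError as err:
    print("stack failed:", err)

# Once both models emit (batch, num_classes), stacking and averaging works.
rf_probs = np.random.rand(batch, num_classes)
rf_probs /= rf_probs.sum(axis=1, keepdims=True)  # normalize rows to distributions
ensemble = np.stack([rf_probs, nn_out]).mean(axis=0)
print(ensemble.shape)  # (8, 4)
```

The fix therefore has to happen at the model-output level: both branches must produce `(batch, num_classes)` before they reach `tf.stack`.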
I'm still interested in a solution. Does anyone have an idea what is wrong here?
It seems this is still an issue, and there is no clear way to combine a multi-class RF with a multi-class neural network. The suggested `predict_single_probability_for_binary_classification=False` doesn't work, since it only converts a binary output from one column to two columns and does nothing for more than two classes. There is the
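The binary-only nature of that conversion is easy to see by hand: with two classes, P(class 0) = 1 - P(class 1), so a single column can be expanded to two. A small NumPy sketch (the probability values are made up for illustration):

```python
import numpy as np

# Single-column binary output: P(positive class) for each example.
p = np.array([[0.2], [0.9], [0.5]])              # shape (batch, 1)

# Manual expansion to two columns: [P(class 0), P(class 1)].
two_col = np.concatenate([1.0 - p, p], axis=1)   # shape (batch, 2)
print(two_col)

# No analogous expansion exists for K > 2 classes: K per-class
# probabilities cannot be reconstructed from a single number.
```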
Hi, I have the same problem. Did you find a solution?
There isn't a solution yet? |
I'm having some issues trying to ensemble a neural network with a random forest.
The example I am following is very much like this one, but with only one NN and one RF:
https://www.tensorflow.org/decision_forests/tutorials/model_composition_colab
I have some common preprocessing layers for each model and am able to successfully train the NN component and the RF component.
My problem is a multiclass classification problem. When I call model.predict(X) on the RF model, I am returned for each example a distribution of num_classes values, the same as what my NN model returns.
The problem is that when I simply call the model via the functional API, e.g. model(input_tensors), I get an output of shape (batch, 1) instead of (batch, num_classes).
I want to piece the two models together like shown in the link above, but I cannot call tf.stack on tensors that are not the same size.
How do I get the call method of the RF model to return the class distribution instead of a single value?
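Following the composition pattern of the linked tutorial, once both branches emit `(batch, num_classes)` the ensemble head can be built with a Keras averaging layer, which is element-wise equivalent to stacking on a new axis and taking the mean. This sketch uses a softmax Dense layer as a stand-in for the fitted TF-DF model (all names and sizes here are illustrative assumptions, chosen so the sketch runs without TF-DF installed):

```python
import tensorflow as tf

num_features, num_classes = 8, 4

inputs = tf.keras.Input(shape=(num_features,))
# Hypothetical branches: nn_branch plays the neural network; df_branch
# stands in for the (already fitted) TF-DF model.
nn_branch = tf.keras.layers.Dense(num_classes, activation="softmax")(inputs)
df_branch = tf.keras.layers.Dense(num_classes, activation="softmax")(inputs)

# Both branches emit (batch, num_classes), so averaging them yields a
# single (batch, num_classes) ensemble distribution.
mean_out = tf.keras.layers.Average()([nn_branch, df_branch])
ensemble = tf.keras.Model(inputs=inputs, outputs=mean_out)

print(ensemble(tf.zeros([3, num_features])).shape)  # (3, 4)
```

The key point, as the warning in the thread suggests, is that the TF-DF branch's output width is only final after the DF model has been fitted, so the combined model should be assembled afterwards.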