Model Showing Extra Class Label #102

laneciar · 2022-04-23T02:14:03Z

Hello,

When training my model, for some reason it comes up with an additional class. For example, I currently have the classes [1, 2, 3, 4, 5], but when analyzing the tree using plot_model it shows the following:

What is class 0 and why does it show up? Obviously 0% of the dataset has it, but why is it here in the first place? I also believe it shows up when outputting the summary of the model. Even when turning my y label array into a set, it gives the following:

Which comes from this snippet:

        temp = list(zip(self.x_train, self.y_train))
        random.shuffle(temp)
        x_train, y_train = zip(*temp)
        my_set = set(y_train)
        print(my_set)
        train_data = self.random_forest.make_tf_dataset(
            np.array(x_train), np.array(y_train)
        )

        # print(len(list(train_data.as_numpy_iterator())))
        self.model_6.fit(train_data, verbose=1)

Any idea what's going on?

Cheril311 · 2022-04-25T06:35:46Z

@laneciar can you please tell me what your classes are and which loss function and metric you are using?

laneciar · 2022-04-26T04:19:16Z

@Cheril311 My classes are what is in the picture above: {1, 2, 3, 4, 5}, each associated with an x row. As for the loss function and metric, it's just the default that the Random Forest model uses; I don't specify one.

laneciar · 2022-04-26T04:32:21Z

@Cheril311 Here is some source code:

Random Forest: I use rf_model_1 for training and the second returned model for evaluating and predicting.

    def create_single_model(self):
        input_features = tf.keras.Input(shape=(self.num_features,))

        # preprocessor = tf.keras.layers.Dense(self.num_features, activation=tf.nn.relu6)
        # preprocess_features = preprocessor(input_features)

        rf_model_1 = tfdf.keras.RandomForestModel(
            verbose=1,
            task=tfdf.keras.Task.CLASSIFICATION,
            num_trees=self.num_of_trees,
            max_depth=32,
            # hyperparameter_template="benchmark_rank1@v1",
            # bootstrap_size_ratio=1.0,  # Optimal at 1  0.6470000147819519
            categorical_algorithm="CART",  # CART and RANDOM provide same accuracy 0.6470000147819519
            growing_strategy="LOCAL",  # LOCAL significantly better 0.6470000147819519
            # honest=False,  # honest True is slightly better 0.6470000147819519
            # max_depth=5,  # Caps at 32 slightly better  0.6480000019073486
            # min_examples=5,  # Best at 5  0.6480000019073486
            # missing_value_policy="LOCAL_IMPUTATION",  # No change .6480000019073486
            sorting_strategy="PRESORT",  # No change .6480000019073486
            sparse_oblique_normalization="MIN_MAX",  # Significantly helps 0.6850000023841858
            # sparse_oblique_num_projections_exponent=2.0,  # Crashes when above 2
            # sparse_oblique_weights="BINARY",  # Slightly better
            split_axis="SPARSE_OBLIQUE",  # Slightly better
            # winner_take_all=True,  # Slightly better
        )
        out = rf_model_1(input_features)

        model = tf.keras.models.Model(input_features, out)

        return rf_model_1, model

Training: I shuffle the x and y data so the labels are mixed up and not in order

        tf.keras.utils.plot_model(
            self.single_model,
            to_file="./info/single_arch/model_test.png",
            show_shapes=True,
            show_layer_names=True,
        )

        temp = list(zip(self.x_train, self.y_train))
        random.shuffle(temp)
        x_train, y_train = zip(*temp)
        train_data = self.random_forest.make_tf_dataset(
            np.array(x_train), np.array(y_train)
        )

        # print(len(list(train_data.as_numpy_iterator())))
        self.model_6.fit(train_data, verbose=1)

        self.model_6.compile(metrics=["accuracy"])
        validation_data = self.random_forest.make_tf_dataset(self.x_test, self.y_test)
        evaluation_df6_only = self.model_6.evaluate(validation_data, return_dict=True)

        with open("./info/single_arch/model_6.html", "w") as f:
            f.write(
                tfdf.model_plotter.plot_model(self.model_6, tree_idx=0, max_depth=10)
            )
        print("Accuracy (D6 only): ", evaluation_df6_only["accuracy"])
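For reference, the shuffle step above can also be done with a single index permutation, which keeps each x row paired with its label without building a list of tuples. This is only a sketch: the helper name shuffle_in_unison and the toy arrays are made up for illustration and are not part of the code above.

```python
import numpy as np


def shuffle_in_unison(x, y, seed=None):
    """Shuffle x and y with the same permutation so the pairs stay aligned.

    Hypothetical helper, not part of the original code.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of examples")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))  # one random ordering of the row indices
    return x[order], y[order]


# Toy data: 5 examples with 2 features each, labels 1..5.
x = np.arange(10).reshape(5, 2)
y = np.array([1, 2, 3, 4, 5])

x_shuffled, y_shuffled = shuffle_in_unison(x, y, seed=0)
print(sorted(set(y_shuffled.tolist())))  # the label set is unchanged by shuffling
```

Shuffling only reorders the examples, so the set of labels passed to the model stays {1, 2, 3, 4, 5} either way.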

Hope this helps, let me know if you want anything else.

achoum · 2022-06-22T06:38:01Z

Hi laneciar,

This class 0 is an artifact of the way classes are handled internally: it represents the out-of-vocabulary values. However, since out-of-vocabulary values are not permitted for labels, its count is always 0.
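To make this concrete, here is a minimal pure-Python sketch of the idea. It is not the library's actual implementation, and the names OOV_INDEX, build_label_vocabulary and class_distribution are invented for illustration. It assumes index 0 is reserved for out-of-vocabulary values and that real labels are indexed from 1.

```python
from collections import Counter

OOV_INDEX = 0  # assumption: index 0 is reserved for out-of-vocabulary values


def build_label_vocabulary(labels):
    """Map each distinct label to an integer index, starting at 1."""
    vocabulary = {}
    for label in sorted(set(labels)):
        vocabulary[label] = len(vocabulary) + 1
    return vocabulary


def class_distribution(labels, vocabulary):
    """Count examples per index, including the reserved OOV slot."""
    counts = Counter(vocabulary.get(label, OOV_INDEX) for label in labels)
    num_classes = len(vocabulary) + 1  # +1 for the reserved OOV slot
    return [counts.get(index, 0) for index in range(num_classes)]


labels = [1, 2, 3, 4, 5, 5, 3, 1]
vocabulary = build_label_vocabulary(labels)
distribution = class_distribution(labels, vocabulary)
print(distribution)  # → [0, 2, 1, 2, 1, 2]
```

Five real classes yield six slots. Slot 0 is empty because every training label is, by construction, in the vocabulary.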

Thanks for the heads-up. We will resolve it :).

rstz added the bug label on Sep 5, 2022
