How do I make an accurate prediction on new data when the target variable is missing? #18

scoroman · 2024-07-06T20:00:19Z

Hi, I get a shape error when I try to make predictions on new, unseen data, because that data does not contain the target variable that the model(s) were trained with as a feature. To work around this I fill the missing target column with placeholder values. However, with placeholders such as np.zeros, previous values, or averages, my prediction error goes from under 1% to over 8% :(

import numpy as np
from sklearn.model_selection import train_test_split

# Old data
X_train, X_test, y_train, y_test = train_test_split(X, y)

def tell_me_about_the_data(*args):
    ...

tell_me_about_the_data(X_train, X_test, y_train, y_test)
# the feature data has shape (1999, 42): the model(s) are trained on 42 features

# import, train, fine-tune and fit the model(s)
...

# New data
tell_me_about_the_data(new_data)
# new_data has shape (30, 41): only 41 features, while the best models were trained and fit on 42.
# Because the target variable is unknown for new data, a single forward pass fails with a shape error:

make_predictions = model.predict(new_data)
# ValueError (shape mismatch): input has shape (x, 41), but the model expects shape (x, 42)


# Using a placeholder for the target_variable feature fixes the shape error, but the predictions
# become much less accurate: the error rises from < 1% to 8% or more
new_data['target_variable'] = np.zeros(len(new_data))  # or the average from old_data, or some other filler
make_predictions = model.predict(new_data)
# MSE = 25%

# Using the last 30 known target values as the placeholder instead
last_known_target_features = old_data['target_variable'].tail(30).to_numpy()  # .to_numpy() avoids index misalignment on assignment
new_data['target_variable'] = last_known_target_features
make_predictions = model.predict(new_data)
# MSE = 8%

# The original model's MSE on the held-out y_test set is < 1%. I want to get close
# to that < 1% error on new data.
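A minimal sketch of the usual way to avoid this mismatch, assuming (the issue does not confirm it) that the 42nd training feature is the target column itself. If so, the model learned to read the answer from its own inputs, which would explain both the < 1% held-out error and the jump once that column is replaced by a filler. The data below is synthetic, and the column names (`f0`…`f40`, `target_variable`) and the choice of `LinearRegression` are illustrative assumptions; the point is only that the target is split off before training, so new rows need just the 41 real features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the issue's data: 41 real features plus the target column
rng = np.random.default_rng(0)
old_data = pd.DataFrame(rng.normal(size=(1999, 41)),
                        columns=[f"f{i}" for i in range(41)])
old_data["target_variable"] = old_data.sum(axis=1) + rng.normal(scale=0.1, size=1999)

# Separate the target BEFORE splitting so it never appears among the features
X = old_data.drop(columns="target_variable")
y = old_data["target_variable"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit on the 41 feature columns only
model = LinearRegression().fit(X_train, y_train)

# New, unseen rows only need the same 41 feature columns; no placeholder is required
new_data = pd.DataFrame(rng.normal(size=(30, 41)), columns=X.columns)
predictions = model.predict(new_data)
print(X_train.shape[1], predictions.shape)
```

If earlier values of the target really are informative (for example in a time series), a lagged column such as `old_data['target_variable'].shift(1)` can be kept as a feature, since it is known at prediction time; the same-row target value cannot be.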
