implementation of cal_weighted_quantiles #62
@felix7602 No problem, thanks for your continued interest in and feedback on the package! Here's an example implementation of a custom predict function that I believe accomplishes what you want. It produces output identical to the model's `predict` output:

```python
import numpy as np
from quantile_forest import RandomForestQuantileRegressor
from quantile_forest._quantile_forest_fast import calc_weighted_quantile
from sklearn import datasets

X, y = datasets.fetch_california_housing(return_X_y=True)
X = X[:100]
y = y[:100]

quantiles = [0.025, 0.5, 0.975]
interpolation = "linear"

model = RandomForestQuantileRegressor(max_samples_leaf=None, random_state=0)
model.fit(X, y)

y_pred = model.predict(
    X,
    quantiles=quantiles,
    interpolation=interpolation,
    weighted_quantile=True,
    weighted_leaves=True,
    aggregate_leaves_first=True,
)

def custom_predict(X, quantiles, model):
    y_train = np.asarray(model.forest_.y_train)
    y_train_leaves = np.asarray(model.forest_.y_train_leaves)

    X_leaves = model.apply(X)

    n_quantiles = len(quantiles)
    n_samples = X_leaves.shape[0]
    n_trees = X_leaves.shape[1]
    n_outputs = len(y_train)
    n_train = len(y_train[0])
    max_idx = y_train_leaves.shape[3]

    preds = np.full((n_samples, n_outputs, n_quantiles), np.nan, dtype=np.float64)

    for i in range(n_samples):
        # Count the training samples in each tree's leaf for this row.
        n_leaf_samples = np.empty(n_trees)
        n_total_samples = 0
        n_total_trees = 0
        for j in range(n_trees):
            n_leaf_samples[j] = 0
            for k in range(max_idx):
                if y_train_leaves[j, X_leaves[i, j], 0, k] != 0:
                    n_leaf_samples[j] += 1
            n_total_samples += n_leaf_samples[j]
            n_total_trees += 1

        for j in range(n_outputs):
            train_indices = []
            train_weights = []

            # Accumulate training indices across leaves for each tree.
            for k in range(n_trees):
                train_indices.extend(y_train_leaves[k, X_leaves[i, k], j, :])

            # Weight each leaf sample by 1 / (leaf size), rescaled.
            for k in range(n_trees):
                train_weight = 0
                if n_leaf_samples[k] > 0:
                    train_weight = 1 / n_leaf_samples[k]
                train_weight *= n_total_samples
                train_weight /= n_total_trees
                train_weights.extend([train_weight] * max_idx)

            # Reset leaf weights for all training indices to 0.
            leaf_weights = np.zeros(n_train)

            # Sum the weights/counts for each training index.
            for l in range(len(train_indices)):
                train_idx = train_indices[l]
                train_wgt = train_weights[l]
                if train_idx != 0:
                    leaf_weights[train_idx - 1] += train_wgt

            # Calculate quantiles.
            pred = calc_weighted_quantile(
                y_train[j],
                leaf_weights,
                quantiles,
                interpolation.encode(),
                issorted=True,
            )
            preds[i, j, :] = pred

    if preds.shape[2] == 1:
        preds = np.squeeze(preds, axis=2)
    if preds.shape[1] == 1:
        preds = np.squeeze(preds, axis=1)

    return preds

print(np.all(y_pred == custom_predict(X, quantiles, model)))
```
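For intuition, the weighting scheme in the loop above gives each training sample in a visited leaf a weight of 1 / (leaf size), scaled by `n_total_samples / n_total_trees`, and sums these contributions across trees. Here is a minimal sketch of that accumulation on toy data; the leaf memberships below are made up purely for illustration, not taken from a fitted model:

```python
import numpy as np

# Hypothetical example: 2 trees, 5 training samples. For one test row,
# tree 0's leaf contains training samples {0, 1} and tree 1's leaf
# contains {1, 2, 3}.
leaf_members = [[0, 1], [1, 2, 3]]
n_train = 5
n_trees = len(leaf_members)
n_total_samples = sum(len(m) for m in leaf_members)  # 5

leaf_weights = np.zeros(n_train)
for members in leaf_members:
    # Each sample in a leaf gets 1 / (leaf size), rescaled as in custom_predict.
    w = (1 / len(members)) * n_total_samples / n_trees
    for idx in members:
        leaf_weights[idx] += w

# The weights sum to n_total_samples; sample 1 appears in both leaves,
# so it accumulates weight from each tree.
print(leaf_weights)
```

Note that a sample appearing in multiple trees' leaves (sample 1 here) accumulates weight from each, which is what makes frequently co-located training samples dominate the weighted quantile.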
Dear @reidjohnson, I am writing to express my heartfelt gratitude for your invaluable assistance in understanding the internal workings of the quantile regression forest model. As a student, your prompt and insightful responses have been instrumental in advancing my research. Your willingness to share your expertise and the time you have dedicated to addressing my questions, often with remarkable promptness, have significantly contributed to my comprehension and progress. I am genuinely grateful for your generosity and support. I want you to know that your help means a lot to me. It truly touches my heart. Thank you once again for your kindness and timely guidance. Sincerely,
Thank you for the note, very glad to be of help!
Hi @reidjohnson, thank you for patiently answering my questions multiple times.

For model initialization, I set `max_samples_leaf=None`, and in the `predict` call I set `weighted_quantile=True`, `weighted_leaves=True`, and `aggregate_leaves_first=True`.

I looked up all the training samples in the corresponding leaf nodes (screenshot not shown):

and got the weight for each value (screenshot not shown):

From the perspective of the model implementation, could you please tell me how to obtain the correct value with the interpolation method set to `'linear'`, based on the sorted data and weights above?

I have already reviewed your source code, but I did not understand it. I think the `inputs` and `weights` parameters received by the `calc_weighted_quantile` method in the source code might be different from what I have in mind (as shown in the second screenshot).
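For reference, a weighted quantile with linear interpolation can be computed by treating the cumulative weights as an empirical CDF over the sorted values and interpolating between adjacent values. The sketch below uses one common convention (centering each value within its weight mass); it is illustrative only, and the exact positioning and edge-case handling inside `calc_weighted_quantile` may differ:

```python
import numpy as np

def weighted_quantile_linear(values, weights, q):
    """Weighted quantile with linear interpolation (illustrative sketch).

    Sorts the values, builds cumulative weights, maps each value to a
    CDF position in [0, 1], and linearly interpolates at quantile q.
    This uses the (C_k - w_k / 2) / W convention; other conventions
    exist and may match the package's internals more closely.
    """
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cum = np.cumsum(w)
    total = cum[-1]
    # Center each value within its own weight mass on the CDF axis.
    positions = (cum - 0.5 * w) / total
    return float(np.interp(q, positions, v))

# With equal weights this reduces to a familiar median:
print(weighted_quantile_linear([1, 2, 3, 4], [1, 1, 1, 1], 0.5))  # 2.5
```

For values `[1, 2, 3]` with weights `[1, 1, 2]`, the CDF positions are `[0.125, 0.375, 0.75]`, so a query at `q=0.375` lands exactly on the value 2, while intermediate quantiles are linearly interpolated between neighbors.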