
implementation of cal_weighted_quantiles #62

Closed
felix7602 opened this issue Jul 2, 2024 · 4 comments

@felix7602

Hi @reidjohnson , thank you for patiently answering my questions multiple times.

For the model initialization, I set max_samples_leaf=None, and in the predict function I set weighted_quantile=True, weighted_leaves=True, and aggregate_leaves_first=True.

I looked up all the training samples in the corresponding leaf nodes (below):
[Screenshot: training samples in the corresponding leaf nodes]

and got the weight for each value (below):
[Screenshot: the weight computed for each value]

From the perspective of the model implementation, could you please tell me how to obtain the correct value, with the interpolation method set to 'linear', from the sorted data and weights above?

I have already reviewed your source code, but I did not fully understand it. I suspect the inputs and weights parameters received by the calc_weighted_quantile method in the source code differ from what I have in mind (as shown in the second screenshot).
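For context, here is a minimal sketch of how a weighted quantile with 'linear' interpolation is commonly computed from sorted values and per-value weights: each value is assigned a cumulative-weight position, and the quantile is linearly interpolated between the two bracketing positions. This is a generic illustration of the idea, not necessarily the exact variant used inside the package:

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """Weighted quantile with 'linear' interpolation (one common formulation).

    `values` must be sorted ascending; `weights` are per-value sample weights.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Cumulative weight at each value.
    cum_weights = np.cumsum(weights)
    total = cum_weights[-1]

    # Map each value to a quantile position in [0, 1]; this variant centers
    # each weight on its value (cum_weight minus half its own weight).
    positions = (cum_weights - 0.5 * weights) / total

    # Linearly interpolate between the bracketing positions.
    return np.interp(q, positions, values)
```

With unit weights this reduces to the usual unweighted 'linear' quantile, e.g. `weighted_quantile([1, 2, 3, 4], [1, 1, 1, 1], 0.5)` gives 2.5, matching `np.quantile`.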

@felix7602
Author

This is a paragraph from your introduction documentation; the process I described above is my attempt to replicate this procedure.

[Screenshot: paragraph from the introduction documentation]

@reidjohnson
Member

@felix7602 No problem, thanks for your continued interest in and feedback on the package!

Here's an example implementation of a custom predict function that I believe accomplishes what you want. It produces output identical to the model predict method with the parameters you specified and uses the calc_weighted_quantile function with the expected inputs and weights. Feel free to follow up with any further questions.

import numpy as np
from quantile_forest import RandomForestQuantileRegressor
from quantile_forest._quantile_forest_fast import calc_weighted_quantile
from sklearn import datasets

X, y = datasets.fetch_california_housing(return_X_y=True)
X = X[:100]
y = y[:100]

quantiles = [0.025, 0.5, 0.975]
interpolation = "linear"

model = RandomForestQuantileRegressor(max_samples_leaf=None, random_state=0)
model.fit(X, y)
y_pred = model.predict(
    X,
    quantiles=quantiles,
    interpolation=interpolation,
    weighted_quantile=True,
    weighted_leaves=True,
    aggregate_leaves_first=True,
)


def custom_predict(X, quantiles, model):
    y_train = np.asarray(model.forest_.y_train)
    y_train_leaves = np.asarray(model.forest_.y_train_leaves)

    X_leaves = model.apply(X)

    n_quantiles = len(quantiles)
    n_samples = X_leaves.shape[0]
    n_trees = X_leaves.shape[1]

    n_outputs = len(y_train)
    n_train = len(y_train[0])
    max_idx = y_train_leaves.shape[3]

    preds = np.full((n_samples, n_outputs, n_quantiles), np.nan, dtype=np.float64)

    for i in range(n_samples):
        n_leaf_samples = np.empty(n_trees)

        n_total_samples = 0
        n_total_trees = 0
        # Count the training samples in each tree's leaf for this test sample.
        for j in range(n_trees):
            n_leaf_samples[j] = 0
            for k in range(max_idx):
                if y_train_leaves[j, X_leaves[i, j], 0, k] != 0:
                    n_leaf_samples[j] += 1
            n_total_samples += n_leaf_samples[j]
            n_total_trees += 1

        for j in range(n_outputs):
            train_indices = []
            train_weights = []

            # Accumulate training indices across leaves for each tree.
            for k in range(n_trees):
                train_indices.extend(y_train_leaves[k, X_leaves[i, k], j, :])

            # Weight each tree's samples inversely to its leaf size, so every
            # tree contributes equally regardless of leaf population.
            for k in range(n_trees):
                train_weight = 0
                if n_leaf_samples[k] > 0:
                    train_weight = 1 / n_leaf_samples[k]
                    train_weight *= n_total_samples
                    train_weight /= n_total_trees
                train_weights.extend([train_weight] * max_idx)

            # Reset leaf weights for all training indices to 0.
            leaf_weights = np.zeros(n_train)

            # Sum the weights/counts for each training index.
            for l in range(len(train_indices)):
                train_idx = train_indices[l]
                train_wgt = train_weights[l]
                if train_idx != 0:
                    leaf_weights[train_idx - 1] += train_wgt

            # Calculate quantiles.
            pred = calc_weighted_quantile(
                y_train[j],
                leaf_weights,
                quantiles,
                interpolation.encode(),
                issorted=True,
            )

            preds[i, j, :] = pred

    if preds.shape[2] == 1:
        preds = np.squeeze(preds, axis=2)

    if preds.shape[1] == 1:
        preds = np.squeeze(preds, axis=1)

    return preds


print(np.all(y_pred == custom_predict(X, quantiles, model)))

@felix7602
Author

Dear @reidjohnson,

I am writing to express my heartfelt gratitude for your invaluable assistance in understanding the internal workings of the quantile regression forest model. As a student, your prompt and insightful responses have been instrumental in advancing my research.

Your willingness to share your expertise and the time you have dedicated to addressing my questions, often with remarkable promptness, have significantly contributed to my comprehension and progress. I am genuinely grateful for your generosity and support.

I want you to know that your help means a lot to me. It truly touches my heart. Thank you once again for your kindness and timely guidance.

Sincerely,
Felix

@reidjohnson
Member

Thank you for the note, very glad to be of help!
