Skip to content

yogacha/Non-parametric-Decision-Tree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Non-parametric Regression Tree

All objective functions based on absolute value (square, absolute, Huber... ) have implicit parametric distribution assumption in disguise (Gaussian, Laplace...), thus fail badly when assumption not satisfied.

To have a robust (outliers-free) regression tree, we should use objective function only depends on relative value.

Background

Let's say we have a decision stump on the dth dimension that splits data points into reds(xi=1) and blues(xi=0).

To find the best split, first define a objective function depends only on relative values of data points.

Non-parametric approach

Kendall's Tau, also known as bubble sort distance. Defined as follow:

When i, j have same color, goes to zero, thus it's equivalent to:

There is a trade-off between balance & difference.

In practice, we can use other impurities (entropy...) & differences (log odds...) as long as .

Note that:

(Gini impurity)

(conditional probability, draw one blue & one red without replacement)

Problem Statement

where is kendall rank correlation coefficient shown before.

and

Compare with SSE

where r is pearson correlation coefficient.

Algorithm

Calculate following for every possible stump,

  1. Initialize:

    Set all points to blue , tau = 0, maximum = 0
    Sort samples by Xd as a queue

  2. get next item i from queue, turn it red.

  3. Calculate tau, record stump if |tau| > maximum

  4. repeat 2. until all points are red

In short:

Dynamic Programing

With precompute:

  • sort every dimension

Time complexity

Then for every depth:

  • calculate cummulate sum for each leaf

Time complexity

Further

Just like what elastic net does, we can consider both abosulte & relative value by weighted sum.

Ex: Objective function =

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages