Prepare release of TF-DF 1.11.0
PiperOrigin-RevId: 690589733
rstz authored and copybara-github committed Oct 28, 2024
1 parent c3f2df3 commit d1d2f4e
Showing 8 changed files with 67 additions and 37 deletions.
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -1,10 +1,18 @@
# Changelog

- ## HEAD
+ ## 1.11.0 - 2024-10-28

+ ### Feature

+ - Renamed LAMBDA_MART_NDCG5 loss to LAMBDA_MART_NDCG. The old loss is still
+   available. The ndcg truncation can now be modified via a hyperparameter.
+ - Notify users about ydf during startup. This message can be disabled by
+   setting Environment variable TFDF_DISABLE_WELCOME_MESSAGE.

+ ### Fix

+ - Some errors are now InvalidArgumentError instead of UnknownError.
+ - Fix compatibility with TF 2.18.0.
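The welcome-message switch above is used by setting the environment variable before TF-DF is imported; a minimal sketch (the variable name comes from the changelog, but treating `"1"` as a sufficient value is an assumption):

```python
import os

# Set before importing TF-DF so the startup notice about YDF is skipped.
# The changelog only says the variable must be set; the value "1" is an
# assumption, not documented behavior.
os.environ["TFDF_DISABLE_WELCOME_MESSAGE"] = "1"

# import tensorflow_decision_forests as tfdf  # would now start quietly
print("welcome message disabled:",
      "TFDF_DISABLE_WELCOME_MESSAGE" in os.environ)
```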

## 1.10.0 - 2024-08-21

33 changes: 20 additions & 13 deletions WORKSPACE
@@ -20,25 +20,28 @@ http_archive(
# absl used by tensorflow.
http_archive(
name = "org_tensorflow",
- strip_prefix = "tensorflow-2.17.0",
- sha256 = "9cc4d5773b8ee910079baaecb4086d0c28939f024dd74b33fc5e64779b6533dc",
- urls = ["https://github.com/tensorflow/tensorflow/archive/v2.17.0.tar.gz"],
+ sha256 = "d7876f4bb0235cac60eb6316392a7c48676729860da1ab659fb440379ad5186d",
+ strip_prefix = "tensorflow-2.18.0",
+ urls = ["https://github.com/tensorflow/tensorflow/archive/v2.18.0.tar.gz"],
)


load("//tensorflow_decision_forests:tensorflow_decision_forests.bzl", "py_deps_profile")

py_deps_profile(
name = "release_or_nightly",
- requirements_in = "//configure:requirements.in",
- pip_repo_name = "pypi",
deps_map = {
- "tensorflow": ["tf-nightly", "tf_header_lib", "libtensorflow_framework"],
- "tf-keras": ["tf-keras-nightly"]
+ "tensorflow": [
+ "tf-nightly",
+ "tf_header_lib",
+ "libtensorflow_framework",
+ ],
+ "tf-keras": ["tf-keras-nightly"],
},
+ pip_repo_name = "pypi",
+ requirements_in = "//configure:requirements.in",
switch = {
- "IS_NIGHTLY": "nightly"
- }
+ "IS_NIGHTLY": "nightly",
+ },
)

# Initialize hermetic Python
@@ -49,12 +52,12 @@ python_init_rules()
load("@org_tensorflow//third_party/py:python_init_repositories.bzl", "python_init_repositories")

python_init_repositories(
+ default_python_version = "system",
requirements = {
"3.9": "//configure:requirements_lock_3_9.txt",
"3.10": "//configure:requirements_lock_3_10.txt",
"3.11": "//configure:requirements_lock_3_11.txt",
},
- default_python_version = "system",
)

load("@org_tensorflow//third_party/py:python_init_toolchains.bzl", "python_init_toolchains")
@@ -140,16 +143,20 @@ nccl_configure(name = "local_config_nccl")
# ========================================

# Third party libraries
- load("//third_party/absl_py:workspace.bzl", absl_py = "deps")
load("//third_party/absl:workspace.bzl", absl = "deps")
+ load("//third_party/absl_py:workspace.bzl", absl_py = "deps")
load("//third_party/benchmark:workspace.bzl", benchmark = "deps")
load("//third_party/gtest:workspace.bzl", gtest = "deps")
load("//third_party/protobuf:workspace.bzl", protobuf = "deps")

absl()
+
absl_py()
+
benchmark()
+
gtest()
+
protobuf()

# Yggdrasil Decision Forests
@@ -170,7 +177,7 @@ ydf_load_deps(
"pybind11",
"pybind11_abseil",
"pybind11_protobuf",
- "tensorflow"
+ "tensorflow",
],
repo_name = "@ydf",
)
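The py_deps_profile rule above switches each dependency between its release and nightly flavor depending on the IS_NIGHTLY environment variable. A rough Python sketch of that mapping logic (the function and dict names here are hypothetical, and the real deps_map also carries Bazel repo targets such as tf_header_lib):

```python
# Hypothetical re-implementation of the release-vs-nightly package mapping
# encoded by deps_map/switch in the WORKSPACE above.
DEPS_MAP = {
    "tensorflow": "tf-nightly",
    "tf-keras": "tf-keras-nightly",
}

def resolve_dep(name: str, is_nightly: bool) -> str:
    """Return the pip package to depend on for `name`."""
    if is_nightly:
        # Nightly builds swap in the nightly counterpart when one exists.
        return DEPS_MAP.get(name, name)
    return name

print(resolve_dep("tensorflow", is_nightly=True))   # tf-nightly
print(resolve_dep("tensorflow", is_nightly=False))  # tensorflow
```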
4 changes: 2 additions & 2 deletions configure/setup.py
@@ -23,15 +23,15 @@
from setuptools.command.install import install
from setuptools.dist import Distribution

- _VERSION = "1.10.0"
+ _VERSION = "1.11.0"

with open("README.md", "r", encoding="utf-8") as fh:
long_description = fh.read()

REQUIRED_PACKAGES = [
"numpy",
"pandas",
- "tensorflow==2.17.0",
+ "tensorflow==2.18.0",
"six",
"absl_py",
"wheel",
15 changes: 10 additions & 5 deletions documentation/known_issues.md
@@ -1,11 +1,15 @@
# Known Issues

- The underlying engine behind the decision forests algorithms used by TensorFlow
- Decision Forests have been extensively production-tested. This file lists some
- of the known issues.
+ ## Prefer YDF for new projects

- See also the [migration guide](migration.md) for behavior that is different from
- other algorithms.
+ [YDF](https://github.com/google/yggdrasil-decision-forests) is Google's new
+ library to train Decision Forests.
+
+ YDF extends the power of TF-DF, offering new features, a simplified API, faster
+ training times, updated documentation, and enhanced compatibility with popular
+ ML libraries.
+
+ Some of the issues mentioned below are fixed in YDF.

## Windows Pip package is not available

@@ -54,6 +58,7 @@ The following table shows the compatibility between

tensorflow_decision_forests | tensorflow
--------------------------- | ---------------
+ 1.11.0 | 2.18.0
1.10.0 | 2.17.0
1.9.2 | 2.16.2
1.9.1 | 2.16.1
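The compatibility table above amounts to a version lookup; a small sketch using the pairs listed in this diff (the dict and function names are illustrative only):

```python
# TF-DF release -> pinned TensorFlow version, from the compatibility table.
TFDF_TO_TF = {
    "1.11.0": "2.18.0",
    "1.10.0": "2.17.0",
    "1.9.2": "2.16.2",
    "1.9.1": "2.16.1",
}

def required_tf(tfdf_version: str) -> str:
    """Return the TensorFlow version required by a given TF-DF release."""
    if tfdf_version not in TFDF_TO_TF:
        raise ValueError(f"unknown TF-DF release: {tfdf_version}")
    return TFDF_TO_TF[tfdf_version]

print(required_tf("1.11.0"))  # 2.18.0
```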
4 changes: 2 additions & 2 deletions tensorflow_decision_forests/__init__.py
@@ -51,10 +51,10 @@
```
"""

- __version__ = "1.10.0"
+ __version__ = "1.11.0"
__author__ = "Mathieu Guillame-Bert"

- compatible_tf_versions = ["2.17.0"]
+ compatible_tf_versions = ["2.18.0"]
__git_version__ = "HEAD" # Modify for release build.

from tensorflow_decision_forests.tensorflow import check_version
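The `__init__.py` above pins `compatible_tf_versions` and imports `check_version` to enforce it at import time. A hedged sketch of what such a guard can look like (illustrative only, not TF-DF's actual `check_version` logic):

```python
# Allow-list taken from the diff above; the guard function itself is a
# hypothetical stand-in for tensorflow_decision_forests' check_version.
compatible_tf_versions = ["2.18.0"]

def check_tf_version(tf_version: str, compatible=None) -> None:
    """Raise if the installed TensorFlow version is not in the allow-list."""
    compatible = compatible if compatible is not None else compatible_tf_versions
    if tf_version not in compatible:
        raise RuntimeError(
            f"TensorFlow {tf_version} is not compatible with this TF-DF "
            f"release; expected one of {compatible}."
        )

check_tf_version("2.18.0")  # passes silently for the pinned version
```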
32 changes: 21 additions & 11 deletions tensorflow_decision_forests/keras/wrappers_pre_generated.py
@@ -359,7 +359,7 @@ class CartModel(core.CoreModel):
split_axis: What structure of split to consider for numerical features. -
`AXIS_ALIGNED`: Axis aligned splits (i.e. one condition at a time). This
is the "classical" way to train a tree. Default value. - `SPARSE_OBLIQUE`:
- Sparse oblique splits (i.e. random splits one a small number of features)
+ Sparse oblique splits (i.e. random splits on a small number of features)
from "Sparse Projection Oblique Random Forests", Tomita et al., 2020. -
`MHLD_OBLIQUE`: Multi-class Hellinger Linear Discriminant splits from
"Classification Based on Multivariate Contrast Patterns", Canete-Sifuentes
@@ -1030,6 +1030,9 @@ class GradientBoostedTreesModel(core.CoreModel):
variable importance of the model at the end of the training using the
validation dataset. Enabling this feature can increase the training time
significantly. Default: False.
+ cross_entropy_ndcg_truncation: Truncation of the cross-entropy NDCG loss
+ (default 5). Only used with cross-entropy NDCG loss i.e.
+ `loss="XE_NDCG_MART"` Default: 5.
dart_dropout: Dropout rate applied when using the DART i.e. when
forest_extraction=DART. Default: None.
early_stopping: Early stopping detects the overfitting of the model and
@@ -1048,12 +1051,12 @@
Default: 10.
early_stopping_num_trees_look_ahead: Rolling number of trees used to detect
validation loss increase and trigger early stopping. Default: 30.
- focal_loss_alpha: EXPERIMENTAL. Weighting parameter for focal loss, positive
- samples weighted by alpha, negative samples by (1-alpha). The default 0.5
- value means no active class-level weighting. Only used with focal loss
- i.e. `loss="BINARY_FOCAL_LOSS"` Default: 0.5.
- focal_loss_gamma: EXPERIMENTAL. Exponent of the misprediction exponent term
- in focal loss, corresponds to gamma parameter in
+ focal_loss_alpha: EXPERIMENTAL, default 0.5. Weighting parameter for focal
+ loss, positive samples weighted by alpha, negative samples by (1-alpha).
+ The default 0.5 value means no active class-level weighting. Only used
+ with focal loss i.e. `loss="BINARY_FOCAL_LOSS"` Default: 0.5.
+ focal_loss_gamma: EXPERIMENTAL, default 2.0. Exponent of the misprediction
+ exponent term in focal loss, corresponds to gamma parameter in
https://arxiv.org/pdf/1708.02002.pdf. Only used with focal loss i.e.
`loss="BINARY_FOCAL_LOSS"` Default: 2.0.
forest_extraction: How to construct the forest: - MART: For Multiple
@@ -1122,12 +1125,13 @@ class GradientBoostedTreesModel(core.CoreModel):
likelihood loss. Mainly used for counting problems. Only valid for
regression. - `MULTINOMIAL_LOG_LIKELIHOOD`: Multinomial log likelihood
i.e. cross-entropy. Only valid for binary or multi-class classification. -
- `LAMBDA_MART_NDCG5`: LambdaMART with NDCG5. - `XE_NDCG_MART`: Cross
+ `LAMBDA_MART_NDCG`: LambdaMART with NDCG@5. - `XE_NDCG_MART`: Cross
Entropy Loss NDCG. See arxiv.org/abs/1911.09798. - `BINARY_FOCAL_LOSS`:
Focal loss. Only valid for binary classification. See
https://arxiv.org/pdf/1708.02002.pdf. - `POISSON`: Poisson log likelihood.
Only valid for regression. - `MEAN_AVERAGE_ERROR`: Mean average error
- a.k.a. MAE.
+ a.k.a. MAE. - `LAMBDA_MART_NDCG5`: DEPRECATED, use LAMBDA_MART_NDCG.
+ LambdaMART with NDCG@5.
Default: "DEFAULT".
max_depth: Maximum depth of the tree. `max_depth=1` means that all trees
will be roots. `max_depth=-1` means that tree depth is not restricted by
@@ -1170,6 +1174,8 @@
et al. in "Random Survival Forests"
(https://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908043).
Default: "GLOBAL_IMPUTATION".
+ ndcg_truncation: Truncation of the NDCG loss (default 5). Only used with
+ NDCG loss i.e. `loss="LAMBDA_MART_NDCG". ` Default: 5.
num_candidate_attributes: Number of unique valid attributes tested for each
node. An attribute is valid if it has at least a valid split. If
`num_candidate_attributes=0`, the value is set to the classical default
@@ -1266,7 +1272,7 @@ class GradientBoostedTreesModel(core.CoreModel):
split_axis: What structure of split to consider for numerical features. -
`AXIS_ALIGNED`: Axis aligned splits (i.e. one condition at a time). This
is the "classical" way to train a tree. Default value. - `SPARSE_OBLIQUE`:
- Sparse oblique splits (i.e. random splits one a small number of features)
+ Sparse oblique splits (i.e. random splits on a small number of features)
from "Sparse Projection Oblique Random Forests", Tomita et al., 2020. -
`MHLD_OBLIQUE`: Multi-class Hellinger Linear Discriminant splits from
"Classification Based on Multivariate Contrast Patterns", Canete-Sifuentes
@@ -1336,6 +1342,7 @@ def __init__(
categorical_set_split_max_num_items: Optional[int] = -1,
categorical_set_split_min_item_frequency: Optional[int] = 1,
compute_permutation_variable_importance: Optional[bool] = False,
+ cross_entropy_ndcg_truncation: Optional[int] = 5,
dart_dropout: Optional[float] = None,
early_stopping: Optional[str] = "LOSS_INCREASE",
early_stopping_initial_iteration: Optional[int] = 10,
@@ -1364,6 +1371,7 @@
mhld_oblique_sample_attributes: Optional[bool] = None,
min_examples: Optional[int] = 5,
missing_value_policy: Optional[str] = "GLOBAL_IMPUTATION",
+ ndcg_truncation: Optional[int] = 5,
num_candidate_attributes: Optional[int] = -1,
num_candidate_attributes_ratio: Optional[float] = -1.0,
num_trees: Optional[int] = 300,
@@ -1407,6 +1415,7 @@ def __init__(
"compute_permutation_variable_importance": (
compute_permutation_variable_importance
),
+ "cross_entropy_ndcg_truncation": cross_entropy_ndcg_truncation,
"dart_dropout": dart_dropout,
"early_stopping": early_stopping,
"early_stopping_initial_iteration": early_stopping_initial_iteration,
@@ -1439,6 +1448,7 @@
"mhld_oblique_sample_attributes": mhld_oblique_sample_attributes,
"min_examples": min_examples,
"missing_value_policy": missing_value_policy,
+ "ndcg_truncation": ndcg_truncation,
"num_candidate_attributes": num_candidate_attributes,
"num_candidate_attributes_ratio": num_candidate_attributes_ratio,
"num_trees": num_trees,
@@ -2369,7 +2379,7 @@ class RandomForestModel(core.CoreModel):
split_axis: What structure of split to consider for numerical features. -
`AXIS_ALIGNED`: Axis aligned splits (i.e. one condition at a time). This
is the "classical" way to train a tree. Default value. - `SPARSE_OBLIQUE`:
- Sparse oblique splits (i.e. random splits one a small number of features)
+ Sparse oblique splits (i.e. random splits on a small number of features)
from "Sparse Projection Oblique Random Forests", Tomita et al., 2020. -
`MHLD_OBLIQUE`: Multi-class Hellinger Linear Discriminant splits from
"Classification Based on Multivariate Contrast Patterns", Canete-Sifuentes
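The new `ndcg_truncation` and `cross_entropy_ndcg_truncation` parameters added above control the rank at which NDCG is cut off. As a reference for what that truncation means, here is a plain-Python NDCG@k using the `2^rel - 1` gain common in LambdaMART formulations (a textbook sketch, not TF-DF's internal implementation):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG truncated at rank k for a list of graded relevances."""
    def dcg(rels):
        # Only the first k positions contribute; later ranks are truncated away.
        return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A perfectly ordered ranking scores 1.0 at any truncation.
print(ndcg_at_k([3, 2, 1, 0], 5))  # 1.0
```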
2 changes: 1 addition & 1 deletion tools/start_compile_docker.sh
@@ -64,7 +64,7 @@
# directory.
TFDF_DIRNAME=${PWD##*/}

- DOCKER_IMAGE=tensorflow/build:2.17-python3.9
+ DOCKER_IMAGE=tensorflow/build:2.18-python3.9
DOCKER_CONTAINER=compile_tfdf

echo "Available containers:"
4 changes: 2 additions & 2 deletions tools/test_bazel.sh
@@ -26,7 +26,7 @@
#
# Usage example
#
- # RUN_TESTS=1 PY_VERSION=3.9 TF_VERSION=2.16.2 ./tools/test_bazel.sh
+ # RUN_TESTS=1 PY_VERSION=3.9 TF_VERSION=2.18.0 ./tools/test_bazel.sh

set -vex

Expand Down Expand Up @@ -90,7 +90,7 @@ commit_slug=$(curl -s "https://api.github.com/repos/tensorflow/tensorflow/commit
# Update TF dependency to the chosen version
sed -E -i "s/strip_prefix = \"tensorflow-2\.[0-9]+(\.[0-9]+)*(-rc[0-9]+)?\",/strip_prefix = \"tensorflow-${commit_slug}\",/" WORKSPACE
sed -E -i "s|\"https://github.com/tensorflow/tensorflow/archive/v.+\.tar.gz\"|\"https://github.com/tensorflow/tensorflow/archive/${commit_slug}.tar.gz\"|" WORKSPACE
- prev_shasum=$(grep -A 1 -e "strip_prefix.*tensorflow-" WORKSPACE | tail -1 | awk -F '"' '{print $2}')
+ prev_shasum=$(grep -B 1 -e "strip_prefix.*tensorflow-" WORKSPACE | head -1 | awk -F '"' '{print $2}')
sed -i "s/sha256 = \"${prev_shasum}\",//" WORKSPACE

# Get build configuration for chosen version.
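The grep change above (`-A 1`/`tail -1` becoming `-B 1`/`head -1`) reflects that `sha256` now appears on the line before `strip_prefix` in the updated WORKSPACE. The same extraction in Python, run on the archive stanza from this diff:

```python
# WORKSPACE stanza as written in this commit: sha256 precedes strip_prefix.
workspace = '''http_archive(
    name = "org_tensorflow",
    sha256 = "d7876f4bb0235cac60eb6316392a7c48676729860da1ab659fb440379ad5186d",
    strip_prefix = "tensorflow-2.18.0",
)'''

lines = workspace.splitlines()
# Like `grep -B 1 "strip_prefix.*tensorflow-" | head -1`: take the line
# directly before the strip_prefix line...
idx = next(i for i, line in enumerate(lines)
           if "strip_prefix" in line and "tensorflow-" in line)
# ...then, like `awk -F '"' '{print $2}'`, take the second '"'-delimited field.
prev_shasum = lines[idx - 1].split('"')[1]
print(prev_shasum)
```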
