diff --git a/dev/composition/index.html b/dev/composition/index.html index 7f286f6f..2ea1ee94 100644
Composition · MLJBase.jl
diff --git a/dev/datasets/index.html b/dev/datasets/index.html index 98f62b4b..8f070587 100644 --- a/dev/datasets/index.html +++ b/dev/datasets/index.html @@ -4,8 +4,8 @@ categorical=true)

Load it with DelimitedFiles and Tables

data_raw, data_header = readdlm(fpath, ',', header=true)
 data_table = Tables.table(data_raw; header=Symbol.(vec(data_header)))

Retrieve the conversions:

for (n, st) in zip(names(data_table), scitype_union.(eachcol(data_table)))
     println(":$n=>$st,")
end

Copy and paste the result into a coerce call:

data_table = coerce(data_table, ...)
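
Putting these steps together, here is a minimal end-to-end sketch. The file path and coercion pairs are invented, a DataFrame is used so that names and eachcol apply, and it is assumed MLJBase re-exports coerce and scitype_union, as elsewhere on this page:

using DelimitedFiles, Tables, DataFrames, MLJBase

data_raw, data_header = readdlm("boston.csv", ',', header=true)
data_table = DataFrame(Tables.table(data_raw; header=Symbol.(vec(data_header))))

# print a ready-to-paste coercion pair for each column
for (n, st) in zip(names(data_table), scitype_union.(eachcol(data_table)))
    println(":$n=>$st,")
end

# paste the printed pairs into coerce (the pairs below are hypothetical):
data_table = coerce(data_table, :Chas => Multiclass, :Crim => Continuous)
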
MLJBase.load_datasetMethod
load_dataset(fpath, coercions)

Load one of the standard datasets, such as Boston, assuming the file is a comma-separated file with a header.

source
MLJBase.load_sunspotsMethod

Load a well-known sunspot time series (table with one column). https://www.sws.bom.gov.au/Educational/2/3/6

source
MLJBase.@load_amesMacro

Load the full version of the well-known Ames Housing task.

source
MLJBase.@load_bostonMacro

Load a well-known public regression dataset with Continuous features.

source
MLJBase.@load_crabsMacro

Load a well-known crab classification dataset with nominal features.

source
MLJBase.@load_irisMacro

Load a well-known public classification task with nominal features.

source
MLJBase.@load_reduced_amesMacro

Load a reduced version of the well-known Ames Housing task.

source
MLJBase.@load_smarketMacro

Load the S&P Stock Market dataset, as used in An Introduction to Statistical Learning with Applications in R, by James, Witten, Hastie and Tibshirani (2013), Springer-Verlag, New York.

source
MLJBase.@load_sunspotsMacro

Load a well-known sunspot time series (single table with one column).

source

Synthetic datasets

MLJBase.augment_XMethod
augment_X(X, fit_intercept)

Given a matrix X, append a column of ones if fit_intercept is true. See make_regression.
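
A quick illustration (the matrix is arbitrary):

X = ones(3, 2)
MLJBase.augment_X(X, true)   # 3×3 matrix: X with a column of ones appended
MLJBase.augment_X(X, false)  # returns X unchanged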

source
MLJBase.finalize_XyMethod
finalize_Xy(X, y, shuffle, as_table, eltype, rng; clf)

Internal function to finalize the make_* functions.

source
MLJBase.make_blobsFunction
X, y = make_blobs(n=100, p=2; kwargs...)

Generate Gaussian blobs for clustering and classification problems.

Return value

By default, a table X with p columns (features) and n rows (observations), together with a corresponding vector of n Multiclass target observations y, indicating blob membership.

Keyword arguments

  • shuffle=true: whether to shuffle the resulting points,

  • centers=3: either a number of centers or a c x p matrix with c pre-determined centers,

  • cluster_std=1.0: the standard deviation(s) of each blob,

  • center_box=(-10. => 10.): the limits of the p-dimensional cube within which the cluster centers are drawn if they are not provided,

  • eltype=Float64: machine type of points (any subtype of AbstractFloat).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

  • as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.

Example

X, y = make_blobs(100, 3; centers=2, cluster_std=[1.0, 3.0])
source
MLJBase.make_circlesFunction
X, y = make_circles(n=100; kwargs...)

Generate n labeled points close to two concentric circles for classification and clustering models.

Return value

By default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership of the smaller or larger circle, respectively.

Keyword arguments

  • shuffle=true: whether to shuffle the resulting points,

  • noise=0: standard deviation of the Gaussian noise added to the data,

  • factor=0.8: ratio of the smaller radius over the larger one,

  • eltype=Float64: machine type of points (any subtype of AbstractFloat).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

  • as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.

Example

X, y = make_circles(100; noise=0.5, factor=0.3)
source
MLJBase.make_moonsFunction
make_moons(n::Int=100; kwargs...)

Generates labeled two-dimensional points lying close to two interleaved semi-circles, for use with classification and clustering models.

Return value

By default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership of the left or right semi-circle.

Keyword arguments

  • shuffle=true: whether to shuffle the resulting points,

  • noise=0.1: standard deviation of the Gaussian noise added to the data,

  • xshift=1.0: horizontal translation of the second center with respect to the first one.

  • yshift=0.3: vertical translation of the second center with respect to the first one.

  • eltype=Float64: machine type of points (any subtype of AbstractFloat).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

  • as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.

Example

X, y = make_moons(100; noise=0.5)
source
MLJBase.make_regressionFunction
make_regression(n, p; kwargs...)

Generate Gaussian input features and a linear response with Gaussian noise, for use with regression models.

Return value

By default, a tuple (X, y) where table X has p columns and n rows (observations), together with a corresponding vector of n Continuous target observations y.

Keywords

  • intercept=true: Whether to generate data from a model with intercept.

  • n_targets=1: Number of columns in the target.

  • sparse=0: Proportion of the generating weight vector that is sparse.

  • noise=0.1: Standard deviation of the Gaussian noise added to the response (target).

  • outliers=0: Proportion of the response vector to turn into outliers by adding a random quantity with high variance. (Only applied if binary is false.)

  • as_table=true: Whether X (and y, if n_targets > 1) should be a table or a matrix.

  • eltype=Float64: Element type for X and y. Must subtype AbstractFloat.

  • binary=false: Whether the target should be binarized (via a sigmoid).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

Example

X, y = make_regression(100, 5; noise=0.5, sparse=0.2, outliers=0.1)
source
MLJBase.outlify!Method
outlify!(rng, y, s)

Add outliers to portion s of vector y.

source
MLJBase.runif_abMethod
runif_ab(rng, n, p, a, b)

Internal function to generate n points in [a, b]ᵖ uniformly at random.

source
MLJBase.sigmoidMethod
sigmoid(x)

Return the sigmoid computed in a numerically stable way: $σ(x) = 1/(1+\exp(-x))$

source
MLJBase.sparsify!Method
sparsify!(rng, θ, s)

Make portion s of vector θ exactly 0.

source

Utility functions

MLJBase.complementMethod
complement(folds, i)

The complement of the ith fold of folds in the concatenation of all elements of folds. Here folds is a vector or tuple of integer vectors, typically representing row indices of a vector, matrix or table.

complement(([1,2], [3,], [4, 5]), 2) # [1, 2, 4, 5]
source
MLJBase.corestrictMethod
corestrict(X, folds, i)

The restriction of X, a vector, matrix or table, to the complement of the ith fold of folds, where folds is a tuple of vectors of row indices.

The method is curried, so that corestrict(folds, i) is the operator on data defined by corestrict(folds, i)(X) = corestrict(X, folds, i).

Example

folds = ([1, 2], [3, 4, 5],  [6,])
corestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x1, :x2, :x6]
source
MLJBase.partitionMethod
partition(X, fractions...;
           shuffle=nothing,
           rng=Random.GLOBAL_RNG,
           stratify=nothing,
 ([1 6], [2 7; 3 8], [4 9; 5 10])
 
 julia> X, y = make_blobs() # a table and vector
julia> Xtrain, Xtest = partition(X, 0.8, stratify=y)

Here's an example of synchronized partitioning of multiple objects:

julia> (Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)

Keywords

  • shuffle=nothing: if set to true, shuffles the rows before taking fractions.

  • rng=Random.GLOBAL_RNG: specifies the random number generator to be used; can be an integer seed. If rng is specified, then shuffle === nothing is interpreted as true.

  • stratify=nothing: if a vector is specified, the partition will match the stratification of the given vector. In that case, shuffle cannot be false.

  • multi=false: if true then X is expected to be a tuple of objects sharing a common length, which are each partitioned separately using the same specified fractions and the same row shuffling. Returns a tuple of partitions (a tuple of tuples).

source
MLJBase.restrictMethod
restrict(X, folds, i)

The restriction of X, a vector, matrix or table, to the ith fold of folds, where folds is a tuple of vectors of row indices.

The method is curried, so that restrict(folds, i) is the operator on data defined by restrict(folds, i)(X) = restrict(X, folds, i).

Example

folds = ([1, 2], [3, 4, 5],  [6,])
restrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x3, :x4, :x5]

See also corestrict

source
MLJBase.skipinvalidMethod
skipinvalid(A, B)

For vectors A and B of the same length, return a tuple of vectors (A[mask], B[mask]) where mask[i] is true if and only if A[i] and B[i] are both valid (non-missing and non-NaN). Can also be called on other iterators of matching length, such as arrays, but always returns a vector. Does not remove Missing from the element types if present in the original iterators.
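
A minimal illustration (the vectors are invented):

A = [1.0, missing, 3.0, NaN]
B = [10.0, 20.0, missing, 40.0]
skipinvalid(A, B) # returns ([1.0], [10.0]); element types may still allow missing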

source
MLJBase.unpackMethod
unpack(table, f1, f2, ... fk;
        wrap_singles=false,
        shuffle=false,
        rng::Union{AbstractRNG,Int,Nothing}=nothing,
 julia> W  # the column(s) left over
 2-element Vector{String}:
  "A"
- "B"

Whenever a returned table contains a single column, it is converted to a vector unless wrap_singles=true.

If coerce_options are specified then table is first replaced with coerce(table, coerce_options). See ScientificTypes.coerce for details.

If shuffle=true then the rows of table are first shuffled, using the global RNG, unless rng is specified; if rng is an integer, it specifies the seed of an automatically generated Mersenne twister. If rng is specified then shuffle=true is implicit.

source
+ "B"

Whenever a returned table contains a single column, it is converted to a vector unless wrap_singles=true.

If coerce_options are specified then table is first replaced with coerce(table, coerce_options). See ScientificTypes.coerce for details.

If shuffle=true then the rows of table are first shuffled, using the global RNG, unless rng is specified; if rng is an integer, it specifies the seed of an automatically generated Mersenne twister. If rng is specified then shuffle=true is implicit.
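
A hedged usage sketch (the table and column names are invented):

table = (x=[1, 2], y=['a', 'b'], z=[10.0, 20.0], w=["A", "B"])
Z, XY, W = unpack(table, ==(:z), !=(:w))
# Z == [10.0, 20.0]; XY is a table with columns x and y; W == ["A", "B"]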

source
diff --git a/dev/distributions/index.html b/dev/distributions/index.html index 1ea246f7..3b05e8a9 100644
[truncated histogram output from a docstring example]
source
MLJBase.iteratorMethod
iterator([rng, ], r::NominalRange, [,n])
iterator([rng, ], r::NumericRange, n)

Return an iterator (currently a vector) for a ParamRange object r. In the first case iteration is over all values stored in the range (or just the first n, if n is specified). In the second case, the iteration is over approximately n ordered values, generated as follows:

  1. First, exactly n values are generated between U and L, with a spacing determined by r.scale (uniform if scale=:linear) where U and L are given by the following table:

    r.lower    r.upper    L                    U
    finite     finite     r.lower              r.upper
    -Inf       finite     r.upper - 2r.unit    r.upper
    finite     Inf        r.lower              r.lower + 2r.unit
    -Inf       Inf        r.origin - r.unit    r.origin + r.unit
  2. If a callable f is provided as scale, then a uniform spacing is always applied in (1) but f is broadcast over the results. (Unlike ordinary scales, this alters the effective range of values generated, instead of just altering the spacing.)

  3. If r is a discrete numeric range (r isa NumericRange{<:Integer}) then the values are additionally rounded, with any duplicate values removed. Otherwise all the values are used (and there are exactly n of them).

  4. Finally, if a random number generator rng is specified, then the values are returned in random order (sampling without replacement), and otherwise they are returned in numeric order, or in the order provided to the range constructor, in the case of a NominalRange.
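
For example, a sketch using the type-based range constructor documented below:

r = range(Float64, :lambda; lower=1, upper=100, scale=:log10)
iterator(r, 3) # approximately [1.0, 10.0, 100.0]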

source
MLJBase.scaleMethod
scale(r::ParamRange)

Return the scale associated with a ParamRange object r. The possible return values are: :none (for a NominalRange), :linear, :log, :log10, :log2, or :custom (if r.scale is a callable object).

source
StatsAPI.fitMethod
Distributions.fit(D, r::MLJBase.NumericRange)

Fit and return a distribution d of type D to the one-dimensional range r.

Only types D in the table below are supported.

The distribution d is constructed in two stages. First, a distribution d0, characterized by the conditions in the second column of the table, is fit to r. Then d0 is truncated between r.lower and r.upper to obtain d.

Distribution type D                                           Characterization of d0
Arcsine, Uniform, Biweight, Cosine, Epanechnikov,             minimum(d) = r.lower, maximum(d) = r.upper
  SymTriangularDist, Triweight
Normal, Gamma, InverseGaussian, Logistic, LogNormal           mean(d) = r.origin, std(d) = r.unit
Cauchy, Gumbel, Laplace, (Normal)                             Dist.location(d) = r.origin, Dist.scale(d) = r.unit
Poisson                                                       Dist.mean(d) = r.unit

Here Dist = Distributions.
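
A usage sketch (assuming the Distributions package is loaded):

import Distributions
r = range(Float64, :alpha; lower=0.0, upper=1.0)
d = Distributions.fit(Distributions.Uniform, r)
rand(d, 3) # three draws from the fitted (truncated) distribution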

source
Base.rangeMethod
r = range(model, :hyper; values=nothing)

Define a one-dimensional NominalRange object for a field hyper of model. Note that r is not directly iterable but iterator(r) is.

A nested hyperparameter is specified using dot notation. For example, :(atom.max_depth) specifies the max_depth hyperparameter of the submodel model.atom.

r = range(model, :hyper; upper=nothing, lower=nothing,
          scale=nothing, values=nothing)

Assuming values is not specified, define a one-dimensional NumericRange object for a Real field hyper of model. Note that r is not directly iterable but iterator(r, n) is an iterator of length n. To generate random elements from r, instead apply rand methods to sampler(r). The supported scales are :linear, :log, :logminus, :log10, :log10minus, :log2, or a callable object.

Note that r is not directly iterable, but iterator(r, n) is, for given resolution (length) n.

By default, the behaviour of the constructed object depends on the type of the value of the hyperparameter :hyper at model at the time of construction. To override this behaviour (for instance if model is not available) specify a type in place of model so the behaviour is determined by the value of the specified type.

A nested hyperparameter is specified using dot notation (see above).

If scale is unspecified, it is set to :linear, :log, :log10minus, or :linear, according to whether the interval (lower, upper) is bounded, right-unbounded, left-unbounded, or doubly unbounded, respectively. Note upper=Inf and lower=-Inf are allowed.

If values is specified, the other keyword arguments are ignored and a NominalRange object is returned (see above).
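
For example (my_model and its fields are hypothetical):

r1 = range(my_model, :lambda; lower=1e-3, upper=1e3, scale=:log10) # a NumericRange
r2 = range(my_model, :kernel; values=[:poly, :rbf])                # a NominalRange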

See also: iterator, sampler

source

Utility functions

diff --git a/dev/index.html b/dev/index.html index 5b611a6f..92c2a600 100644
Home · MLJBase.jl

MLJBase.jl

These docs are bare-bones and auto-generated. Complete MLJ documentation is here.

For MLJBase-specific developer information, see also the README.md file.

diff --git a/dev/resampling/index.html b/dev/resampling/index.html index 2e72044a..8731b247 100644
Resampling · MLJBase.jl

Resampling

MLJBase.CVType
cv = CV(; nfolds=6,  shuffle=nothing, rng=nothing)

Cross-validation resampling strategy, for use in evaluate!, evaluate and tuning.

train_test_pairs(cv, rows)

Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. Each train vector is the complement of the corresponding test vector. With no row pre-shuffling, the order of rows is preserved, in the sense that rows coincides precisely with the concatenation of the test vectors, in the order they are generated. The first r test vectors have length n + 1, where n, r = divrem(length(rows), nfolds), and the remaining test vectors have length n.

Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the CV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.

If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.
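
For example, the fold pattern described above, for five rows and three folds:

MLJBase.train_test_pairs(CV(nfolds=3), 1:5)
# three (train, test) pairs, with tests [1, 2], [3, 4], [5]
# and trains the corresponding complements, e.g. ([3, 4, 5], [1, 2])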

source
MLJBase.CompactPerformanceEvaluationType
CompactPerformanceEvaluation <: AbstractPerformanceEvaluation

Type of object returned by evaluate (for models plus data) or evaluate! (for machines) when called with the option compact = true. Such objects have the same structure as the PerformanceEvaluation objects returned by default, except that the following fields are omitted to save memory: fitted_params_per_fold, report_per_fold, train_test_rows.

For more on the remaining fields, see PerformanceEvaluation.

source
MLJBase.HoldoutType
holdout = Holdout(; fraction_train=0.7, shuffle=nothing, rng=nothing)

Instantiate a Holdout resampling strategy, for use in evaluate!, evaluate and in tuning.

train_test_pairs(holdout, rows)

Returns the pair [(train, test)], where train and test are vectors such that rows=vcat(train, test) and length(train)/length(rows) is approximately equal to fraction_train.

Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the Holdout keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.

If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is specified.
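
For example (a sketch; specifying rng triggers pre-shuffling):

holdout = Holdout(fraction_train=0.8, rng=123)
train, test = only(MLJBase.train_test_pairs(holdout, 1:10))
# length(train) == 8; length(test) == 2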

source
MLJBase.InSampleType
in_sample = InSample()

Instantiate an InSample resampling strategy, for use in evaluate!, evaluate and in tuning. In this strategy the train and test sets are the same, and consist of all observations specified by the rows keyword argument. If rows is not specified, all supplied rows are used.

Example

using MLJBase, MLJModels
 
 X, y = make_blobs()  # a table and a vector
 model = ConstantClassifier()
train, test = partition(eachindex(y), 0.7)  # train:test = 70:30

Compute in-sample (training) loss:

evaluate(model, X, y, resampling=InSample(), rows=train, measure=brier_loss)

Compute the out-of-sample loss:

evaluate(model, X, y, resampling=[(train, test),], measure=brier_loss)

Or equivalently:

evaluate(model, X, y, resampling=Holdout(fraction_train=0.7), measure=brier_loss)
source
MLJBase.PerformanceEvaluationType
PerformanceEvaluation <: AbstractPerformanceEvaluation

Type of object returned by evaluate (for models plus data) or evaluate! (for machines). Such objects encode estimates of the performance (generalization error) of a supervised model or outlier detection model, and store other information ancillary to the computation.

If evaluate or evaluate! is called with the compact=true option, then a CompactPerformanceEvaluation object is returned instead.

When evaluate/evaluate! is called, a number of train/test pairs ("folds") of row indices are generated, according to the options provided, which are discussed in the evaluate! doc-string. Rows correspond to observations. The generated train/test pairs are recorded in the train_test_rows field of the PerformanceEvaluation struct, and the corresponding estimates, aggregated over all train/test pairs, are recorded in measurement, a vector with one entry for each measure (metric) recorded in measure.

When displayed, a PerformanceEvaluation object includes a value under the heading 1.96*SE, derived from the standard error of the per_fold entries. This value is suitable for constructing a formal 95% confidence interval for the given measurement. Such intervals should be interpreted with caution. See, for example, Bates et al. (2021).

Fields

These fields are part of the public API of the PerformanceEvaluation struct.

  • model: model used to create the performance evaluation. In the case of a tuning model, this is the best model found.

  • measure: vector of measures (metrics) used to evaluate performance

  • measurement: vector of measurements - one for each element of measure - aggregating the performance measurements over all train/test pairs (folds). The aggregation method applied for a given measure m is StatisticalMeasuresBase.external_aggregation_mode(m) (commonly Mean() or Sum())

  • operation (e.g., predict_mode): the operations applied for each measure to generate predictions to be evaluated. Possibilities are: predict, predict_mean, predict_mode, predict_median, or predict_joint.

  • per_fold: a vector of vectors of individual test fold evaluations (one vector per measure). Useful for obtaining a rough estimate of the variance of the performance estimate.

  • per_observation: a vector of vectors of vectors containing individual per-observation measurements: for an evaluation e, e.per_observation[m][f][i] is the measurement for the ith observation in the fth test fold, evaluated using the mth measure. Useful for some forms of hyper-parameter optimization. Note that an aggregated measurement for some measure measure is repeated across all observations in a fold if StatisticalMeasures.can_report_unaggregated(measure) == false. If e has been computed with the per_observation=false option, then e.per_observation is a vector of missings.

  • fitted_params_per_fold: a vector containing fitted params(mach) for each machine mach trained during resampling - one machine per train/test pair. Use this to extract the learned parameters for each individual training event.

  • report_per_fold: a vector containing report(mach) for each machine mach trained in resampling - one machine per train/test pair.

  • train_test_rows: a vector of tuples, each of the form (train, test), where train and test are vectors of row (observation) indices for training and evaluation respectively.

  • resampling: the user-specified resampling strategy to generate the train/test pairs (or literal train/test pairs if that was directly specified).

  • repeats: the number of times the resampling strategy was repeated.

See also CompactPerformanceEvaluation.
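
A sketch of accessing these fields, assuming e = evaluate(model, X, y, resampling=CV(nfolds=3), measure=log_loss):

e.measurement[1]           # aggregate log_loss over all folds
e.per_fold[1]              # vector of three per-fold log_loss values
e.per_observation[1][2][1] # log_loss for the 1st observation in the 2nd test fold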

source
MLJBase.ResamplerType
resampler = Resampler(
     model=ConstantRegressor(),
     resampling=CV(),
     measure=nothing,
     per_observation=true,
     logger=default_logger(),
     compact=false,
)

Private method. Use at own risk.

Resampling model wrapper, used internally by the fit method of TunedModel instances and IteratedModel instances. See evaluate! for the meaning of the options. Not intended for use by the general user, who will ordinarily use evaluate! directly.

Given a machine mach = machine(resampler, args...) one obtains a performance evaluation of the specified model, performed according to the prescribed resampling strategy and other parameters, using data args..., by calling fit!(mach) followed by evaluate(mach).

On subsequent calls to fit!(mach) new train/test pairs of row indices are only regenerated if resampling, repeats or cache fields of resampler have changed. The evolution of an RNG field of resampler does not constitute a change (== for MLJType objects is not sensitive to such changes; see is_same_except).

If there is a single train/test pair, then the warm-restart behaviour of the wrapped model resampler.model will extend to warm-restart behaviour of the wrapper resampler, with respect to mutations of the wrapped model.

The sample weights are passed to the specified performance measures that support weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.

The sample class_weights are passed to the specified performance measures that support per-class weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.

source
MLJBase.StratifiedCVType
stratified_cv = StratifiedCV(; nfolds=6,
                                shuffle=false,
                               rng=Random.GLOBAL_RNG)

Stratified cross-validation resampling strategy, for use in evaluate!, evaluate and in tuning. Applies only to classification problems (OrderedFactor or Multiclass targets).

train_test_pairs(stratified_cv, rows, y)

Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices) where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. Each train vector is the complement of the corresponding test vector.

Unlike regular cross-validation, the distribution of the levels of the target y corresponding to each train and test is constrained, as far as possible, to replicate that of y[rows] as a whole.

The stratified train_test_pairs algorithm is invariant to label renaming. For example, if you run replace!(y, 'a' => 'b', 'b' => 'a') and then re-run train_test_pairs, the returned (train, test) pairs will be the same.
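
For example (the labels are invented):

y = coerce(["a", "a", "b", "b", "a", "b"], Multiclass)
MLJBase.train_test_pairs(StratifiedCV(nfolds=2), 1:6, y)
# each test fold contains roughly equal numbers of "a"s and "b"s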

Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the StratifiedCV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.

If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.

source
MLJBase.TimeSeriesCVType
tscv = TimeSeriesCV(; nfolds=4)

Cross-validation resampling strategy, for use in evaluate!, evaluate and tuning, when observations are chronological and not expected to be independent.

train_test_pairs(tscv, rows)

Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The rows are partitioned sequentially into nfolds + 1 approximately equal length partitions, where the first partition is the first train set, and the second partition is the first test set. The second train set consists of the first two partitions, and the second test set consists of the third partition, and so on for each fold.

The first partition (which is the first train set) has length n + r, where n, r = divrem(length(rows), nfolds + 1), and the remaining partitions (all of the test folds) have length n.

Examples

julia> MLJBase.train_test_pairs(TimeSeriesCV(nfolds=3), 1:10)
 3-element Vector{Tuple{UnitRange{Int64}, UnitRange{Int64}}}:
  (1:4, 5:6)
  (1:6, 7:8)
 _.per_observation = [missing]
 _.fitted_params_per_fold = [ … ]
 _.report_per_fold = [ … ]
_.train_test_rows = [ … ]
source
MLJBase.default_loggerMethod
default_logger(logger)

Reset the default logger.

Example

Suppose an MLflow tracking service is running on a local server at http://127.0.0.1:5000. Then in every evaluate call in which logger is not specified, the performance evaluation is automatically logged to the service, as here:

using MLJ
 logger = MLJFlow.Logger("http://127.0.0.1:5000/api")
 default_logger(logger)
 
 X, y = make_moons()
 model = ConstantClassifier()
evaluate(model, X, y, measures=[log_loss, accuracy])
source
MLJBase.default_loggerMethod
default_logger()

Return the current value of the default logger for use with supported machine learning tracking platforms, such as MLflow.

The default logger is used in calls to evaluate! and evaluate, and in the constructors TunedModel and IteratedModel, unless the logger keyword is explicitly specified.

Note

Prior to MLJ v0.20.7 (and MLJBase 1.5) the default logger was always nothing.

When MLJBase is first loaded, the default logger is nothing.

source
MLJBase.evaluate!Method
evaluate!(mach; resampling=CV(), measure=nothing, options...)

Estimate the performance of a machine mach wrapping a supervised model in data, using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or vector. Returns a PerformanceEvaluation object.

Available resampling strategies are CV, Holdout, InSample, StratifiedCV and TimeSeriesCV. If resampling is not an instance of one of these, then a vector of tuples of the form (train_rows, test_rows) is expected. For example, setting

resampling = [(1:100, 101:200),
-              (101:200, 1:100)]

gives two-fold cross-validation using the first 200 rows of data.

Any measure conforming to the StatisticalMeasuresBase.jl API can be provided, assuming it can consume multiple observations.

Although evaluate! is mutating, mach.model and mach.args are not mutated.
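
A typical call looks like this (a sketch, with model, X and y as in earlier examples):

mach = machine(model, X, y)
evaluate!(mach; resampling=CV(nfolds=5), measure=[log_loss, accuracy])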

Additional keyword options

  • rows - vector of observation indices from which both train and test folds are constructed (default is all observations)

  • operation/operations=nothing - One of predict, predict_mean, predict_mode, predict_median, or predict_joint, or a vector of these of the same length as measure/measures. Automatically inferred if left unspecified. For example, predict_mode will be used for a Multiclass target, if model is a probabilistic predictor but measure expects literal (point) target predictions. Operations actually applied can be inspected from the operation field of the object returned.

  • weights - per-sample Real weights for measures that support them (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).

  • class_weights - dictionary of Real per-class weights for use with measures that support these, in classification problems (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).

  • repeats::Int=1: set to a higher value for repeated (Monte Carlo) resampling. For example, if repeats = 10, then resampling = CV(nfolds=5, shuffle=true), generates a total of 50 (train, test) pairs for evaluation and subsequent aggregation.

  • acceleration=CPU1(): acceleration/parallelization option; can be any instance of CPU1 (single-threaded computation), CPUThreads (multi-threaded computation) or CPUProcesses (multi-process computation); default is default_resource(). These types are owned by ComputationalResources.jl.

  • force=false: set to true to force cold-restart of each training event

  • verbosity::Int=1: logging level; can be negative

  • check_measure=true: whether to screen measures for possible incompatibility with the model. Will not catch all incompatibilities.

  • per_observation=true: whether to calculate estimates for individual observations; if false the per_observation field of the returned object is populated with missings. Setting to false may reduce compute time and allocations.

  • logger=default_logger() - a logger object for forwarding results to a machine learning tracking platform; see default_logger for details.

  • compact=false - if true, the returned evaluation object excludes these fields: fitted_params_per_fold, report_per_fold, train_test_rows.

See also evaluate, PerformanceEvaluation, CompactPerformanceEvaluation.

source
MLJBase.log_evaluationMethod
log_evaluation(logger, performance_evaluation)

Log a performance evaluation to logger, an object specific to some logging platform, such as mlflow. If logger=nothing then no logging is performed. The method is called at the end of every call to evaluate/evaluate! using the logger provided by the logger keyword argument.

Implementations for new logging platforms

Julia interfaces to workflow logging platforms, such as mlflow (provided by the MLFlowClient.jl interface) should overload log_evaluation(logger::LoggerType, performance_evaluation), where LoggerType is a platform-specific type for logger objects. For an example, see the implementation provided by the MLJFlow.jl package.

source
+evaluate(model, X, y, measures=[log_loss, accuracy)])
source
MLJBase.default_loggerMethod
default_logger()

Return the current value of the default logger for use with supported machine learning tracking platforms, such as MLflow.

The default logger is used in calls to evaluate! and evaluate, and in the constructors TunedModel and IteratedModel, unless the logger keyword is explicitly specified.

Note

Prior to MLJ v0.20.7 (and MLJBase 1.5) the default logger was always nothing.

When MLJBase is first loaded, the default logger is nothing.
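
For example, on a fresh Julia session (a minimal sketch, assuming only MLJBase is loaded):

using MLJBase

isnothing(default_logger())   # true, so evaluations are not logged anywhere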

source
MLJBase.evaluate!Method
evaluate!(mach; resampling=CV(), measure=nothing, options...)

Estimate the performance of a machine mach wrapping a supervised model in data, using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or vector. Returns a PerformanceEvaluation object.

Available resampling strategies are CV, Holdout, InSample, StratifiedCV and TimeSeriesCV. If resampling is not an instance of one of these, then a vector of tuples of the form (train_rows, test_rows) is expected. For example, setting

resampling = [(1:100, 101:200),
              (101:200, 1:100)]

gives two-fold cross-validation using the first 200 rows of data.

Any measure conforming to the StatisticalMeasuresBase.jl API can be provided, assuming it can consume multiple observations.

Although evaluate! is mutating, mach.model and mach.args are not mutated.

Additional keyword options

  • rows - vector of observation indices from which both train and test folds are constructed (default is all observations)

  • operation/operations=nothing - One of predict, predict_mean, predict_mode, predict_median, or predict_joint, or a vector of these of the same length as measure/measures. Automatically inferred if left unspecified. For example, predict_mode will be used for a Multiclass target if model is a probabilistic predictor but measure expects literal (point) target predictions. The operations actually applied can be inspected from the operation field of the returned object.

  • weights - per-sample Real weights for measures that support them (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).

  • class_weights - dictionary of Real per-class weights for use with measures that support these, in classification problems (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).

  • repeats::Int=1: set to a higher value for repeated (Monte Carlo) resampling. For example, if repeats = 10, then resampling = CV(nfolds=5, shuffle=true) generates a total of 50 (train, test) pairs for evaluation and subsequent aggregation.

  • acceleration=CPU1(): acceleration/parallelization option; can be any instance of CPU1 (single-threaded computation), CPUThreads (multi-threaded computation) or CPUProcesses (multi-process computation); default is default_resource(). These types are owned by ComputationalResources.jl.

  • force=false: set to true to force cold-restart of each training event

  • verbosity::Int=1: logging level; can be negative.

  • check_measure=true: whether to screen measures for possible incompatibility with the model. Will not catch all incompatibilities.

  • per_observation=true: whether to calculate estimates for individual observations; if false the per_observation field of the returned object is populated with missings. Setting to false may reduce compute time and allocations.

  • logger=default_logger() - a logger object for forwarding results to a machine learning tracking platform; see default_logger for details.

  • compact=false - if true, the returned evaluation object excludes these fields: fitted_params_per_fold, report_per_fold, train_test_rows.

See also evaluate, PerformanceEvaluation, CompactPerformanceEvaluation.
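
As an illustration, the following sketch performs the two-fold evaluation described above, assuming the MLJ package (which re-exports make_regression and the built-in ConstantRegressor model) is installed:

using MLJ

X, y = make_regression(200, 3)
mach = machine(ConstantRegressor(), X, y)

e = evaluate!(mach;
              resampling=[(1:100, 101:200), (101:200, 1:100)],
              measure=[rms, mae],
              verbosity=0)

e.measurement   # one aggregated score per measure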

source
MLJBase.log_evaluationMethod
log_evaluation(logger, performance_evaluation)

Log a performance evaluation to logger, an object specific to some logging platform, such as mlflow. If logger=nothing then no logging is performed. The method is called at the end of every call to evaluate/evaluate! using the logger provided by the logger keyword argument.

Implementations for new logging platforms

Julia interfaces to workflow logging platforms, such as mlflow (provided by the MLFlowClient.jl interface), should overload log_evaluation(logger::LoggerType, performance_evaluation), where LoggerType is a platform-specific type for logger objects. For an example, see the implementation provided by the MLJFlow.jl package.
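
As a minimal sketch, a toy logger that merely prints results might look like this (MyPrintLogger is hypothetical and not part of any package):

import MLJBase

struct MyPrintLogger end   # hypothetical stand-in for a platform-specific type

function MLJBase.log_evaluation(logger::MyPrintLogger, performance_evaluation)
    println("measures:     ", performance_evaluation.measure)
    println("measurements: ", performance_evaluation.measurement)
end

Passing logger=MyPrintLogger() to evaluate would then print the aggregated results at the end of the evaluation.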

source
diff --git a/dev/search/index.html b/dev/search/index.html index 3ffbb6cd..56d8c9e7 100644 --- a/dev/search/index.html +++ b/dev/search/index.html @@ -1,2 +1,2 @@ Search · MLJBase.jl
diff --git a/dev/utilities/index.html b/dev/utilities/index.html index 79847843..50a75b1f 100644 --- a/dev/utilities/index.html +++ b/dev/utilities/index.html @@ -1,11 +1,11 @@ Utilities · MLJBase.jl

      Utilities

      Machines

      Base.replaceMethod
      replace(mach::Machine, field1 => value1, field2 => value2, ...)

      Private method.

Return a shallow copy of the machine mach with the specified field replacements. Undefined field values are preserved. Unspecified fields have identically equal values, with the exception of mach.fit_okay, which is always a new Channel{Bool}(1) instance.

      The following example returns a machine with no traces of training data (but also removes any upstream dependencies in a learning network):

      replace(mach, :args => (), :data => (), :data_resampled_data => (), :cache => nothing)
      source
      MLJBase.ageMethod
      age(mach::Machine)

      Return an integer representing the number of times mach has been trained or updated. For more detail, see the discussion of training logic at fit_only!.

      source
      MLJBase.ancestorsMethod
      ancestors(mach::Machine; self=false)

      All ancestors of mach, including mach if self=true.

      source
      MLJBase.default_scitype_check_levelFunction
      default_scitype_check_level()

      Return the current global default value for scientific type checking when constructing machines.

      default_scitype_check_level(i::Integer)

      Set the global default value for scientific type checking to i.

      The effect of the scitype_check_level option in calls of the form machine(model, data, scitype_check_level=...) is summarized below:

scitype_check_level  | Inspect scitypes? | If Unknown in scitypes | If other scitype mismatch
---------------------|-------------------|------------------------|---------------------------
0                    | ×                 |                        |
1 (value at startup) | ✓                 |                        | warning
2                    | ✓                 | warning                | warning
3                    | ✓                 | warning                | error
4                    | ✓                 | error                  | error

      See also machine
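
For example (a sketch; model, X and y stand for some model instance and data already in scope):

default_scitype_check_level()    # query the global default (1 at startup)
default_scitype_check_level(3)   # escalate ordinary scitype mismatches to errors

# Override the default for a single machine, disabling all checks:
mach = machine(model, X, y, scitype_check_level=0)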

      source
      MLJBase.fit_only!Method
      MLJBase.fit_only!(
           mach::Machine;
           rows=nothing,
           verbosity=1,
           force=false,
           composite=nothing,
)

      Without mutating any other machine on which it may depend, perform one of the following actions to the machine mach, using the data and model bound to it, and restricting the data to rows if specified:

      • Ab initio training. Ignoring any previous learned parameters and cache, compute and store new learned parameters. Increment mach.state.

      • Training update. Making use of previous learned parameters and/or cache, replace or mutate existing learned parameters. The effect is the same (or nearly the same) as in ab initio training, but may be faster or use less memory, assuming the model supports an update option (implements MLJBase.update). Increment mach.state.

      • No-operation. Leave existing learned parameters untouched. Do not increment mach.state.

If the model bound to mach is a symbol, model, then instead perform the action using the true model given by getproperty(composite, model). See also machine.

      Training action logic

For the action to be a no-operation, either mach.frozen == true or none of the following apply:

      1. mach has never been trained (mach.state == 0).

      2. force == true.

3. The state of some other machine on which mach depends has changed since the last time mach was trained (ie, the last time mach.state was incremented).

      4. The specified rows have changed since the last retraining and mach.model does not have Static type.

      5. mach.model is a model and different from the last model used for training, but has the same type.

      6. mach.model is a model but has a type different from the last model used for training.

7. mach.model is a symbol and getproperty(composite, mach.model) is different from the last model used for training, but has the same type.

8. mach.model is a symbol and getproperty(composite, mach.model) has a different type from the last model used for training.

      In any of the cases (1) - (4), (6), or (8), mach is trained ab initio. If (5) or (7) is true, then a training update is applied.

      To freeze or unfreeze mach, use freeze!(mach) or thaw!(mach).

      Implementation details

      The data to which a machine is bound is stored in mach.args. Each element of args is either a Node object, or, in the case that concrete data was bound to the machine, it is concrete data wrapped in a Source node. In all cases, to obtain concrete data for actual training, each argument N is called, as in N() or N(rows=rows), and either MLJBase.fit (ab initio training) or MLJBase.update (training update) is dispatched on mach.model and this data. See the "Adding models for general use" section of the MLJ documentation for more on these lower-level training methods.
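
The following sketch, assuming MLJ is installed and using the built-in ConstantClassifier, illustrates cases (1) and (4) of the training action logic above:

using MLJ
import MLJBase

X, y = make_blobs(100, 2)
mach = machine(ConstantClassifier(), X, y)

MLJBase.fit_only!(mach; rows=1:50, verbosity=0)  # case (1): ab initio training
mach.state                                       # 1

MLJBase.fit_only!(mach; rows=1:50, verbosity=0)  # no-operation: nothing applies
MLJBase.fit_only!(mach; verbosity=0)             # case (4): rows changed, retrains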

      source
      MLJBase.freeze!Method
      freeze!(mach)

      Freeze the machine mach so that it will never be retrained (unless thawed).

      See also thaw!.
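
For example (a sketch, continuing from any machine mach that has already been constructed):

fit!(mach, verbosity=0)   # trains as usual
freeze!(mach)
fit!(mach, verbosity=0)   # no-operation, because mach is now frozen
thaw!(mach)
fit!(mach, verbosity=0)   # retraining is again possible (still a no-op unless something changed)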

      source
      MLJBase.last_modelMethod
      last_model(mach::Machine)

      Return the last model used to train the machine mach. This is a bona fide model, even if mach.model is a symbol.

      Returns nothing if mach has not been trained.

      source
      MLJBase.machineFunction
      machine(model, args...; cache=true, scitype_check_level=1)

Construct a Machine object binding a model, storing hyper-parameters of some machine learning algorithm, to some data, args. Calling fit! on a Machine instance mach stores outcomes of applying the algorithm in mach, which can be inspected using fitted_params(mach) (learned parameters) and report(mach) (other outcomes). This in turn enables generalization to new data using operations such as predict or transform:

      using MLJModels
       X, y = make_regression()
       
       PCA = @load PCA pkg=MultivariateStats
      @@ -28,7 +28,7 @@
       X, y = make_blobs()
       mach = machine(:classifier, X, y)
       fit!(mach, composite=my_composite)

      The last two lines are equivalent to

      mach = machine(ConstantClassifier(), X, y)
fit!(mach)

      Delaying model specification is used when exporting learning networks as new stand-alone model types. See prefit and the MLJ documentation on learning networks.

      See also fit!, default_scitype_check_level, MLJBase.save, serializable.

      source
      MLJBase.machineMethod
      machine(file::Union{String, IO})

      Rebuild from a file a machine that has been serialized using the default Serialization module.
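
For example, a save-and-reload round trip might look like the following sketch (the file name mach.jls is arbitrary; MLJ.save is documented below):

using MLJ

X, y = make_blobs()
mach = fit!(machine(ConstantClassifier(), X, y), verbosity=0)

MLJ.save("mach.jls", mach)
mach2 = machine("mach.jls")   # rebuild the machine from the file
predict(mach2, X)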

      source
      MLJBase.reportMethod
      report(mach)

      Return the report for a machine mach that has been fit!, for example the coefficients in a linear model.

      This is a named tuple and human-readable if possible.

      If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the report for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)

      julia> using MLJ
       julia> @load LinearBinaryClassifier pkg=GLM
       julia> X, y = @load_crabs;
       julia> pipe = Standardizer() |> LinearBinaryClassifier();
      @@ -39,7 +39,7 @@
        dof_residual = 195.0,
        stderror = [18954.83496713119, 6502.845740757159, 48484.240246060406, 34971.131004997274, 20654.82322484894, 2111.1294584763386],
        vcov = [3.592857686311793e8 9.122732393971942e6 … -8.454645589364915e7 5.38856837634321e6; 9.122732393971942e6 4.228700272808351e7 … -4.978433790526467e7 -8.442545425533723e6; … ; -8.454645589364915e7 -4.978433790526467e7 … 4.2662172244975924e8 2.1799125705781363e7; 5.38856837634321e6 -8.442545425533723e6 … 2.1799125705781363e7 4.456867590446599e6],)

      See also fitted_params

      source
      MLJBase.report_given_methodMethod
      report_given_method(mach::Machine)

      Same as report(mach) but broken down by the method (fit, predict, etc) that contributed the report.

      A specialized method intended for learning network applications.

      The return value is a dictionary keyed on the symbol representing the method (:fit, :predict, etc) and the values report contributed by that method.

      source
      MLJBase.restore!Function
      restore!(mach::Machine)

Restore the state of a machine that is currently serializable but which may not be otherwise usable. For such a machine, mach, one has mach.state == -1. Intended for restoring deserialized machine objects to a usable form.

      For an example see serializable.

      source
      MLJBase.serializableMethod
      serializable(mach::Machine)

      Returns a shallow copy of the machine to make it serializable. In particular, all training data is removed and, if necessary, learned parameters are replaced with persistent representations.

      Any general purpose Julia serializer may be applied to the output of serializable (eg, JLSO, BSON, JLD) but you must call restore!(mach) on the deserialised object mach before using it. See the example below.

      If using Julia's standard Serialization library, a shorter workflow is available using the MLJBase.save (or MLJ.save) method.

      A machine returned by serializable is characterized by the property mach.state == -1.

      Example using JLSO

      using MLJ
       using JLSO
       Tree = @load DecisionTreeClassifier
       tree = Tree()
      @@ -55,7 +55,7 @@
       restore!(loaded_mach)
       
       predict(loaded_mach, X)
predict(mach, X)

      See also restore!, MLJBase.save.

      source
      MLJModelInterface.fitted_paramsMethod
      fitted_params(mach)

      Return the learned parameters for a machine mach that has been fit!, for example the coefficients in a linear model.

      This is a named tuple and human-readable if possible.

      If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the fitted parameters for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)

      julia> using MLJ
       julia> @load LogisticClassifier pkg=MLJLinearModels
       julia> X, y = @load_crabs;
       julia> pipe = Standardizer() |> LogisticClassifier();
      @@ -64,8 +64,8 @@
       julia> fitted_params(mach).logistic_classifier
       (classes = CategoricalArrays.CategoricalValue{String,UInt32}["B", "O"],
        coefs = Pair{Symbol,Float64}[:FL => 3.7095037897680405, :RW => 0.1135739140854546, :CL => -1.6036892745322038, :CW => -4.415667573486482, :BD => 3.238476051092471],
 intercept = 0.0883301599726305,)

      See also report

      source
      MLJModelInterface.saveMethod
      MLJ.save(mach)
MLJBase.save(mach)

Save the current machine as an artifact at the location associated with default_logger.

source
      MLJModelInterface.saveMethod
      MLJ.save(filename, mach::Machine)
       MLJ.save(io, mach::Machine)
       
       MLJBase.save(filename, mach::Machine)
      @@ -83,12 +83,12 @@
       MLJ.save(io, mach)
       seekstart(io)
       predict_only_mach = machine(io)
predict(predict_only_mach, X)
      Only load files from trusted sources

      Maliciously constructed JLS files, like pickles, and most other general purpose serialization formats, can allow for arbitrary code execution during loading. This means it is possible for someone to use a JLS file that looks like a serialized MLJ machine as a Trojan horse.

      See also serializable, machine.

      source
      StatsAPI.fit!Method
      fit!(mach::Machine, rows=nothing, verbosity=1, force=false, composite=nothing)

      Fit the machine mach. In the case that mach has Node arguments, first train all other machines on which mach depends.

To attempt to fit a machine without touching any other machine, use fit_only!. For more on options and the internal logic of fitting, see fit_only!.

      source

      Parameter Inspection

      Show

      MLJBase._recursive_showMethod
      _recursive_show(stream, object, current_depth, depth)

      Private method.

Generate a table of the properties of the MLJType object, displaying each property value by calling the method _show on it. The behaviour of _show(stream, f) is as follows:

1. If f is itself a MLJType object, then its short form is shown and _recursive_show generates a separate table for each of its properties (and so on, up to a depth of argument depth).

2. Otherwise f is displayed as "(omitted T)" where T = typeof(f), unless istoobig(f) is false (the istoobig fall-back for arbitrary types being true). In the latter case, the long (ie, MIME"text/plain") form of f is shown. To override this behaviour, overload the _show method for the type in question.

      source
      MLJBase.color_offMethod
      color_off()

      Suppress color and bold output at the REPL for displaying MLJ objects.

      source
      MLJBase.color_onMethod
      color_on()

      Enable color and bold output at the REPL, for enhanced display of MLJ objects.

      source
      MLJBase.handleMethod
      handle(X)

Return the abbreviated object id (as a string), or its registered handle (as a string) if one exists.

      source
      MLJBase.@constantMacro
      @constant x = value

      Private method (used in testing).

      Equivalent to const x = value but registers the binding thus:

      MLJBase.HANDLE_GIVEN_ID[objectid(value)] = :x

Registered objects are displayed using the variable name to which they were bound in calls to show(x), etc.

      Warning

As with any const declaration, binding x to a new value of the same type is not prevented, and the registration will not be updated.

      source
      MLJBase.@moreMacro
      @more

      Entered at the REPL, equivalent to show(ans, 100). Use to get a recursive description of all properties of the last REPL value.

      source

      Utility functions

      MLJBase._permute_rowsMethod
      _permute_rows(obj, perm)

      Internal function to return a vector or matrix with permuted rows given the permutation perm.

      source
      MLJBase.available_nameMethod
      available_name(modl::Module, name::Symbol)

      Function to replace, if necessary, a given name with a modified one that ensures it is not the name of any existing object in the global scope of modl. Modifications are created with numerical suffixes.
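
For instance, something like the following sketch (the returned suffix is indicative only):

name = MLJBase.available_name(Main, :model)
# :model if `model` is not defined in Main; otherwise :model2, :model3, ...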

      source
      MLJBase.check_same_nrowsMethod
      check_same_nrows(X, Y)

      Internal function to check two objects, each a vector or a matrix, have the same number of rows.

      source
      MLJBase.chunksMethod
      chunks(range, n)

      Split an AbstractRange into n subranges of approximately equal length.

      Example

      julia> collect(chunks(1:5, 2))
       2-element Vector{UnitRange{Int64}}:
        1:3
 4:5

      Private method

      source
      MLJBase.flat_valuesMethod
      flat_values(t::NamedTuple)

      View a nested named tuple t as a tree and return, as a tuple, the values at the leaves, in the order they appear in the original tuple.

      julia> t = (X = (x = 1, y = 2), Y = 3);
       julia> flat_values(t)
(1, 2, 3)
      source
      MLJBase.generate_name!Method
      generate_name!(M, existing_names; only=Union{Function,Type}, substitute=:f)

Given a type M (e.g., MyEvenInteger{N}) return a symbolic, snake-case representation of the type name (such as my_even_integer). The symbol is pushed to existing_names, which must be an AbstractVector to which a Symbol can be pushed.

      If the snake-case representation already exists in existing_names a suitable integer is appended to the name.

      If only is specified, then the operation is restricted to those M for which M isa only. In all other cases the symbolic name is generated using substitute as the base symbol.

      julia> existing_names = [];
       julia> generate_name!(Vector{Int}, existing_names)
       :vector
       
      @@ -102,7 +102,7 @@
       :not_array
       
       julia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)
:not_array2
      source
      MLJBase.guess_model_target_observation_scitypeMethod
guess_model_target_observation_scitype(model)

      Private method

Try to infer a lowest upper bound on the scitype of target observations acceptable to model, by inspecting target_scitype(model). Return Unknown if unable to draw a reliable inference.

      The observation scitype for a table is here understood as the scitype of a row converted to a vector.

      source
      MLJBase.guess_observation_scitypeMethod
      guess_observation_scitype(y)

      Private method.

      If y is an AbstractArray, return the scitype of y[:, :, ..., :, 1]. If y is a table, return the scitype of the first row, converted to a vector, unless this row has missing elements, in which case return Unknown.

      In all other cases, Unknown.

      julia> guess_observation_scitype([missing, 1, 2, 3])
       Union{Missing, Count}
       
       julia> guess_observation_scitype(rand(3, 2))
      @@ -112,16 +112,16 @@
       AbstractVector{Union{Continuous, Count}}
       
       julia> guess_observation_scitype((x=[missing, 1, 2], y=[1, 2, 3]))
Unknown
      source
      MLJBase.init_rngMethod
      init_rng(rng)

Create an AbstractRNG from rng. If rng is a non-negative Integer, it returns a MersenneTwister random number generator seeded with rng; if rng is an AbstractRNG object, it returns rng; otherwise an error is thrown.
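
For example:

using MLJBase, Random

rng = MLJBase.init_rng(123)    # a MersenneTwister seeded with 123
MLJBase.init_rng(rng) === rng  # true: AbstractRNG objects pass straight through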

      source
      MLJBase.observationMethod
      observation(S)

      Private method.

Tries to infer the per-observation scitype from the scitype of S, when S is known to be the scitype of some container with multiple observations; here we view the scitype for one row of a table to be the scitype of the row converted to a vector. Return Unknown if unable to draw a reliable inference.

      The observation scitype for a table is here understood as the scitype of a row converted to a vector.

      source
      MLJBase.prependMethod
      MLJBase.prepend(::Symbol, ::Union{Symbol,Expr,Nothing})

      For prepending symbols in expressions like :(y.w) and :(x1.x2.x3).

      julia> prepend(:x, :y)
       :(x.y)
       
       julia> prepend(:x, :(y.z))
       :(x.y.z)
       
       julia> prepend(:w, ans)
:(w.x.y.z)

      If the second argument is nothing, then nothing is returned.

      source
      MLJBase.recursive_getpropertyMethod
      recursive_getproperty(object, nested_name::Expr)

      Call getproperty recursively on object to extract the value of some nested property, as in the following example:

      julia> object = (X = (x = 1, y = 2), Y = 3);
       julia> recursive_getproperty(object, :(X.y))
2
      source
      MLJBase.recursive_setproperty!Method
recursive_setproperty!(object, nested_name::Expr, value)

      Set a nested property of an object to value, as in the following example:

      julia> mutable struct Foo
                  X
                  Y
              end
      @@ -138,10 +138,10 @@
       42
       
       julia> object
Foo(Bar(1, 42), 3)
      source
      MLJBase.sequence_stringMethod
      sequence_string(itr, n=3)

      Return a "sequence" string from the first n elements generated by itr.

      julia> MLJBase.sequence_string(1:10, 4)
      +"1, 2, 3, 4, ..."

      Private method.

      source
      MLJBase.shuffle_rowsMethod
      shuffle_rows(X::AbstractVecOrMat,
                    Y::AbstractVecOrMat;
             rng::AbstractRNG=Random.GLOBAL_RNG)

      Return row-shuffled vectors or matrices using a random permutation of X and Y. An optional random number generator can be specified using the rng argument.
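
For example (a sketch, assuming MLJBase is loaded; the same permutation is applied to both arguments):

using Random

X = [1 2; 3 4; 5 6]
y = [10, 20, 30]
Xshuffled, yshuffled = MLJBase.shuffle_rows(X, y; rng=MersenneTwister(0))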

      source
      MLJBase.unwindMethod
      unwind(iterators...)

Represent all possible combinations of values generated by iterators as rows of a matrix A. In more detail, A has one column for each iterator in iterators and one row for each distinct possible combination of values taken on by the iterators. Elements in the first column cycle fastest, those in the last column slowest.

      Example

      julia> iterators = ([1, 2], ["a","b"], ["x", "y", "z"]);
julia> MLJBase.unwind(iterators...)
12×3 Matrix{Any}:
 1  "a"  "x"
 2  "a"  "x"
 1  "b"  "x"
 2  "b"  "x"
 1  "a"  "y"
 2  "a"  "y"
 1  "b"  "y"
 2  "b"  "y"
 1  "a"  "z"
 2  "a"  "z"
 1  "b"  "z"
 2  "b"  "z"

source