Docstring return statements #789

Merged
merged 8 commits into from
May 18, 2023
101 changes: 101 additions & 0 deletions CONTRIBUTING.md
@@ -45,6 +45,107 @@ When documenting Python classes, we adhere to the convention of including docstrings
rather than as a class level docstring. Docstrings should only be included at the class-level if a class does
not possess an `__init__` method, for example because it is a static class.
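A minimal sketch of this placement convention (the class, parameter, and attribute names here are illustrative, not real `alibi_detect` objects):

```python
class ExampleDetector:
    def __init__(self, threshold: float = 0.5) -> None:
        """
        Detector docstring lives here, under `__init__`, not at class level.

        Parameters
        ----------
        threshold
            Score threshold above which an instance is flagged.
        """
        self.threshold = threshold


class ExampleRegistry:
    """Static class with no `__init__`, so the docstring is class-level."""

    detectors: dict = {}
```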

#### Conventions

- Names of variables, functions, classes and modules should be written between single back-ticks.
- ``` A `numpy` scalar type that ```
- ``` `X` ```
- ``` `extrapolate_constant_perc` ```

- Simple mathematical equations should be written between single back-ticks to facilitate readability in the console.
- ``` A callable that takes an `N x F` tensor, for ```
- ``` `x >= v, fun(x) >= target` ```

- Complex math should be written in LaTeX.
- ``` function where :math:`link(output - expected\_value) = sum(\phi)` ```

- Other `alibi_detect` objects should be cross-referenced using references of the form `` :role:`~object` ``, where
`role` is one of the roles listed in the [sphinx documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html#cross-referencing-python-objects),
and `object` is the full path of the object to reference. For example, the `MMDDrift` detector's `predict` method
would be referenced with `` :meth:`~alibi_detect.cd.mmd.MMDDrift.predict` ``. This will render as `MMDDrift.predict()` and
link to the relevant API docs page. The same convention can be used to reference objects from other libraries, provided the
library is included in `intersphinx_mapping` in `doc/source/conf.py`. If the `~` is removed, the absolute object location will be
rendered.
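For instance, a docstring using this convention might look as follows (the wrapper function itself is hypothetical; only the cross-reference syntax is the point):

```python
def predict_drift(detector, x):
    """
    Call :meth:`~alibi_detect.cd.mmd.MMDDrift.predict` on `x`.

    The leading ``~`` makes sphinx render the reference as ``MMDDrift.predict()``
    rather than the full dotted path ``alibi_detect.cd.mmd.MMDDrift.predict()``.
    """
    return detector.predict(x)
```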

- Variable values or examples of setting an argument to a specific value should be written in double back-ticks
to facilitate readability, as they are rendered in a block with orange font-color.
- ``` is set to ``True`` ```
- ``` A list of features for which to plot the ALE curves or ``'all'`` for all features. ```
- ``` The search is greedy if ``beam_size=1`` ```
- ``` if the result uses ``segment_labels=(1, 2, 3)`` and ``partial_index=1``, this will return ``[1, 2]``. ```

- Listing the possible values an argument can take.
- ``` Possible values are: ``'all'`` | ``'row'`` | ``None``. ```

- Returning the name of the variable and its description - this is the standard convention and renders well. Writing the
variable types should be avoided, as they would duplicate the existing type annotations.
```
Returns
-------
raw
Array of perturbed text instances.
data
Matrix with 1s and 0s indicating whether a word in the text has not been perturbed for each sample.
```

- Returning only the description. When the name of the variable is not returned, sphinx wrongly interprets the
description as the variable name, which renders the text in italic. If the text exceeds one line, a ``` \ ``` needs
to be included after each line to avoid introducing bullet points at the beginning of each row. Moreover, if for
example the name of a variable is included between single back-ticks, the italic font is cancelled for all the words
except the ones between single back-ticks.
```
Returns
-------
If the user has specified grouping, then the input object is subsampled and an object of the same \
type is returned. Otherwise, a `shap_utils.Data` object containing the result of a k-means algorithm \
is wrapped in a `shap_utils.DenseData` object and returned. The samples are weighted according to the \
frequency of the occurrence of the clusters in the original data.
```

[Review comment - Contributor]
Nitpick:

> and the type, if provided, would be written

Could we be a little more specific here? Not detailing types in the general case (since that is already done by sphinx-autodoc-typehints), but giving types when multiple objects are contained in another object (i.e. a dict), makes a lot of sense IMO. However, in the latter case, are we saying this should always be done? Or is it optional?

[Reply - Collaborator, Author]
I think we're hinting that it's better to give types, however it's not something we've done historically. I've opened an issue to change this across the codebase and then perhaps we can change the language here to be a little stronger.

- Returning an object which contains multiple attributes, each of which is described individually.
In this case the attribute name is written between single back-ticks and the type, if provided, is written in
double back-ticks.
```
Returns
-------
`Explanation` object containing the anchor explaining the instance with additional metadata as attributes. \
Contains the following data-related attributes

- `anchor` : ``List[str]`` - a list of words in the proposed anchor.

- `precision` : ``float`` - the fraction of times that sampled instances where the anchor holds yield \
the same prediction as the original instance. The precision will always be at least `threshold` for a valid anchor.

- `coverage` : ``float`` - the fraction of sampled instances the anchor applies to.
```

- Documenting a dictionary follows the same principle as above, but the key should be written between
double back-ticks.
```
Default perturbation options for ``'similarity'`` sampling

- ``'sample_proba'`` : ``float`` - probability of a word to be masked.

- ``'top_n'`` : ``int`` - number of similar words to sample for perturbations.

- ``'temperature'`` : ``float`` - sample weight hyper-parameter if `use_proba=True`.

- ``'use_proba'`` : ``bool`` - whether to sample according to the words similarity.
```

- Attributes are commented inline to avoid duplication.
```
class ReplayBuffer:
"""
Circular experience replay buffer for `CounterfactualRL` (DDPG) ... in performance.
"""
X: np.ndarray #: Inputs buffer.
Y_m: np.ndarray #: Model's prediction buffer.
...
```
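A self-contained version of the pattern above (the attribute set is truncated here for brevity); the ``#:`` marker tells sphinx to treat the trailing comment as the attribute's documentation:

```python
import numpy as np


class ReplayBuffer:
    """Circular experience replay buffer for `CounterfactualRL` (DDPG)."""

    X: np.ndarray    #: Inputs buffer.
    Y_m: np.ndarray  #: Model's prediction buffer.
```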

For more standard conventions, please check the [numpydoc style guide](https://numpydoc.readthedocs.io/en/stable/format.html).

## Building documentation
We use `sphinx` for building documentation. You can call `make build_docs` from the project root;
the docs will be built under `doc/_build/html`.
6 changes: 3 additions & 3 deletions alibi_detect/ad/adversarialae.py
@@ -290,9 +290,9 @@ def predict(self, X: np.ndarray, batch_size: int = int(1e10), return_instance_sc

Returns
-------
Dictionary containing 'meta' and 'data' dictionaries.
'meta' has the model's metadata.
'data' contains the adversarial predictions and instance level adversarial scores.
Dictionary containing ``'meta'`` and ``'data'`` dictionaries.
- ``'meta'`` has the model's metadata.
- ``'data'`` contains the adversarial predictions and instance level adversarial scores.
"""
adv_score = self.score(X, batch_size=batch_size)

6 changes: 3 additions & 3 deletions alibi_detect/ad/model_distillation.py
@@ -207,9 +207,9 @@ def predict(self, X: np.ndarray, batch_size: int = int(1e10), return_instance_sc

Returns
-------
Dictionary containing 'meta' and 'data' dictionaries.
'meta' has the model's metadata.
'data' contains the adversarial predictions and instance level adversarial scores.
Dictionary containing ``'meta'`` and ``'data'`` dictionaries.
- ``'meta'`` has the model's metadata.
- ``'data'`` contains the adversarial predictions and instance level adversarial scores.
"""
score = self.score(X, batch_size=batch_size)

8 changes: 6 additions & 2 deletions alibi_detect/base.py
@@ -56,7 +56,7 @@ def concept_drift_dict():


class BaseDetector(ABC):
""" Base class for outlier, adversarial and drift detection algorithms. """
"""Base class for outlier, adversarial and drift detection algorithms."""

def __init__(self):
self.meta = copy.deepcopy(DEFAULT_META)
@@ -204,10 +204,12 @@ class Detector(Protocol):

Used for typing legacy save and load functionality in `alibi_detect.saving._tensorflow.saving.py`.

Note:
Note
----
This exists to distinguish between detectors with and without support for config saving and loading. Once all
detector support this then this protocol will be removed.
"""

meta: Dict

def predict(self) -> Any: ...
@@ -219,6 +221,7 @@ class ConfigurableDetector(Detector, Protocol):

Used for typing save and load functionality in `alibi_detect.saving.saving`.
"""

def get_config(self) -> dict: ...

@classmethod
@@ -233,6 +236,7 @@ class StatefulDetectorOnline(ConfigurableDetector, Protocol):

Used for typing save and load functionality in `alibi_detect.saving.saving`.
"""

t: int = 0

def save_state(self, filepath: Union[str, os.PathLike]): ...
63 changes: 37 additions & 26 deletions alibi_detect/cd/base.py
@@ -135,10 +135,12 @@ def __init__(
def preprocess(self, x: Union[np.ndarray, list]) -> Tuple[Union[np.ndarray, list], Union[np.ndarray, list]]:
"""
Data preprocessing before computing the drift scores.

Parameters
----------
x
Batch of instances.

Returns
-------
Preprocessed reference data and new instances.
@@ -174,7 +176,7 @@ def get_splits(

Returns
-------
Combined reference and test instances with labels and optionally a list with tuples of
Combined reference and test instances with labels and optionally a list with tuples of \
train and test indices for optionally different folds.
"""
# create dataset and labels
@@ -268,12 +270,12 @@ def predict(self, x: Union[np.ndarray, list], return_p_val: bool = True,

Returns
-------
Dictionary containing 'meta' and 'data' dictionaries.
'meta' has the model's metadata.
'data' contains the drift prediction and optionally the p-value, performance of the classifier
relative to its expectation under the no-change null, the out-of-fold classifier model
prediction probabilities on the reference and test data as well as the associated reference
and test instances of the out-of-fold predictions, and the trained model.
Dictionary containing ``'meta'`` and ``'data'`` dictionaries.
- ``'meta'`` has the model's metadata.
- ``'data'`` contains the drift prediction and optionally the p-value, performance of the classifier \
relative to its expectation under the no-change null, the out-of-fold classifier model \
prediction probabilities on the reference and test data as well as the associated reference \
and test instances of the out-of-fold predictions, and the trained model.
"""
# compute drift scores
p_val, dist, probs_ref, probs_test, x_ref_oof, x_test_oof = self.score(x)
@@ -394,10 +396,12 @@ def __init__(
def preprocess(self, x: Union[np.ndarray, list]) -> Tuple[Union[np.ndarray, list], Union[np.ndarray, list]]:
"""
Data preprocessing before computing the drift scores.

Parameters
----------
x
Batch of instances.

Returns
-------
Preprocessed reference data and new instances.
@@ -418,17 +422,18 @@ def get_splits(self, x_ref: Union[np.ndarray, list], x: Union[np.ndarray, list])
"""
Split reference and test data into two splits -- one of which to learn test locations
and parameters and one to use for tests.

Parameters
----------
x_ref
Data used as reference distribution.
x
Batch of instances.

Returns
-------
Tuple containing split train data and tuple containing split test data
Tuple containing split train data and tuple containing split test data.
"""

n_ref, n_cur = len(x_ref), len(x)
perm_ref, perm_cur = np.random.permutation(n_ref), np.random.permutation(n_cur)
idx_ref_tr, idx_ref_te = perm_ref[:int(n_ref * self.train_size)], perm_ref[int(n_ref * self.train_size):]
@@ -468,9 +473,9 @@ def predict(self, x: Union[np.ndarray, list], return_p_val: bool = True,

Returns
-------
Dictionary containing 'meta' and 'data' dictionaries.
'meta' has the detector's metadata.
'data' contains the drift prediction and optionally the p-value, threshold, MMD metric and
Dictionary containing ``'meta'`` and ``'data'`` dictionaries.
- ``'meta'`` has the detector's metadata.
- ``'data'`` contains the drift prediction and optionally the p-value, threshold, MMD metric and \
trained kernel.
"""
# compute drift scores
@@ -586,10 +591,12 @@ def __init__(
def preprocess(self, x: Union[np.ndarray, list]) -> Tuple[np.ndarray, np.ndarray]:
"""
Data preprocessing before computing the drift scores.

Parameters
----------
x
Batch of instances.

Returns
-------
Preprocessed reference data and new instances.
@@ -626,9 +633,9 @@ def predict(self, x: Union[np.ndarray, list], return_p_val: bool = True, return_

Returns
-------
Dictionary containing 'meta' and 'data' dictionaries.
'meta' has the model's metadata.
'data' contains the drift prediction and optionally the p-value, threshold and MMD metric.
Dictionary containing ``'meta'`` and ``'data'`` dictionaries.
- ``'meta'`` has the model's metadata.
- ``'data'`` contains the drift prediction and optionally the p-value, threshold and MMD metric.
"""
# compute drift scores
p_val, dist, distance_threshold = self.score(x)
@@ -748,10 +755,12 @@ def __init__(
def preprocess(self, x: Union[np.ndarray, list]) -> Tuple[np.ndarray, np.ndarray]:
"""
Data preprocessing before computing the drift scores.

Parameters
----------
x
Batch of instances.

Returns
-------
Preprocessed reference data and new instances.
@@ -786,9 +795,9 @@ def predict(self, x: Union[np.ndarray, list], return_p_val: bool = True, return_

Returns
-------
Dictionary containing 'meta' and 'data' dictionaries.
'meta' has the model's metadata.
'data' contains the drift prediction and optionally the p-value, threshold and LSDD metric.
Dictionary containing ``'meta'`` and ``'data'`` dictionaries.
- ``'meta'`` has the model's metadata.
- ``'data'`` contains the drift prediction and optionally the p-value, threshold and LSDD metric.
"""
# compute drift scores
p_val, dist, distance_threshold = self.score(x)
@@ -979,10 +988,10 @@ def predict(self, x: Union[np.ndarray, list], drift_type: str = 'batch',

Returns
-------
Dictionary containing 'meta' and 'data' dictionaries.
'meta' has the model's metadata.
'data' contains the drift prediction and optionally the feature level p-values,
threshold after multivariate correction if needed and test statistics.
Dictionary containing ``'meta'`` and ``'data'`` dictionaries.
- ``'meta'`` has the model's metadata.
- ``'data'`` contains the drift prediction and optionally the feature level p-values, threshold after \
multivariate correction if needed and test statistics.
"""
# compute drift scores
p_vals, dist = self.score(x)
@@ -1136,10 +1145,12 @@ def __init__(
def preprocess(self, x: Union[np.ndarray, list]) -> Tuple[np.ndarray, np.ndarray]:
"""
Data preprocessing before computing the drift scores.

Parameters
----------
x
Batch of instances.

Returns
-------
Preprocessed reference data and new instances.
@@ -1181,10 +1192,10 @@ def predict(self,  # type: ignore[override]

Returns
-------
Dictionary containing 'meta' and 'data' dictionaries.
'meta' has the model's metadata.
'data' contains the drift prediction and optionally the p-value, threshold, conditional MMD test statistic
and coupling matrices.
Dictionary containing ``'meta'`` and ``'data'`` dictionaries.
- ``'meta'`` has the model's metadata.
- ``'data'`` contains the drift prediction and optionally the p-value, threshold, conditional MMD test \
statistic and coupling matrices.
"""
# compute drift scores
p_val, dist, distance_threshold, coupling = self.score(x, c)
12 changes: 6 additions & 6 deletions alibi_detect/cd/base_online.py
@@ -184,9 +184,9 @@ def predict(self, x_t: Union[np.ndarray, Any], return_test_stat: bool = True,

Returns
-------
Dictionary containing 'meta' and 'data' dictionaries.
'meta' has the model's metadata.
'data' contains the drift prediction and optionally the test-statistic and threshold.
Dictionary containing ``'meta'`` and ``'data'`` dictionaries.
- ``'meta'`` has the model's metadata.
- ``'data'`` contains the drift prediction and optionally the test-statistic and threshold.
"""
# Compute test stat and check for drift
test_stat = self.score(x_t)
@@ -441,9 +441,9 @@ def predict(self, x_t: Union[np.ndarray, Any], return_test_stat: bool = True,

Returns
-------
Dictionary containing 'meta' and 'data' dictionaries.
'meta' has the model's metadata.
'data' contains the drift prediction and optionally the test-statistic and threshold.
Dictionary containing ``'meta'`` and ``'data'`` dictionaries.
- ``'meta'`` has the model's metadata.
- ``'data'`` contains the drift prediction and optionally the test-statistic and threshold.
"""
# Compute test stat and check for drift
test_stats = self.score(x_t)
8 changes: 3 additions & 5 deletions alibi_detect/cd/classifier.py
@@ -206,11 +206,9 @@ def predict(self, x: Union[np.ndarray, list], return_p_val: bool = True,

Returns
-------
Dictionary containing 'meta' and 'data' dictionaries

- 'meta' - has the model's metadata.

- 'data' - contains the drift prediction and optionally the p-value, performance of the classifier \
Dictionary containing ``'meta'`` and ``'data'`` dictionaries
- ``'meta'`` - has the model's metadata.
- ``'data'`` - contains the drift prediction and optionally the p-value, performance of the classifier \
relative to its expectation under the no-change null, the out-of-fold classifier model \
prediction probabilities on the reference and test data, and the trained model. \
"""