PermutationTestDistanceBased

class frouros.callbacks.batch.PermutationTestDistanceBased(num_permutations: int, total_num_permutations: int | None = None, num_jobs: int = -1, method: str = 'auto', random_state: int | None = None, verbose: bool = False, name: str | None = None)

Permutation test callback class that can be applied to data_drift.batch.distance_based detectors.
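The general idea of a permutation test on a distance statistic can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: `mean_difference` is a hypothetical toy distance standing in for a detector such as MMD, and `permutation_test` is an illustrative helper name.

```python
import random
from statistics import fmean


def mean_difference(x_ref, x_test):
    """Toy distance (assumption): absolute difference between sample means."""
    return abs(fmean(x_ref) - fmean(x_test))


def permutation_test(x_ref, x_test, distance, num_permutations, seed=None):
    """Return the observed distance and the distances under random relabelling."""
    rng = random.Random(seed)
    observed = distance(x_ref, x_test)
    pooled = list(x_ref) + list(x_test)
    n_ref = len(x_ref)
    permuted = []
    for _ in range(num_permutations):
        # Shuffle the pooled samples, then split them back into two groups
        # of the original sizes and recompute the distance.
        rng.shuffle(pooled)
        permuted.append(distance(pooled[:n_ref], pooled[n_ref:]))
    return observed, permuted


# Two clearly separated one-dimensional samples.
x_ref = [0.1 * i for i in range(20)]
x_test = [5.0 + 0.1 * i for i in range(20)]
observed, permuted = permutation_test(
    x_ref, x_test, mean_difference, num_permutations=200, seed=31
)
```

If the two samples come from the same distribution, the observed distance should look like a typical permuted distance; a p-value summarizes how unusual it is.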

Parameters:
  • num_permutations (int) – number of permutations to obtain the p-value

  • total_num_permutations (Optional[int]) – total number of permutations used to obtain the p-value, defaults to None. If None, it is set to the maximum number of permutations, that is, the minimum of the number of all possible permutations and the global maximum number of permutations

  • num_jobs (int) – number of jobs, defaults to -1

  • method (str) – method to compute the p-value, defaults to “auto”. Supported values:
      “auto”: if the number of permutations is greater than the maximum number of permutations, the method is set to “approximate”. Otherwise, it is set to “exact”.
      “conservative”: the p-value is computed as (number of permutation statistics greater than or equal to the observed statistic + 1) / (number of permutations + 1).
      “exact”: the p-value is computed as the mean of the binomial cumulative distribution function, as stated in [phipson2010permutation].
      “approximate”: the p-value is computed using the integral of the binomial cumulative distribution function, as stated in [phipson2010permutation].
      “estimate”: the p-value is computed as the mean of the extreme-statistic indicator, i.e. the fraction of permutation statistics at least as extreme as the observed statistic. This p-value can be zero.

  • random_state (Optional[int]) – random state, defaults to None

  • verbose (bool) – verbose flag, defaults to False

  • name (Optional[str]) – name value, defaults to None. If None, the name will be set to PermutationTestDistanceBased.
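The p-value methods described above can be sketched in plain Python. This is one reading of the parameter descriptions, not the library's code: the function names are hypothetical, and the “exact” variant assumes the binomial CDF is averaged over the probabilities t / total_num_permutations for t = 1, ..., total_num_permutations, following Phipson and Smyth.

```python
from math import comb


def count_extreme(permutation_statistics, observed_statistic):
    """Number of permutation statistics >= the observed statistic."""
    return sum(s >= observed_statistic for s in permutation_statistics)


def p_value_estimate(permutation_statistics, observed_statistic):
    """Fraction of extreme permutation statistics; this can be exactly zero."""
    b = count_extreme(permutation_statistics, observed_statistic)
    return b / len(permutation_statistics)


def p_value_conservative(permutation_statistics, observed_statistic):
    """(b + 1) / (m + 1); this is never zero."""
    b = count_extreme(permutation_statistics, observed_statistic)
    return (b + 1) / (len(permutation_statistics) + 1)


def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def p_value_exact(permutation_statistics, observed_statistic, total_num_permutations):
    """Mean of the binomial CDF over p_t = t / total_num_permutations (assumption)."""
    b = count_extreme(permutation_statistics, observed_statistic)
    m = len(permutation_statistics)
    probabilities = (
        t / total_num_permutations for t in range(1, total_num_permutations + 1)
    )
    return sum(binom_cdf(b, m, p) for p in probabilities) / total_num_permutations


statistics_ = [0.1, 0.2, 0.3, 0.4]
estimate = p_value_estimate(statistics_, observed_statistic=0.35)
conservative = p_value_conservative(statistics_, observed_statistic=0.35)
exact = p_value_exact(statistics_, observed_statistic=0.35, total_num_permutations=2)
```

When no permutation statistic reaches the observed value, the “estimate” variant returns zero, while the “conservative” variant returns 1 / (m + 1).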

Note:

Callback logs are updated with the following variables:

  • observed_statistic: observed statistic obtained from the distance-based detector. This is the same distance value returned by the compare method

  • permutation_statistic: list of statistics obtained from the permutations

  • p_value: p-value obtained from the permutation test
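A short sketch of how these logged variables might be consumed. The `drift_detected` helper and the 0.05 significance level are assumptions for illustration, and `example_log` is a hand-written stand-in with made-up values, not real detector output; only the key names come from the list above.

```python
def drift_detected(callbacks_log, alpha=0.05, name="PermutationTestDistanceBased"):
    """Return True when the logged p-value falls below the significance level."""
    return callbacks_log[name]["p_value"] < alpha


# Hand-written stand-in for the log returned alongside the distance;
# the numbers are illustrative only.
example_log = {
    "PermutationTestDistanceBased": {
        "observed_statistic": 0.8,
        "permutation_statistic": [0.1, 0.3, 0.2, 0.9],
        "p_value": 0.4,
    }
}

decision = drift_detected(example_log)
```

The log is keyed by the callback's name, so a callback created with a custom name would need that name passed as `name`.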

References:

[phipson2010permutation]

Phipson, Belinda, and Gordon K. Smyth. “Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn.” Statistical applications in genetics and molecular biology 9.1 (2010).

Example:

>>> from frouros.callbacks import PermutationTestDistanceBased
>>> from frouros.detectors.data_drift import MMD
>>> import numpy as np
>>> np.random.seed(seed=31)
>>> X = np.random.multivariate_normal(mean=[1, 1], cov=[[2, 0], [0, 2]], size=100)
>>> Y = np.random.multivariate_normal(mean=[0, 0], cov=[[2, 1], [1, 2]], size=100)
>>> detector = MMD(callbacks=PermutationTestDistanceBased(num_permutations=1000, random_state=31))
>>> _ = detector.fit(X=X)
>>> distance, callbacks_log = detector.compare(X=Y)
>>> distance
DistanceResult(distance=0.05643613752975596)
>>> callbacks_log["PermutationTestDistanceBased"]["p_value"]
0.0009985010823343311

property num_permutations: int

Number of permutations property.

Returns:

number of permutations used to obtain the p-value

Return type:

int

property total_num_permutations: int | None

Total number of permutations property.

Returns:

total number of permutations

Return type:

Optional[int]

property num_jobs: int

Number of jobs property.

Returns:

number of jobs to use

Return type:

int

property method: str

Method to compute the p-value property.

Returns:

method to compute the p-value

Return type:

str

property name: str

Name property.

Returns:

name value

Return type:

str

on_compare_start(X_ref: ndarray, X_test: ndarray) → None

On compare start method.

Parameters:
  • X_ref (numpy.ndarray) – reference data

  • X_test (numpy.ndarray) – test data

on_fit_end(X: ndarray) → None

On fit end method.

Parameters:

X (numpy.ndarray) – reference data

on_fit_start(X: ndarray) → None

On fit start method.

Parameters:

X (numpy.ndarray) – reference data

set_detector(detector) → None

Set detector method.

property verbose: bool

Verbose flag property.

Returns:

verbose flag

Return type:

bool

on_compare_end(result: Any, X_ref: ndarray, X_test: ndarray) → None

On compare end method.

Parameters:
  • result (Any) – result obtained from the compare method

  • X_ref (numpy.ndarray) – reference data

  • X_test (numpy.ndarray) – test data

reset() → None

Reset method.
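To illustrate how the hook methods listed above could fit together, here is a self-contained toy sketch. `RecordingCallback` and `ToyDetector` are hypothetical classes written for this example, and the call order shown is an assumption inferred from the hook names, not a description of the library's internals.

```python
class RecordingCallback:
    """Hypothetical callback that records which hooks were invoked."""

    def __init__(self):
        self.calls = []
        self.logs = {}

    def on_fit_start(self, X):
        self.calls.append("on_fit_start")

    def on_fit_end(self, X):
        self.calls.append("on_fit_end")

    def on_compare_start(self, X_ref, X_test):
        self.calls.append("on_compare_start")

    def on_compare_end(self, result, X_ref, X_test):
        self.calls.append("on_compare_end")
        # Store the result so it can be returned in the callback log.
        self.logs["observed_statistic"] = result

    def reset(self):
        self.calls.clear()
        self.logs.clear()


class ToyDetector:
    """Hypothetical detector; the distance is the gap between sample means."""

    def __init__(self, callback):
        self.callback = callback
        self.X_ref = None

    def fit(self, X):
        self.callback.on_fit_start(X=X)
        self.X_ref = list(X)
        self.callback.on_fit_end(X=X)
        return self

    def compare(self, X):
        self.callback.on_compare_start(X_ref=self.X_ref, X_test=X)
        result = abs(sum(self.X_ref) / len(self.X_ref) - sum(X) / len(X))
        self.callback.on_compare_end(result=result, X_ref=self.X_ref, X_test=X)
        return result, dict(self.callback.logs)


callback = RecordingCallback()
detector = ToyDetector(callback).fit([1.0, 2.0, 3.0])
distance, log = detector.compare([3.0, 4.0, 5.0])
```

In this sketch, on_compare_end is the natural place for a permutation test, since it receives both the observed result and the two samples.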