PermutationTestDistanceBased
- class frouros.callbacks.batch.PermutationTestDistanceBased(num_permutations: int, total_num_permutations: int | None = None, num_jobs: int = -1, method: str = 'auto', random_state: int | None = None, verbose: bool = False, name: str | None = None)
Permutation test callback class that can be applied to data_drift.batch.distance_based detectors.
- Parameters:
num_permutations (int) – number of permutations to obtain the p-value
total_num_permutations (Optional[int]) – total number of permutations to obtain the p-value, defaults to None. If None, it is set to the maximum number of permutations, that is, the minimum between the number of all possible permutations and the global maximum number of permutations
num_jobs (int) – number of jobs, defaults to -1
method (str) – method to compute the p-value, defaults to “auto”. Available options:
“auto”: if the number of permutations is greater than the maximum number of permutations, the method is set to “approximate”; otherwise, it is set to “exact”.
“conservative”: the p-value is computed as (number of permutation statistics greater than or equal to the observed statistic + 1) / (number of permutations + 1).
“exact”: the p-value is computed as the mean of the binomial cumulative distribution function, as stated in [phipson2010permutation].
“approximate”: the p-value is computed using the integral of the binomial cumulative distribution function, as stated in [phipson2010permutation].
“estimate”: the p-value is computed as the mean of the extreme statistic. With this method the p-value can be zero.
random_state (Optional[int]) – random state, defaults to None
verbose (bool) – verbose flag, defaults to False
name (Optional[str]) – name value, defaults to None. If None, the name will be set to PermutationTestDistanceBased.
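The “conservative” and “estimate” options of the method parameter can be illustrated with a minimal plain-Python sketch. The function names below are hypothetical and are not part of frouros; the “estimate” variant assumes that "mean of the extreme statistic" means the fraction of permutation statistics at least as large as the observed one.

```python
def conservative_p_value(observed, permuted):
    """(b + 1) / (m + 1), where b counts permuted statistics >= observed."""
    extreme = sum(1 for s in permuted if s >= observed)
    return (extreme + 1) / (len(permuted) + 1)


def estimate_p_value(observed, permuted):
    """Fraction of permuted statistics >= observed (assumed reading of
    the "estimate" option); it is zero when no permutation is as extreme."""
    extreme = sum(1 for s in permuted if s >= observed)
    return extreme / len(permuted)


# Hypothetical statistics: none of the permutations reaches the observed value.
permuted = [0.01, 0.02, 0.03, 0.04]
observed = 0.05

print(conservative_p_value(observed, permuted))  # 0.2
print(estimate_p_value(observed, permuted))      # 0.0
```

The added one in the numerator and denominator is what keeps the conservative p-value strictly positive, which is the point made in [phipson2010permutation].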
- Note:
Callback logs are updated with the following variables:
observed_statistic: observed statistic obtained from the distance-based detector. It is the same distance value returned by the compare method
permutation_statistic: list of statistics obtained from the permutations
p_value: p-value obtained from the permutation test
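How these three logged quantities relate to each other can be sketched with a generic permutation test. This is an illustration under stated assumptions, not the frouros implementation: it uses one-dimensional data, the absolute difference of means as a stand-in for the detector's distance, and hypothetical helper names.

```python
import random


def mean_difference(x, y):
    """Stand-in distance: absolute difference between sample means."""
    return abs(sum(x) / len(x) - sum(y) / len(y))


def permutation_statistics(x_ref, x_test, num_permutations, seed=None):
    """Recompute the statistic on random re-splits of the pooled samples."""
    rng = random.Random(seed)
    pooled = list(x_ref) + list(x_test)
    n_ref = len(x_ref)
    statistics = []
    for _ in range(num_permutations):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        statistics.append(mean_difference(shuffled[:n_ref], shuffled[n_ref:]))
    return statistics


x_ref = [0.1, 0.3, 0.2, 0.4, 0.0]
x_test = [1.1, 1.3, 0.9, 1.2, 1.0]

# Analogues of the logged variables described above.
observed_statistic = mean_difference(x_ref, x_test)
permutation_statistic = permutation_statistics(x_ref, x_test, 200, seed=31)
extreme = sum(1 for s in permutation_statistic if s >= observed_statistic)
p_value = (extreme + 1) / (len(permutation_statistic) + 1)  # conservative form
```

A small p_value here means that few random re-splits produce a statistic as large as the observed one, which is evidence that the reference and test samples differ.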
- References:
[phipson2010permutation] Phipson, Belinda, and Gordon K. Smyth. “Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn.” Statistical Applications in Genetics and Molecular Biology 9.1 (2010).
- Example:
>>> from frouros.callbacks import PermutationTestDistanceBased
>>> from frouros.detectors.data_drift import MMD
>>> import numpy as np
>>> np.random.seed(seed=31)
>>> X = np.random.multivariate_normal(mean=[1, 1], cov=[[2, 0], [0, 2]], size=100)
>>> Y = np.random.multivariate_normal(mean=[0, 0], cov=[[2, 1], [1, 2]], size=100)
>>> detector = MMD(callbacks=PermutationTestDistanceBased(num_permutations=1000, random_state=31))
>>> _ = detector.fit(X=X)
>>> distance, callbacks_log = detector.compare(X=Y)
>>> distance
DistanceResult(distance=0.05643613752975596)
>>> callbacks_log["PermutationTestDistanceBased"]["p_value"]
0.0009985010823343311
- property num_permutations: int
Number of permutations property.
- Returns:
number of permutations to obtain the p-value
- Return type:
int
- property total_num_permutations: int | None
Total number of permutations property.
- Returns:
total number of permutations
- Return type:
Optional[int]
- property num_jobs: int
Number of jobs property.
- Returns:
number of jobs to use
- Return type:
int
- property method: str
Method to compute the p-value property.
- Returns:
method to compute the p-value
- Return type:
str
- property name: str
Name property.
- Returns:
name value
- Return type:
str
- on_compare_start(X_ref: ndarray, X_test: ndarray) → None
On compare start method.
- Parameters:
X_ref (numpy.ndarray) – reference data
X_test (numpy.ndarray) – test data
- on_fit_end(X: ndarray) → None
On fit end method.
- Parameters:
X (numpy.ndarray) – reference data
- on_fit_start(X: ndarray) → None
On fit start method.
- Parameters:
X (numpy.ndarray) – reference data
- set_detector(detector) → None
Set detector method.
- property verbose: bool
Verbose flag property.
- Returns:
verbose flag
- Return type:
bool
- on_compare_end(result: Any, X_ref: ndarray, X_test: ndarray) → None
On compare end method.
- Parameters:
result (Any) – result obtained from the compare method
X_ref (numpy.ndarray) – reference data
X_test (numpy.ndarray) – test data
- reset() → None
Reset method.