Concepts

Some concepts related to the drift detection field must be explained in order to use Frouros correctly and to its full potential.

What is drift detection?

Drift detection can be defined as the process of trying to detect a significant change in the concept previously learned by a model (concept drift), or a change in the feature/covariate distributions (data drift), either of which can end up producing a decay in the model’s performance.

Traditionally there has been little consensus on the terminology and definitions of the different types of drift, as stated in [3]. In order to adopt clear definitions, we apply those used in [2] for the concept drift part, combined with those used in [4] for detecting dataset shift using only the feature/covariate distributions.

Therefore, the problem statement can be defined as follows:

Given a time period \({[0, t]}\) and a set of sample pairs \({D=\{(X_{0}, y_{0}),...,(X_{t}, y_{t})\}}\), where \({X_{i} \in \mathbb{R}^{m}}\) is the \({m}\)-dimensional feature vector and \({y_{i} \in \mathbb{R}^{k}}\) is the \({k}\)-class vector (using one-hot encoding) in a classification problem, or \({y_{i} \in \mathbb{R}}\) is a scalar in a regression problem, \({D}\) is used to fit a model \({\hat{f} \colon X \to Y}\) that is as close as possible to the unknown \({{f} \colon X \to Y}\). Machine learning algorithms are typically used for this fitting procedure. Samples \({(X_{i}, y_{i}) \notin D}\) obtained in \({[t+1, \infty)}\) and consumed by \({\hat{f}}\) may start to differ, from a statistical point of view, from the pairs in \({D}\). It is also possible that the concept of the problem changes, that is, a change in \({f}\).
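
This setting can be simulated in a few lines. The following sketch (NumPy-based; the nearest-centroid model and all numbers are illustrative stand-ins for \({\hat{f}}\) and \({D}\), not anything from Frouros) fits a model on pairs from \({[0, t]}\) and then evaluates it on samples from \({[t+1, \infty)}\) where the concept \({f}\) has changed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pairs from [0, t]: the concept f labels a sample by the sign of its first feature.
X_ref = rng.normal(size=(1000, 2))
y_ref = (X_ref[:, 0] > 0).astype(int)

# Fit a toy nearest-centroid model f_hat on D.
centroids = np.stack([X_ref[y_ref == c].mean(axis=0) for c in (0, 1)])

def f_hat(X):
    # Predict the class of the nearest centroid.
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

# Samples from [t+1, inf): same P(X), but the concept f has flipped.
X_new = rng.normal(size=(1000, 2))
y_new = (X_new[:, 0] < 0).astype(int)

acc_ref = (f_hat(X_ref) == y_ref).mean()
acc_new = (f_hat(X_new) == y_new).mean()
```

Accuracy on the reference window stays high, while accuracy after the concept change collapses even though \({P(X)}\) is unchanged.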

Since \({P(y, X) = P(y|X) P(X)}\) [3], a change in the joint distribution between two different time periods, which can produce performance degradation, can be described as follows:

\[ P_{[0, t]}(X, y) \neq P_{[t+1, \infty)}(X, y) \]

The different types of change that are considered a form of drift can be grouped into the following categories:

  • Concept drift: There is a change in the conditional probability \(P(y|X)\) with or without a change in \({P(X)}\). Thus, it can be defined as \({P_{[0, t]}(y|X) \neq P_{[t+1, \infty)}(y|X)}\). Concept drift methods aim to detect this type of drift. Also known as real concept drift [2].

  • Data drift: There is a change in \({P(X)}\). This type of drift therefore focuses only on the distribution of the covariates \({P(X)}\), so \({P_{[0, t]}(X) \neq P_{[t+1, \infty)}(X)}\). Data drift methods are designed to try to detect this type of drift. Unlike concept drift, the presence of data drift does not guarantee that the model’s performance is being affected, although it is highly probable. We have renamed dataset shift [4] to data drift in order to maintain consistency with the concept drift definition. Data drift methods can also be used to detect label drift, also known as prior probability shift [5], where it is the label distribution \({P(Y)}\) that changes over time, in such a way that \({P_{[0, t]}(Y) \neq P_{[t+1, \infty)}(Y)}\).
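
For the data drift case, a change in a covariate distribution \({P(X)}\) can be checked with a two-sample statistical test between a reference window and a new window. The following is a minimal from-scratch sketch using NumPy and the Kolmogorov-Smirnov statistic (the seed, sample sizes and 0.05-level critical-value constant are illustrative; this is not the frouros API):

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two ECDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    ecdf_a = np.searchsorted(a, grid, side="right") / len(a)
    ecdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.abs(ecdf_a - ecdf_b).max()

rng = np.random.default_rng(42)
x_ref = rng.normal(loc=0.0, size=2000)    # P_[0, t](X)
x_same = rng.normal(loc=0.0, size=2000)   # [t+1, inf): no drift
x_drift = rng.normal(loc=1.0, size=2000)  # [t+1, inf): mean shift in P(X)

# Approximate KS critical value at significance 0.05 for two samples of size n.
n = 2000
critical = 1.358 * (2 / n) ** 0.5

stat_same = ks_statistic(x_ref, x_same)    # stays below the critical value
stat_drift = ks_statistic(x_ref, x_drift)  # exceeds it, so drift is flagged
```

Note that this only tells us that \({P(X)}\) changed; as stated above, it does not by itself prove that the model’s performance has degraded.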

Verification latency or delay

According to [1], verification latency is defined as the period between a model’s prediction and the availability of the ground-truth label (in a classification problem) or the target value (in a regression problem). In real-world cases, verification latency is highly dependent on the application domain, and in some problems it is not possible to ever obtain the ground-truth/target value, which makes it impossible to detect concept drift using concept drift methods; therefore, other techniques have to be used, such as data drift methods that only focus on covariate distributions.

Drift detection methods

Drift detection methods can be classified according to the type of drift they can detect and how they detect it.

Concept drift

Their main objective is to detect concept drift. They are closely related to data stream mining and to online and incremental learning.

At the time of writing, Frouros only implements concept drift detectors that work in a streaming manner, meaning that the detector can only be updated with one sample at a time.
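
As an illustration of the streaming setting, here is a simplified from-scratch sketch of a DDM-style detector (the actual detectors in Frouros differ; the class name, thresholds and synthetic error stream are illustrative). It consumes one 0/1 prediction error per update and signals drift when the running error rate rises well above its historical minimum:

```python
import math

class DDMSketch:
    """Simplified DDM-style streaming detector (illustrative, not the frouros API)."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        # Incremental mean of the 0/1 error stream and its standard deviation.
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_samples:
            return False
        # Track the best (lowest) error level seen so far.
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        drift = self.p + s > self.p_min + 3 * self.s_min
        if drift:
            self.reset()
        return drift

detector = DDMSketch()
# Synthetic error stream: ~10% errors for 500 samples, then ~50% errors.
stream = [1 if i % 10 == 0 else 0 for i in range(500)]
stream += [1 if i % 2 == 0 else 0 for i in range(500)]
drift_at = next((i for i, e in enumerate(stream) if detector.update(e)), None)
```

With this stream, the detector flags drift shortly after the error rate jumps at sample 500. Real DDM also has an intermediate warning level, omitted here for brevity.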

Data drift

On the other hand, there are problems where it is very costly or even impossible to obtain labels in a reasonable amount of time (see verification latency). In these cases, it is not possible to directly check whether concept drift is occurring, so detecting data drift becomes the main objective of this type of method.

At the time of writing, Frouros implements data drift detectors that are capable of working in batch or streaming mode. In addition, data drift detectors can be classified as univariate or multivariate, depending on whether they act on a single feature/covariate distribution or on the joint distribution of several.
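
As a sketch of the univariate batch approach (NumPy-based and not the frouros API; the asymptotic KS p-value approximation, seed and sample sizes are illustrative), each feature can be tested separately and the per-feature decisions aggregated with a Bonferroni correction:

```python
import math
import numpy as np

def ks_2samp_pvalue(a, b):
    """Approximate two-sample KS p-value (asymptotic Kolmogorov distribution)."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    d = np.abs(
        np.searchsorted(a, grid, side="right") / len(a)
        - np.searchsorted(b, grid, side="right") / len(b)
    ).max()
    n_eff = len(a) * len(b) / (len(a) + len(b))
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, p))

rng = np.random.default_rng(7)
X_ref = rng.normal(size=(1500, 3))   # covariates in [0, t]
X_new = rng.normal(size=(1500, 3))   # covariates in [t+1, inf)
X_new[:, 2] += 0.5                   # mean shift in the third feature only

alpha = 0.05
n_features = X_ref.shape[1]
p_values = [ks_2samp_pvalue(X_ref[:, j], X_new[:, j]) for j in range(n_features)]
# Bonferroni correction: flag drift if any per-feature p-value is significant.
drifted = [p < alpha / n_features for p in p_values]
```

Only the shifted third feature should be flagged. A truly multivariate detector would instead test the joint distribution directly, at the cost of more complex test statistics.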

[1]

Denis Moreira dos Reis, Peter Flach, Stan Matwin, and Gustavo Batista. Fast unsupervised online drift detection using incremental Kolmogorov-Smirnov test. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1545–1554. 2016.

[2]

João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. A survey on concept drift adaptation. ACM computing surveys (CSUR), 46(4):1–37, 2014.

[3]

Jose G Moreno-Torres, Troy Raeder, Rocío Alaiz-Rodríguez, Nitesh V Chawla, and Francisco Herrera. A unifying view on dataset shift in classification. Pattern recognition, 45(1):521–530, 2012.

[4]

Stephan Rabanser, Stephan Günnemann, and Zachary Lipton. Failing loudly: an empirical study of methods for detecting dataset shift. Advances in Neural Information Processing Systems, 2019.

[5]

Amos J Storkey. When training and test sets are different: characterizing learning transfer. Dataset shift in machine learning, 30(3-28):6, 2009.