import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from frouros.datasets.real import Elec2
from frouros.detectors.concept_drift import DDM, DDMConfig
Elec2 dataset
Example of how to use the concept drift detector DDM [1]. To demonstrate a simple use case, we use some features of the normalized version of Elec2 [2]. Unlike with synthetic datasets, in real datasets it is not possible to know for sure whether and when drift occurs.
# Get Elec2 dataset and preprocess it
elec2 = Elec2()
data = elec2.load()
X = np.array(data[["nswprice", "vicprice", "transfer"]].tolist())
y = np.array(data[["class"]].tolist()).astype("str")
# First 20000 samples are used as reference to fit the model
split_idx = 20000
X_ref, y_ref, X_test, y_test = (
    X[:split_idx],
    y[:split_idx].ravel(),
    X[split_idx:],
    y[split_idx:].ravel(),
)
The following cell defines a scikit-learn pipeline that will be used as the model that feeds values to the detector. The pipeline is fitted on the reference samples only.
pipeline = Pipeline([("scaler", StandardScaler()), ("model", LogisticRegression())])
pipeline.fit(X=X_ref, y=y_ref)
Pipeline(steps=[('scaler', StandardScaler()), ('model', LogisticRegression())])
# Detector configuration class
config = DDMConfig(
detector = DDM(config=config)
A stream of samples is simulated using the test dataset until drift is detected. In each iteration, the model makes a prediction that is compared with the ground truth, yielding an error value (0 for a correct prediction, 1 for a wrong one). This error value is used to update the detector. To check whether drift is occurring, the detector's status attribute can be accessed.
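for i, (X, y) in enumerate(zip(X_test, y_test)):
    # Predict a single sample and turn the outcome into a 0/1 error
    y_pred = pipeline.predict(X.reshape(1, -1))
    error = 1 - (y_pred.item() == y.item())
    # Feed the error to the detector, then check its status
    detector.update(value=error)
    status = detector.status
    if status["drift"]:
        print(f"Drift detected at index {i}")
        break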
Drift detected at index 2601
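To give some intuition for what the detector does with the stream of 0/1 errors, here is a minimal pure-Python sketch of a DDM-style rule: it tracks the running error rate p and its standard deviation s = sqrt(p(1 - p) / n), remembers the point where p + s was lowest, and flags drift once p + s exceeds p_min + drift_level * s_min. This is an illustration only, not the frouros implementation. The class `SimpleDDM`, the helper `first_drift_index`, the parameter names and defaults, and the synthetic error stream are all assumptions made for this example.

```python
import math


class SimpleDDM:
    """Toy DDM-style detector (illustrative, not the frouros class)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_num_instances=30):
        # Hypothetical parameter names and defaults chosen for this sketch
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_num_instances = min_num_instances
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf
        self.status = {"warning": False, "drift": False}

    def update(self, value):
        """Consume one error value (0 = correct, 1 = wrong)."""
        self.n += 1
        self.errors += value
        if self.n < self.min_num_instances:
            return  # too few samples for a stable estimate
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        # Remember the best (lowest) p + s seen so far
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        level = p + s
        self.status = {
            "warning": level > self.p_min + self.warning_level * self.s_min,
            "drift": level > self.p_min + self.drift_level * self.s_min,
        }


def first_drift_index(stream):
    """Return the index of the first sample that triggers drift, or None."""
    detector = SimpleDDM()
    for i, error in enumerate(stream):
        detector.update(error)
        if detector.status["drift"]:
            return i
    return None


# Synthetic stream: 10% errors for 500 samples, then 50% errors
stream = [1 if (i + 1) % 10 == 0 else 0 for i in range(500)]
stream += [1 if i % 2 == 1 else 0 for i in range(500)]

print(f"Drift detected at index {first_drift_index(stream)}")
```

In this sketch the error rate jumps after sample 500, so drift is flagged only some samples later, once enough errors have accumulated to push p + s past the threshold. A lower `drift_level` would react sooner at the cost of more false alarms.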
