Elec2 dataset

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from frouros.datasets.real import Elec2
from frouros.detectors.concept_drift import DDM, DDMConfig

This example shows how to use the concept drift detector DDM [1]. To demonstrate a simple use case, we use some features of the normalized version of Elec2 [2]. Unlike with synthetic datasets, in real datasets it is not possible to know for sure if and when drift occurs.

# Get Elec2 dataset and preprocess it
elec2 = Elec2()
elec2.download()
data = elec2.load()
X = np.array(data[["nswprice", "vicprice", "transfer"]].tolist())
y = np.array(data[["class"]].tolist()).astype("str")
# First 20000 samples are used as reference to fit the model
split_idx = 20000
X_ref, y_ref, X_test, y_test = (
    X[:split_idx],
    y[:split_idx].ravel(),
    X[split_idx:],
    y[split_idx:],
)
INFO:frouros:Trying to download data from https://nextcloud.ifca.es/index.php/s/2coqgBEpa82boLS/download to /tmp/tmpgbcplvay

The following cell defines a scikit-learn pipeline that will be used as the model that feeds values to the detector.

pipeline = Pipeline([("scaler", StandardScaler()), ("model", LogisticRegression())])
pipeline.fit(X=X_ref, y=y_ref)
Pipeline(steps=[('scaler', StandardScaler()), ('model', LogisticRegression())])
# Detector configuration class
config = DDMConfig(
    warning_level=2.0,
    drift_level=3.0,
    min_num_instances=2000,
)
detector = DDM(config=config)
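
For intuition about what `warning_level`, `drift_level`, and `min_num_instances` control, the rule described in [1] can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the frouros implementation: the class `SimpleDDM`, the name `sketch_detector`, and the synthetic error stream are hypothetical, and the strict inequality is a simplification chosen so that a stream with no errors at all does not trigger an alarm. The detector tracks the running error rate `p` and its standard deviation `s = sqrt(p * (1 - p) / n)`, remembers the point where `p + s` was lowest, and raises a warning or a drift flag once `p + s` exceeds that minimum by the configured number of standard deviations.

```python
import math


class SimpleDDM:
    """Minimal DDM-style detector (illustrative only, not frouros code)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_num_instances=30):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_num_instances = min_num_instances
        self.n = 0              # number of predictions seen so far
        self.errors = 0         # number of misclassifications seen so far
        self.p_min = math.inf   # error rate at the lowest p + s seen so far
        self.s_min = math.inf   # standard deviation at that same point
        self.warning = False
        self.drift = False

    def update(self, error):
        """Consume one 0/1 error value and refresh the warning/drift flags."""
        self.n += 1
        self.errors += error
        if self.n < self.min_num_instances:
            return  # too few samples for a stable estimate
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        self.warning = p + s > self.p_min + self.warning_level * self.s_min
        self.drift = p + s > self.p_min + self.drift_level * self.s_min


# Synthetic stream (assumption): 1 error in 5 for the first 1000 samples,
# then 1 error in 2, so the true change point is known to be index 1000.
stream = [1 if i % 5 == 0 else 0 for i in range(1000)]
stream += [1 if i % 2 == 0 else 0 for i in range(1000)]

sketch_detector = SimpleDDM(warning_level=2.0, drift_level=3.0, min_num_instances=30)
warning_index = None
drift_index = None
for i, error in enumerate(stream):
    sketch_detector.update(error)
    if sketch_detector.warning and warning_index is None:
        warning_index = i
    if sketch_detector.drift:
        drift_index = i
        break

print(f"Warning first raised at index {warning_index}")
print(f"Drift detected at index {drift_index}")
```

Because the change point of the synthetic stream is known, the delay between the change and the detection can be read off directly, which is not possible with Elec2.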

A simulation of stream samples is performed using the test dataset until drift is detected. In each iteration, the model makes a prediction that is compared with the ground truth, yielding an error value (0 if the prediction is correct, 1 otherwise). This error value is used to update the detector. To check whether drift is occurring, the detector's status attribute can be accessed.

for i, (X_sample, y_sample) in enumerate(zip(X_test, y_test)):
    y_pred = pipeline.predict(X_sample.reshape(1, -1))
    # Error is 0 for a correct prediction and 1 for a misclassification
    error = 1 - (y_pred.item() == y_sample.item())
    detector.update(value=error)
    status = detector.status
    if status["drift"]:
        print(f"Drift detected at index {i}")
        break
Drift detected at index 2601
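
Since the true drift point in a real dataset is unknown, one possible sanity check is to compare the model's error rate at the start of the stream with its error rate just before the reported detection. The sketch below shows the idea on hypothetical data: the labels, the predictions, `detection_index`, `window_size`, and the helper functions `prediction_error` and `error_rate` are all assumptions made for illustration and are not part of frouros or of the Elec2 run above.

```python
def prediction_error(y_pred, y_true):
    """Return 0 for a correct prediction and 1 for a wrong one."""
    return 1 - int(y_pred == y_true)


def error_rate(errors, start, stop):
    """Mean of the 0/1 errors in errors[start:stop], clipped at index 0."""
    window = errors[max(start, 0):max(stop, 0)]
    if not window:
        raise ValueError("empty window")
    return sum(window) / len(window)


# Hypothetical labels and predictions: the model is wrong 1 time in 5 for
# the first 500 samples, then wrong every other time.
y_true = ["UP"] * 1000
y_pred = ["UP" if i % 5 else "DOWN" for i in range(500)]
y_pred += ["UP" if i % 2 else "DOWN" for i in range(500)]

errors = [prediction_error(p, t) for p, t in zip(y_pred, y_true)]

detection_index = 600  # hypothetical index reported by a detector
window_size = 100

rate_start = error_rate(errors, 0, window_size)
rate_recent = error_rate(errors, detection_index - window_size, detection_index)

print(f"Error rate at stream start: {rate_start:.2f}")
print(f"Error rate before detection: {rate_recent:.2f}")
```

A clearly higher recent error rate is consistent with a genuine change in the data, although it does not prove that drift occurred.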
[1] Joao Gama, Pedro Medas, Gladys Castillo, and Pedro Rodrigues. Learning with drift detection. In Brazilian symposium on artificial intelligence, 286–295. Springer, 2004.

[2] Michael Harries. Splice-2 comparative evaluation: electricity pricing. Technical Report, The University of New South Wales, 1999.