ElasticNet Compatible Estimators


The core class plqERM_ElasticNet serves as a base implementation for both classification and regression tasks. Its subclasses, plq_ElasticNet_Classifier and plq_ElasticNet_Regressor, extend the Ridge-based variants by introducing an additional l1_ratio parameter that controls the mix between L1 and L2 regularization. These estimators follow the scikit-learn estimator interface, so they can be used directly with utilities such as Pipeline, cross_val_score, and GridSearchCV.

ElasticNet regularization solves the following optimization problem:

\[\min_{\beta \in \mathbb{R}^d} \; C \sum_{i=1}^{n} \text{PLQ}(y_i, \mathbf{x}_i^T \beta) + \ell_1\text{ratio} \|\beta\|_1 + \frac{1}{2}(1 - \ell_1\text{ratio})\|\beta\|_2^2, \quad \text{s.t.} \quad \mathbf{A}\beta + \mathbf{b} \geq \mathbf{0},\]

where

  • \(\text{PLQ}(\cdot)\) is a piecewise linear-quadratic loss function (e.g., SVM hinge, quantile, Huber),

  • \(\mathbf{x}_i \in \mathbb{R}^d\) is a feature vector,

  • \(y_i\) is the response variable (class label or continuous value),

  • \(C > 0\) is the inverse regularization strength: it weights the loss term, so a larger \(C\) means weaker regularization,

  • \(\ell_1\text{ratio} \in [0, 1]\) is the mixing parameter: \(\ell_1\text{ratio} = 1\) gives Lasso, \(\ell_1\text{ratio} = 0\) gives Ridge,

  • \(\mathbf{A}\beta + \mathbf{b} \geq \mathbf{0}\) represents optional linear constraints on \(\beta\).
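
To make the objective concrete, the following minimal sketch evaluates it in plain Python for a fixed \(\beta\), ignoring the optional constraints. It assumes the SVM hinge loss \(\max(0, 1 - y_i \mathbf{x}_i^T \beta)\) as the PLQ term. The helper names elastic_net_penalty and objective are illustrative only and are not part of rehline.

```python
def elastic_net_penalty(beta, l1_ratio):
    """l1_ratio * ||beta||_1 + 0.5 * (1 - l1_ratio) * ||beta||_2^2."""
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_sq


def objective(X, y, beta, C, l1_ratio):
    """C * (sum of hinge losses) + ElasticNet penalty; constraints are ignored."""
    hinge = 0.0
    for x_i, y_i in zip(X, y):
        score = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        hinge += max(0.0, 1.0 - y_i * score)  # assumed PLQ loss: SVM hinge
    return C * hinge + elastic_net_penalty(beta, l1_ratio)


beta = [1.0, -2.0, 0.0]  # ||beta||_1 = 3, ||beta||_2^2 = 5
lasso_penalty = elastic_net_penalty(beta, l1_ratio=1.0)  # pure L1 term
ridge_penalty = elastic_net_penalty(beta, l1_ratio=0.0)  # pure L2 term
mixed_penalty = elastic_net_penalty(beta, l1_ratio=0.5)  # equal mix

# Toy data with labels in {-1, +1}, chosen only for illustration.
X_toy = [[1.0, 0.0], [0.0, 1.0]]
y_toy = [1, -1]
value = objective(X_toy, y_toy, beta=[0.5, 0.5], C=1.0, l1_ratio=0.5)

print(lasso_penalty, ridge_penalty, mixed_penalty, value)
```

Setting l1_ratio to 1 leaves only the L1 term and setting it to 0 leaves only the squared L2 term, which matches the Lasso and Ridge special cases listed above.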

Classification Example with GridSearchCV and Pipeline

Here we show a classification example using Pipeline, cross_val_score, and GridSearchCV. Compared to the Ridge classifier, the key difference is the additional l1_ratio parameter in param_grid.

[2]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
[3]:
# generate the dataset
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=8,
    n_redundant=4,
    n_repeated=0,
    n_classes=2,
    weights=[0.7, 0.3],
    class_sep=1.2,
    flip_y=0.01,
    random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
[4]:
from rehline import plq_ElasticNet_Classifier

# set the pipeline
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", plq_ElasticNet_Classifier(loss={"name": "svm"})),
])
[5]:
# set the parameter grid
param_grid = {
    "clf__loss": [{"name": "svm"}, {"name": "sSVM"}],
    "clf__C": [0.1, 1.0, 3.0],
    "clf__l1_ratio": [0.0, 0.3, 0.5, 0.8],
    "clf__fit_intercept": [True, False],
    "clf__intercept_scaling": [0.5, 1.0, 2.0],
    "clf__max_iter": [5000, 10000],
    "clf__class_weight": [None, "balanced", {0: 1.0, 1: 2.0}],
    "clf__constraint": [
        [],
        [{"name": "nonnegative"}],
        [{"name": "fair", "sen_idx": [0], "tol_sen": 0.1}],
    ],
}
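
An exhaustive grid search fits every combination of the listed values, so the cost grows multiplicatively with each added parameter. As a rough check before launching the search, the number of candidates can be counted with the standard library alone. This is a sketch that restates the grid above and assumes 5-fold cross-validation.

```python
import math

# Same grid as above, restated so that this snippet is self-contained.
param_grid = {
    "clf__loss": [{"name": "svm"}, {"name": "sSVM"}],
    "clf__C": [0.1, 1.0, 3.0],
    "clf__l1_ratio": [0.0, 0.3, 0.5, 0.8],
    "clf__fit_intercept": [True, False],
    "clf__intercept_scaling": [0.5, 1.0, 2.0],
    "clf__max_iter": [5000, 10000],
    "clf__class_weight": [None, "balanced", {0: 1.0, 1: 2.0}],
    "clf__constraint": [
        [],
        [{"name": "nonnegative"}],
        [{"name": "fair", "sen_idx": [0], "tol_sen": 0.1}],
    ],
}

# One candidate per element of the Cartesian product of all value lists.
n_candidates = math.prod(len(values) for values in param_grid.values())
n_folds = 5  # assumed to match cv=5 in the search
n_fits = n_candidates * n_folds

print(n_candidates, n_fits)  # 2592 12960
```

If that many fits is too expensive, dropping parameters that rarely matter for accuracy (for example max_iter, once convergence is reached) shrinks the grid quickly.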
[6]:
# cross_val_score
cv_scores = cross_val_score(
    pipe,
    X_train, y_train,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)
print("CV scores:", cv_scores)
CV scores: [0.79666667 0.82       0.82666667 0.81       0.81      ]
[7]:
# GridSearchCV
grid = GridSearchCV(
    estimator=pipe,
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
    n_jobs=-1,
    refit=True,
    verbose=1,
)

grid.fit(X_train, y_train)
Fitting 5 folds for each of 2592 candidates, totalling 12960 fits
[7]:
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('scaler', StandardScaler()),
                                       ('clf',
                                        plq_ElasticNet_Classifier(loss={'name': 'svm'}))]),
             n_jobs=-1,
             param_grid={'clf__C': [0.1, 1.0, 3.0],
                         'clf__class_weight': [None, 'balanced',
                                               {0: 1.0, 1: 2.0}],
                         'clf__constraint': [[], [{'name': 'nonnegative'}],
                                             [{'name': 'fair', 'sen_idx': [0],
                                               'tol_sen': 0.1}]],
                         'clf__fit_intercept': [True, False],
                         'clf__intercept_scaling': [0.5, 1.0, 2.0],
                         'clf__l1_ratio': [0.0, 0.3, 0.5, 0.8],
                         'clf__loss': [{'name': 'svm'}, {'name': 'sSVM'}],
                         'clf__max_iter': [5000, 10000]},
             scoring='accuracy', verbose=1)
[8]:
print("Best params:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)
Best params: {'clf__C': 0.1, 'clf__class_weight': None, 'clf__constraint': [{'name': 'fair', 'sen_idx': [0], 'tol_sen': 0.1}], 'clf__fit_intercept': True, 'clf__intercept_scaling': 1.0, 'clf__l1_ratio': 0.0, 'clf__loss': {'name': 'sSVM'}, 'clf__max_iter': 5000}
Best CV accuracy: 0.8133333333333332
[9]:
best_model = grid.best_estimator_
y_pred = best_model.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)

print("Test accuracy:", test_acc)
print("\nClassification report:\n", classification_report(y_test, y_pred, digits=4))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
Test accuracy: 0.808

Classification report:
               precision    recall  f1-score   support

           0     0.8155    0.9370    0.8720       349
           1     0.7778    0.5099    0.6160       151

    accuracy                         0.8080       500
   macro avg     0.7966    0.7234    0.7440       500
weighted avg     0.8041    0.8080    0.7947       500

Confusion matrix:
 [[327  22]
 [ 74  77]]
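
The per-class numbers in the report can be recovered from the confusion matrix, which helps when reading results on an imbalanced dataset like this one. The sketch below assumes the scikit-learn layout (rows are true labels, columns are predicted labels) and reuses the counts printed above.

```python
# Confusion matrix copied from the output above: rows = true, columns = predicted.
cm = [[327, 22],
      [74, 77]]

tn, fp = cm[0]  # true class 0: predicted 0 / predicted 1
fn, tp = cm[1]  # true class 1: predicted 0 / predicted 1
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision_1 = tp / (tp + fp)  # of the samples predicted as class 1, share that are correct
recall_1 = tp / (tp + fn)     # of the true class 1 samples, share that are found

print(f"accuracy={accuracy:.4f} precision_1={precision_1:.4f} recall_1={recall_1:.4f}")
```

The recall of about 0.51 for the minority class shows that overall accuracy is driven mainly by class 0. If class 1 matters more, a different scoring argument (for example "f1" or "balanced_accuracy") may be a better selection criterion for the grid search.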

Regression Example

Here we show a regression example using Pipeline, cross_val_score, and GridSearchCV. As in the classification case, l1_ratio controls the balance between the Lasso (L1) and Ridge (L2) penalties.

[10]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
[11]:
# generate the data
X, y = make_regression(
    n_samples=1500,
    n_features=15,
    n_informative=10,
    noise=10.0,
    random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
[12]:
from rehline import plq_ElasticNet_Regressor

# set the pipeline
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("reg", plq_ElasticNet_Regressor(loss={"name": "QR", "qt": 0.5})),
])
[13]:
# set the param_grid
param_grid = {
    "reg__loss": [
        {"name": "QR", "qt": 0.5},
        {"name": "huber", "tau": 1.0},
        {"name": "SVR", "epsilon": 0.1},
    ],
    "reg__C": [0.1, 1.0, 10.0],
    "reg__l1_ratio": [0.0, 0.3, 0.5, 0.8],
    "reg__fit_intercept": [True, False],
    "reg__intercept_scaling": [0.5, 1.0],
    "reg__max_iter": [5000, 8000],
    "reg__constraint": [
        [],
        [{"name": "nonnegative"}],
        [{"name": "fair", "sen_idx": [0], "tol_sen": 0.1}],
    ],
}
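
For orientation, the three losses in the grid can be written as functions of the residual r = y - prediction. The sketch below uses the common textbook forms of the quantile (check), Huber, and epsilon-insensitive losses. The exact scaling and parameterization inside rehline may differ, and the function names here are illustrative only.

```python
import math


def quantile_loss(residual, qt=0.5):
    """Check loss: qt * r when r >= 0, (qt - 1) * r when r < 0."""
    return residual * (qt - (1.0 if residual < 0 else 0.0))


def huber_loss(residual, tau=1.0):
    """Quadratic for |r| <= tau, linear beyond that threshold."""
    a = abs(residual)
    if a <= tau:
        return 0.5 * residual * residual
    return tau * (a - 0.5 * tau)


def svr_loss(residual, epsilon=0.1):
    """Epsilon-insensitive loss: residuals inside the tube cost nothing."""
    return max(0.0, abs(residual) - epsilon)


for r in (-3.0, -0.05, 0.5, 3.0):
    print(r, quantile_loss(r), huber_loss(r), svr_loss(r))
```

With qt=0.5 the quantile loss is half the absolute error, so it targets the conditional median. Huber behaves like squared error for small residuals and like absolute error for large ones, which makes it less sensitive to outliers.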
[14]:
# cross_val_score
cv_scores = cross_val_score(
    pipe,
    X_train, y_train,
    cv=5,
    scoring="r2",
    n_jobs=-1,
)
print("CV R^2 scores:", cv_scores)
print("Mean CV R^2:", np.mean(cv_scores))
CV R^2 scores: [0.99668483 0.99654706 0.99704323 0.99627612 0.99609029]
Mean CV R^2: 0.9965283057432174
[15]:
# GridSearchCV
grid = GridSearchCV(
    estimator=pipe,
    param_grid=param_grid,
    scoring="r2",
    cv=5,
    n_jobs=-1,
    refit=True,
    verbose=1,
)

grid.fit(X_train, y_train)
Fitting 5 folds for each of 864 candidates, totalling 4320 fits
[15]:
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('scaler', StandardScaler()),
                                       ('reg', plq_ElasticNet_Regressor())]),
             n_jobs=-1,
             param_grid={'reg__C': [0.1, 1.0, 10.0],
                         'reg__constraint': [[], [{'name': 'nonnegative'}],
                                             [{'name': 'fair', 'sen_idx': [0],
                                               'tol_sen': 0.1}]],
                         'reg__fit_intercept': [True, False],
                         'reg__intercept_scaling': [0.5, 1.0],
                         'reg__l1_ratio': [0.0, 0.3, 0.5, 0.8],
                         'reg__loss': [{'name': 'QR', 'qt': 0.5},
                                       {'name': 'huber', 'tau': 1.0},
                                       {'epsilon': 0.1, 'name': 'SVR'}],
                         'reg__max_iter': [5000, 8000]},
             scoring='r2', verbose=1)
[16]:
print("Best params:", grid.best_params_)
print("Best CV R^2:", grid.best_score_)
Best params: {'reg__C': 0.1, 'reg__constraint': [{'name': 'nonnegative'}], 'reg__fit_intercept': True, 'reg__intercept_scaling': 1.0, 'reg__l1_ratio': 0.0, 'reg__loss': {'name': 'huber', 'tau': 1.0}, 'reg__max_iter': 5000}
Best CV R^2: 0.9967196763855011
[17]:
best_model = grid.best_estimator_
y_pred = best_model.predict(X_test)

print("Test R^2:", r2_score(y_test, y_pred))
print("Test MSE:", mean_squared_error(y_test, y_pred))
Test R^2: 0.9967743380626125
Test MSE: 104.74629973212267
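
The test MSE is easier to interpret on the scale of the target. As a small sketch, taking the square root gives the RMSE, which can be compared with the noise=10.0 passed to make_regression (the standard deviation of the Gaussian noise added to the output). The MSE value is copied from the output above.

```python
import math

test_mse = 104.74629973212267  # copied from the printed test MSE
noise_std = 10.0               # the noise argument used in make_regression

rmse = math.sqrt(test_mse)
ratio = rmse / noise_std  # values near 1 mean the error is close to the noise level

print(f"RMSE: {rmse:.2f}")  # RMSE: 10.23
print(f"RMSE / noise std: {ratio:.3f}")
```

An RMSE close to the noise standard deviation suggests the remaining error is mostly irreducible noise, which is consistent with the high R^2 reported above.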