Squared SVM
Squared SVMs solve the following optimization problem:
\[\min_{\mathbf{\beta} \in \mathbb{R}^d} \ C \sum_{i=1}^n \left( 1 - y_i \mathbf{\beta}^\intercal \mathbf{x}_i \right)_+^2 + \frac{1}{2} \| \mathbf{\beta} \|_2^2\]
where \(\mathbf{x}_i \in \mathbb{R}^d\) is a feature vector, and \(y_i \in \{-1, 1\}\) is a binary label.
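To make the objective concrete, the following is a minimal NumPy sketch that evaluates it for a given coefficient vector. The function name `squared_svm_objective` and the demo arrays are illustrative assumptions, not part of rehline; the solver itself is not reproduced here.

```python
import numpy as np


def squared_svm_objective(beta, X, y, C=1.0):
    """Evaluate C * sum_i (1 - y_i * beta^T x_i)_+^2 + 0.5 * ||beta||_2^2."""
    margins = y * (X @ beta)                # y_i * beta^T x_i for every sample
    hinge = np.maximum(0.0, 1.0 - margins)  # positive part (.)_+
    return C * np.sum(hinge ** 2) + 0.5 * np.dot(beta, beta)


# Tiny hand-checkable example (illustrative values, not from the notebook)
X_demo = np.array([[1.0, 0.0], [0.0, 2.0]])
y_demo = np.array([1.0, -1.0])
beta_demo = np.array([0.5, 0.5])

# margins = [0.5, -1.0], hinge = [0.5, 2.0], loss = 4.25, ridge = 0.25
obj_demo = squared_svm_objective(beta_demo, X_demo, y_demo, C=1.0)
print(obj_demo)  # 4.5
```

A function like this can be handy for sanity-checking that a fitted coefficient vector attains a lower objective value than a naive starting point such as the zero vector.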
Note. Since the squared hinge loss is a piecewise linear-quadratic (PLQ) function, it can be optimized with rehline.plq_Ridge_Classifier. This wrapper adapts plqERM_Ridge into a classifier that is compatible with the scikit-learn API.
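To see why the squared hinge loss qualifies as piecewise linear-quadratic, it helps to write it out by cases. Using the shorthand \(z = y_i \mathbf{\beta}^\intercal \mathbf{x}_i\) for the margin (a symbol introduced here only for illustration), the loss splits into one quadratic piece and one constant piece:

\[\left( 1 - z \right)_+^2 = \begin{cases} (1 - z)^2, & z < 1, \\ 0, & z \geq 1. \end{cases}\]

Both pieces agree in value and slope at \(z = 1\), so the loss is continuously differentiable, unlike the ordinary hinge loss.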
[ ]:
## install rehline
%pip install rehline -q
[2]:
## set up plotting style
import matplotlib.pyplot as plt
import seaborn as sns
custom_palette = ["#FFE4E1", "#3D325C"]
sns.set_palette(custom_palette)
[3]:
## simulate data
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
n, d = 10000, 5
X, y = make_classification(n_samples=n, n_features=d, random_state=42)
y = 2 * y - 1
X = scaler.fit_transform(X)
[4]:
## solve Squared SVM via `plq_Ridge_Classifier`
from rehline import plq_Ridge_Classifier
clf = plq_Ridge_Classifier(loss={"name": "squared SVM"}, C=1.0)
clf.fit(X=X, y=y)
[4]:
plq_Ridge_Classifier(loss={'name': 'squared SVM'})
[5]:
import warnings
import pandas as pd
import seaborn as sns
warnings.filterwarnings("ignore", "is_categorical_dtype")
warnings.filterwarnings("ignore", "use_inf_as_na")
score = clf.decision_function(X)
df = pd.DataFrame({"score": score, "y": y})
sns.histplot(df, x="score", hue="y").set_title("Squared SVM")
plt.show()
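The histogram shows real-valued decision scores. As a rough sketch of how such scores relate to class labels, the snippet below thresholds them at zero and computes accuracy. It assumes the usual convention for linear classifiers (label = sign of the score) and assigns a score of exactly zero to +1; the helper `labels_from_scores` and the demo values are hypothetical, and the library's own prediction method may treat ties differently.

```python
import numpy as np


def labels_from_scores(scores):
    """Map real-valued decision scores to labels in {-1, +1} by thresholding at 0."""
    return np.where(scores >= 0, 1, -1)


# Illustrative scores and labels (not taken from the notebook's data)
scores_demo = np.array([2.3, -0.4, 0.1, -1.7])
y_true_demo = np.array([1, -1, -1, -1])

y_pred_demo = labels_from_scores(scores_demo)        # [ 1, -1,  1, -1]
accuracy_demo = np.mean(y_pred_demo == y_true_demo)  # 3 of 4 correct
print(accuracy_demo)  # 0.75
```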
With Pipeline
plq_Ridge_Classifier can be used as the final step of a scikit-learn Pipeline, so that preprocessing steps such as scaling are fitted and applied together with the classifier.
[6]:
## simulate data
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
n, d = 10000, 5
X, y = make_classification(n_samples=n, n_features=d, random_state=42)
y = 2 * y - 1
[7]:
## solve Squared SVM via `plq_Ridge_Classifier`
from rehline import plq_Ridge_Classifier
pipe = Pipeline([("scaler", StandardScaler()), ("clf", plq_Ridge_Classifier(loss={"name": "squared SVM"}, C=1.0))])
pipe.fit(X, y)
[7]:
Pipeline(steps=[('scaler', StandardScaler()),
                ('clf', plq_Ridge_Classifier(loss={'name': 'squared SVM'}))])
[8]:
## visualization
import warnings
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
warnings.filterwarnings("ignore", "is_categorical_dtype")
warnings.filterwarnings("ignore", "use_inf_as_na")
score = pipe.decision_function(X)
df = pd.DataFrame({"score": score, "y": y})
sns.histplot(df, x="score", hue="y").set_title("Squared SVM")
plt.show()
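Conceptually, scoring new data through the fitted pipeline amounts to standardizing it with the statistics learned from the training data and then applying the linear score. The NumPy sketch below illustrates that idea only; the coefficient vector `beta` is a made-up placeholder rather than a fitted model, and any intercept term is ignored.

```python
import numpy as np

# Illustrative training data and one new observation (not from the notebook)
X_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X_new = np.array([[3.0, 10.0]])
beta = np.array([1.0, -1.0])  # hypothetical coefficients, assumed for the demo

# Step 1 ("scaler"): statistics come from the training data only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# Step 2 ("clf"): linear score on the standardized features.
X_new_scaled = (X_new - mean) / std
score_new = X_new_scaled @ beta
print(score_new)
```

Keeping the scaler inside the pipeline means the same training statistics are reused at prediction time, and during cross-validation they are recomputed on each training fold rather than on the full data.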
Hyperparameter Tuning with GridSearchCV
Because plq_Ridge_Classifier is compatible with the scikit-learn API, GridSearchCV can be applied to select hyperparameters, such as C, for the ReHLine model.
[9]:
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
warnings.filterwarnings("ignore", category=ConvergenceWarning)
# Define the parameter grid to search
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
# Create the GridSearchCV object
grid_search = GridSearchCV(pipe, param_grid, cv=5)
grid_search.fit(X, y)
# Print the best parameters and score
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best CV Accuracy: {grid_search.best_score_:.4f}")
Best Parameters: {'clf__C': 0.1}
Best CV Accuracy: 0.8913
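The selected value lies at the lower edge of the three-point grid, so a wider, log-spaced grid may be worth trying. The sketch below only builds such a grid; the range of 0.01 to 100 is an arbitrary assumption, and the key follows scikit-learn's `<step name>__<parameter>` convention for addressing parameters inside a Pipeline.

```python
import numpy as np

step_name, param_name = "clf", "C"  # pipeline step and its parameter

# Five values evenly spaced on a log scale: 0.01, 0.1, 1, 10, 100
C_values = np.logspace(-2, 2, num=5)

# Same dictionary shape as the param_grid used above
param_grid_wide = {f"{step_name}__{param_name}": C_values.tolist()}
print(param_grid_wide)
```

The resulting dictionary could be passed to GridSearchCV in place of the original grid; whether a smaller C actually improves cross-validated accuracy here would need to be checked by rerunning the search.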
[10]:
import warnings
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
warnings.filterwarnings("ignore", "is_categorical_dtype")
warnings.filterwarnings("ignore", "use_inf_as_na")
score = grid_search.decision_function(X)
df = pd.DataFrame({"score": score, "y": y})
sns.histplot(data=df, x="score", hue="y").set_title("Squared SVM (C=0.1)")
plt.show()