Manual ReHLine Formulation¶

ReHLine is designed to address the regularized ReLU-ReHU minimization problem, named ReHLine optimization, of the following form:

\[\min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n \sum_{l=1}^L \text{ReLU}( u_{li} \mathbf{x}_i^\intercal \mathbf{\beta} + v_{li}) + \sum_{i=1}^n \sum_{h=1}^H {\text{ReHU}}_{\tau_{hi}}( s_{hi} \mathbf{x}_i^\intercal \mathbf{\beta} + t_{hi}) + \frac{1}{2} \| \mathbf{\beta} \|_2^2, \ \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0},\]

where $\mathbf{U} = (u_{li}),\mathbf{V} = (v_{li}) \in \mathbb{R}^{L \times n}$ and $\mathbf{S} = (s_{hi}),\mathbf{T} = (t_{hi}),\mathbf{\tau} = (\tau_{hi}) \in \mathbb{R}^{H \times n}$ are the ReLU-ReHU loss parameters, and $(\mathbf{A},\mathbf{b})$ are the constraint parameters.

The key to using ReHLine` to solve any problem lies in utilizing custom ReHLine parameters to represent the problem, we illustrate this with following examples. Suppose that we have X and y as our data.

## Data
## X : [n x d]
## y : [n]
import numpy as np
n, d = X.shape

Note

Most of the examples below can be directly implemented by ReHLine: Empirical Risk Minimization; we are simply illustrating how to convert the problem to the ReHLine formulation.

SVM¶

SVMs solve the following optimization problem:

\[\min_{\mathbf{\beta} \in \mathbb{R}^d} \frac{C}{n} \sum_{i=1}^n ( 1 - y_i \mathbf{\beta}^\intercal \mathbf{x}_i )_+ + \frac{1}{2} \| \mathbf{\beta} \|_2^2\]

where $\mathbf{x}_i \in \mathbb{R}^d$ is a feature vector, and $y_i \in \{-1, 1\}$ is a binary label. Note that the SVM can be rewritten as a ReHLine optimization with

\[\mathbf{U} \leftarrow -C \mathbf{y}^\intercal/n, \quad \mathbf{V} \leftarrow C \mathbf{1}^\intercal_n/n,\]

where $\mathbf{1}_n = (1, \cdots, 1)^\intercal$ is the $n$-length one vector, $\mathbf{X} \in \mathbb{R}^{n \times d}$ is the feature matrix, and $\mathbf{y} = (y_1, \cdots, y_n)^\intercal$ is the response vector.

The python implementation is:

## SVM ReHLine parameters
clf = ReHLine()
## U
clf.U = -(C*y).reshape(1,-1)
## V
clf.V = (C*np.array(np.ones(n))).reshape(1,-1)
## Fit
clf.fit(X)

Smooth SVM¶

Smoothed SVMs solve the following optimization problem:

\[\min_{\mathbf{\beta} \in \mathbb{R}^d} \frac{C}{n} \sum_{i=1}^n V( y_i \mathbf{\beta}^\intercal \mathbf{x}_i ) + \frac{1}{2} \| \mathbf{\beta} \|_2^2\]

where $\mathbf{x}_i \in \mathbb{R}^d$ is a feature vector, and $y_i \in \{-1, 1\}$ is a binary label, and $V(\cdot)$ is the modified Huber loss or the smoothed hinge loss:

\[\begin{split}\begin{equation*} V(z) = \begin{cases} \ 0, & z \geq 1, \\ \ (1-z)^2/2, & 0 < z \leq 1, \\ \ (1/2 - z ), & z < 0. \end{cases} \end{equation*}\end{split}\]

Smoothed SVM can be rewritten as a ReHLine optimization with

\[\mathbf{S} \leftarrow -\sqrt{C/n} \mathbf{y}^\intercal, \quad \mathbf{T} \leftarrow \sqrt{C/n} \mathbf{1}^\intercal_n, \quad \mathbf{\tau} \leftarrow \sqrt{C/n} \mathbf{1}^\intercal_n.\]

The python implementation is:

## sSVM ReHLine parameters
clf = ReHLine()
## S
clf.S = -(np.sqrt(C/n)*y).reshape(1,-1)
## T
clf.T = (np.sqrt(C/n)*np.ones(n)).reshape(1,-1)
## Tau
clf.Tau = (np.sqrt(C/n)*np.ones(n)).reshape(1,-1)
## Fit
clf.fit(X)

FairSVM¶

The SVM with fairness constraints (FairSVM) solves the following optimization problem:

\[\begin{split}\begin{align} & \min_{\mathbf{\beta} \in \mathbb{R}^d} \frac{C}{n} \sum_{i=1}^n ( 1 - y_i \mathbf{\beta}^\intercal \mathbf{x}_i )_+ + \frac{1}{2} \| \mathbf{\beta} \|_2^2, \nonumber \\ \text{subj. to } & \quad \frac{1}{n} \sum_{i=1}^n \mathbf{z}_i \mathbf{\beta}^\intercal \mathbf{x}_i \leq \mathbf{\rho}, \quad \frac{1}{n} \sum_{i=1}^n \mathbf{z}_i \mathbf{\beta}^\intercal \mathbf{x}_i \geq -\mathbf{\rho}, \end{align}\end{split}\]

where $\mathbf{x}_i \in \mathbb{R}^d$ is a feature vector, and $y_i \in \{-1, 1\}$ is a binary label, $mathbf{z}_i$ is a collection of centered sensitive features

\[\sum_{i=1}^n z_{ij} = 0,\]

such as gender and/or race. The constraints limit the correlation between the $d_0$-length sensitive features $\mathbf{z}_ i \in \mathbb{R}^{d_0}$ and the decision function $\mathbf{\beta}^\intercal \mathbf{x}$, and the constants $\mathbf{\rho} \in \mathbb{R}_+^{d_0}$ trade-offs predictive accuracy and fairness. Note that the FairSVM can be rewritten as a ReHLine optimization with

\[\begin{split}\mathbf{U} \leftarrow -C \mathbf{y}^\intercal/n, \quad \mathbf{V} \leftarrow C \mathbf{1}^\intercal_n/n, \quad \mathbf{A} \leftarrow \begin{pmatrix} \mathbf{Z}^\intercal \mathbf{X} / n \\ -\mathbf{Z}^\intercal \mathbf{X} / n \end{pmatrix}, \quad \mathbf{b} \leftarrow \begin{pmatrix} \mathbf{\rho} \\ \mathbf{\rho} \end{pmatrix}\end{split}\]

The python implementation is:

## FairSVM ReHLine parameters
clf = ReHLine()
## U
clf.U = -(C*y).reshape(1,-1)
## V
clf.V = (C*np.array(np.ones(n))).reshape(1,-1)
## A
## we illustrate that the first column of X as sensitive features, and tol is 0.1
X_sen = X[:,0]
tol_sen = 0.1
clf.A = np.repeat([X_sen @ X], repeats=[2], axis=0) / n
clf.A[1] = -clf.A[1]
## b
clf.b = np.array([tol_sen, tol_sen])
## Fit
clf.fit(X)

Ridge Huber regression¶

The Ridge regularized Huber minimization (RidgeHuber) solves the following optimization problem:

\[\min_{\mathbf{\beta}} \frac{1}{n} \sum_{i=1}^n H_\kappa( y_i - \mathbf{x}_i^\intercal \mathbf{\beta} ) + \frac{\lambda}{2} \| \mathbf{\beta} \|_2^2,\]

where $H_\kappa(\cdot)$ is the Huber loss with a given parameter $\kappa$:

\[\begin{split}H_\kappa(z) = \begin{cases} z^2/2, & 0 < |z| \leq \kappa, \\ \ \kappa ( |z| - \kappa/2 ), & |z| > \kappa. \end{cases}\end{split}\]

In this case, the RidgeHuber can be rewritten as a ReHLine optimization with:

\[\begin{split}\mathbf{S} \leftarrow \begin{pmatrix} -\sqrt{\frac{1}{n\lambda}} \mathbf{1}^\intercal_n \\ \sqrt{\frac{1}{n\lambda}} \mathbf{1}^\intercal_n \\ \end{pmatrix}, \quad \mathbf{T} \leftarrow \begin{pmatrix} \sqrt{\frac{1}{n\lambda}} \mathbf{y}^\intercal \\ -\sqrt{\frac{1}{n\lambda}} \mathbf{y}^\intercal \\ \end{pmatrix}, \quad \mathbf{\tau} \leftarrow \begin{pmatrix} \kappa \sqrt{\frac{1}{n\lambda}} \mathbf{1}^\intercal_n \\ \\ \kappa \sqrt{\frac{1}{n\lambda}} \mathbf{1}^\intercal_n \\ \end{pmatrix}.\end{split}\]

The python implementation is:

## Huber ReHLine parameters
clf = ReHLine()
## S
clf.S = -np.repeat([np.sqrt(1/n/lam)*np.ones(n)], repeats=[2], axis=0)
clf.S[1] = -clf.S[1]
## T
clf.T = np.repeat([np.sqrt(1/n/lam)*y], repeats=[2], axis=0)
clf.T[1] = -clf.T[1]
## Tau
clf.Tau = np.repeat([kappa*np.sqrt(1/n/lam)*np.ones(n)], repeats=[2], axis=0)
## Fit
clf.fit(X)