Manual ReHLine Formulation
~~~~~~~~~~~~~~~~~~~~~~~~~~

`ReHLine` is designed to address the regularized ReLU-ReHU minimization problem, named *ReHLine optimization*, of the following form:

.. math::

    \min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n \sum_{l=1}^L \text{ReLU}( u_{li} \mathbf{x}_i^\intercal \mathbf{\beta} + v_{li}) + \sum_{i=1}^n \sum_{h=1}^H {\text{ReHU}}_{\tau_{hi}}( s_{hi} \mathbf{x}_i^\intercal \mathbf{\beta} + t_{hi}) + \frac{1}{2} \| \mathbf{\beta} \|_2^2, \ \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0},

where :math:`\mathbf{U} = (u_{li}),\mathbf{V} = (v_{li}) \in \mathbb{R}^{L \times n}` and :math:`\mathbf{S} = (s_{hi}),\mathbf{T} = (t_{hi}),\mathbf{\tau} = (\tau_{hi}) \in \mathbb{R}^{H \times n}` are the ReLU-ReHU loss parameters, and :math:`(\mathbf{A},\mathbf{b})` are the constraint parameters.

The key to using `ReHLine` to solve a given problem is to represent that problem with custom ReHLine parameters. We illustrate this with the following examples.

Suppose that we have `X` and `y` as our data.

.. code-block:: python

    ## Data
    ## X : [n x d]
    ## y : [n]
    import numpy as np
    from rehline import ReHLine

    n, d = X.shape

.. note::

    Most of the examples below can be directly implemented by `ReHLine: Empirical Risk Minimization <./tutorials/ReHLine_ERM.rst>`_; we are simply illustrating how to convert the problem to the ReHLine formulation.

SVM
---

SVMs solve the following optimization problem:

.. math::

    \min_{\mathbf{\beta} \in \mathbb{R}^d} \frac{C}{n} \sum_{i=1}^n ( 1 - y_i \mathbf{\beta}^\intercal \mathbf{x}_i )_+ + \frac{1}{2} \| \mathbf{\beta} \|_2^2

where :math:`\mathbf{x}_i \in \mathbb{R}^d` is a feature vector, and :math:`y_i \in \{-1, 1\}` is a binary label. Note that the SVM can be rewritten as a ReHLine optimization with

.. math::

    \mathbf{U} \leftarrow -C \mathbf{y}^\intercal/n, \quad \mathbf{V} \leftarrow C \mathbf{1}^\intercal_n/n,

where :math:`\mathbf{1}_n = (1, \cdots, 1)^\intercal` is the :math:`n`-length one vector, :math:`\mathbf{X} \in \mathbb{R}^{n \times d}` is the feature matrix, and :math:`\mathbf{y} = (y_1, \cdots, y_n)^\intercal` is the response vector.

The Python implementation is:

.. code-block:: python

    ## SVM ReHLine parameters
    ## C is the cost parameter of the SVM objective, assumed to be defined beforehand
    clf = ReHLine()
    ## U = -C * y^T / n, shape [1 x n]
    clf.U = -(C*y/n).reshape(1,-1)
    ## V = C * 1_n^T / n, shape [1 x n]
    clf.V = (C*np.ones(n)/n).reshape(1,-1)
    ## Fit
    clf.fit(X)

Smooth SVM
----------

Smoothed SVMs solve the following optimization problem:

.. math::

    \min_{\mathbf{\beta} \in \mathbb{R}^d} \frac{C}{n} \sum_{i=1}^n V( y_i \mathbf{\beta}^\intercal \mathbf{x}_i ) + \frac{1}{2} \| \mathbf{\beta} \|_2^2

where :math:`\mathbf{x}_i \in \mathbb{R}^d` is a feature vector, :math:`y_i \in \{-1, 1\}` is a binary label, and :math:`V(\cdot)` is the modified Huber loss (or smoothed hinge loss):

.. math::

    V(z) =
    \begin{cases}
    \ 0, & z \geq 1, \\
    \ (1-z)^2/2, & 0 < z < 1, \\
    \ (1/2 - z ), & z \leq 0.
    \end{cases}

Smoothed SVM can be rewritten as a ReHLine optimization with

.. math::

    \mathbf{S} \leftarrow -\sqrt{C/n} \mathbf{y}^\intercal, \quad \mathbf{T} \leftarrow \sqrt{C/n} \mathbf{1}^\intercal_n, \quad \mathbf{\tau} \leftarrow \sqrt{C/n} \mathbf{1}^\intercal_n,

where :math:`\mathbf{1}_n = (1, \cdots, 1)^\intercal` is the :math:`n`-length one vector, :math:`\mathbf{X} \in \mathbb{R}^{n \times d}` is the feature matrix, and :math:`\mathbf{y} = (y_1, \cdots, y_n)^\intercal` is the response vector.

The Python implementation is:

.. code-block:: python

    ## sSVM ReHLine parameters
    ## C is the cost parameter, assumed to be defined beforehand
    clf = ReHLine()
    ## S = -sqrt(C/n) * y^T, shape [1 x n]
    clf.S = -(np.sqrt(C/n)*y).reshape(1,-1)
    ## T = sqrt(C/n) * 1_n^T, shape [1 x n]
    clf.T = (np.sqrt(C/n)*np.ones(n)).reshape(1,-1)
    ## Tau = sqrt(C/n) * 1_n^T, shape [1 x n]
    clf.Tau = (np.sqrt(C/n)*np.ones(n)).reshape(1,-1)
    ## Fit
    clf.fit(X)

FairSVM
-------

The SVM with fairness constraints (FairSVM) solves the following optimization problem:

.. math::

    \begin{aligned}
    & \min_{\mathbf{\beta} \in \mathbb{R}^d} \frac{C}{n} \sum_{i=1}^n ( 1 - y_i \mathbf{\beta}^\intercal \mathbf{x}_i )_+ + \frac{1}{2} \| \mathbf{\beta} \|_2^2, \\
    \text{subj. to } & \quad \frac{1}{n} \sum_{i=1}^n \mathbf{z}_i \mathbf{\beta}^\intercal \mathbf{x}_i \leq \mathbf{\rho}, \quad \frac{1}{n} \sum_{i=1}^n \mathbf{z}_i \mathbf{\beta}^\intercal \mathbf{x}_i \geq -\mathbf{\rho},
    \end{aligned}

where :math:`\mathbf{x}_i \in \mathbb{R}^d` is a feature vector, :math:`y_i \in \{-1, 1\}` is a binary label, and :math:`\mathbf{z}_i \in \mathbb{R}^{d_0}` is a collection of centered sensitive features, such as gender and/or race, satisfying

.. math::

    \sum_{i=1}^n z_{ij} = 0, \quad j = 1, \cdots, d_0.

The constraints limit the correlation between the :math:`d_0`-length sensitive features :math:`\mathbf{z}_i \in \mathbb{R}^{d_0}` and the decision function :math:`\mathbf{\beta}^\intercal \mathbf{x}`, and the constants :math:`\mathbf{\rho} \in \mathbb{R}_+^{d_0}` trade off predictive accuracy and fairness. Note that the FairSVM can be rewritten as a ReHLine optimization with

.. math::

    \mathbf{U} \leftarrow -C \mathbf{y}^\intercal/n, \quad
    \mathbf{V} \leftarrow C \mathbf{1}^\intercal_n/n, \quad
    \mathbf{A} \leftarrow
    \begin{pmatrix}
    \mathbf{Z}^\intercal \mathbf{X} / n \\
    -\mathbf{Z}^\intercal \mathbf{X} / n
    \end{pmatrix}, \quad
    \mathbf{b} \leftarrow
    \begin{pmatrix}
    \mathbf{\rho} \\
    \mathbf{\rho}
    \end{pmatrix}.

The Python implementation is:

.. code-block:: python

    ## FairSVM ReHLine parameters
    ## C is the cost parameter, assumed to be defined beforehand
    clf = ReHLine()
    ## U = -C * y^T / n, shape [1 x n]
    clf.U = -(C*y/n).reshape(1,-1)
    ## V = C * 1_n^T / n, shape [1 x n]
    clf.V = (C*np.ones(n)/n).reshape(1,-1)
    ## A
    ## For illustration, we treat the first column of X as the sensitive feature
    ## (assumed to be centered) and set the tolerance rho to 0.1
    X_sen = X[:,0]
    tol_sen = 0.1
    ## Stack Z^T X / n on top of -Z^T X / n, shape [2 x d]
    clf.A = np.repeat([X_sen @ X], repeats=[2], axis=0) / n
    clf.A[1] = -clf.A[1]
    ## b = (rho, rho)
    clf.b = np.array([tol_sen, tol_sen])
    ## Fit
    clf.fit(X)

Ridge Huber regression
----------------------

The Ridge regularized Huber minimization (RidgeHuber) solves the following optimization problem:

.. math::

    \min_{\mathbf{\beta}} \frac{1}{n} \sum_{i=1}^n H_\kappa( y_i - \mathbf{x}_i^\intercal \mathbf{\beta} ) + \frac{\lambda}{2} \| \mathbf{\beta} \|_2^2,

where :math:`H_\kappa(\cdot)` is the Huber loss with a given parameter :math:`\kappa`:

.. math::

    H_\kappa(z) =
    \begin{cases}
    \ z^2/2, & |z| \leq \kappa, \\
    \ \kappa ( |z| - \kappa/2 ), & |z| > \kappa.
    \end{cases}

Dividing the objective by :math:`\lambda` leaves the minimizer unchanged and turns the penalty into the unit ridge term of the ReHLine formulation. In this case, the RidgeHuber can be rewritten as a ReHLine optimization with:

.. math::

    \mathbf{S} \leftarrow
    \begin{pmatrix}
    -\sqrt{\frac{1}{n\lambda}} \mathbf{1}^\intercal_n \\
    \sqrt{\frac{1}{n\lambda}} \mathbf{1}^\intercal_n
    \end{pmatrix}, \quad
    \mathbf{T} \leftarrow
    \begin{pmatrix}
    \sqrt{\frac{1}{n\lambda}} \mathbf{y}^\intercal \\
    -\sqrt{\frac{1}{n\lambda}} \mathbf{y}^\intercal
    \end{pmatrix}, \quad
    \mathbf{\tau} \leftarrow
    \begin{pmatrix}
    \kappa \sqrt{\frac{1}{n\lambda}} \mathbf{1}^\intercal_n \\
    \kappa \sqrt{\frac{1}{n\lambda}} \mathbf{1}^\intercal_n
    \end{pmatrix}.

The Python implementation is:

.. code-block:: python

    ## Huber ReHLine parameters
    ## lam (ridge penalty) and kappa (Huber threshold) are assumed to be defined beforehand
    clf = ReHLine()
    ## S: first row is -sqrt(1/(n*lam)), second row is +sqrt(1/(n*lam)), shape [2 x n]
    clf.S = -np.repeat([np.sqrt(1/n/lam)*np.ones(n)], repeats=[2], axis=0)
    clf.S[1] = -clf.S[1]
    ## T: first row is +sqrt(1/(n*lam))*y, second row is its negative, shape [2 x n]
    clf.T = np.repeat([np.sqrt(1/n/lam)*y], repeats=[2], axis=0)
    clf.T[1] = -clf.T[1]
    ## Tau: both rows are kappa*sqrt(1/(n*lam)), shape [2 x n]
    clf.Tau = np.repeat([kappa*np.sqrt(1/n/lam)*np.ones(n)], repeats=[2], axis=0)
    ## Fit
    clf.fit(X)
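
A conversion like the one above can be sanity-checked numerically before fitting. The sketch below is not part of `ReHLine`: it uses only NumPy, and the helpers `rehu` and `huber`, the synthetic data, and the values of `lam` and `kappa` are illustrative assumptions. It assumes the usual ReHU definition (zero for :math:`z \leq 0`, :math:`z^2/2` for :math:`0 < z \leq \tau`, and :math:`\tau (z - \tau/2)` for :math:`z > \tau`) and checks that the two ReHU rows built from :math:`(\mathbf{S}, \mathbf{T}, \mathbf{\tau})` sum to the Huber loss term scaled by :math:`1/(n\lambda)`.

```python
import numpy as np

def rehu(z, tau):
    """Assumed ReHU definition: 0 if z <= 0, z^2/2 if 0 < z <= tau, tau*(z - tau/2) otherwise."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.0,
                    np.where(z <= tau, 0.5 * z**2, tau * (z - 0.5 * tau)))

def huber(z, kappa):
    """Huber loss: z^2/2 if |z| <= kappa, kappa*(|z| - kappa/2) otherwise."""
    a = np.abs(np.asarray(z, dtype=float))
    return np.where(a <= kappa, 0.5 * a**2, kappa * (a - 0.5 * kappa))

# Synthetic data and hyperparameters (illustrative values only)
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = 3.0 * rng.normal(size=n)
beta = rng.normal(size=d)
lam, kappa = 0.5, 1.0

# RidgeHuber ReHLine parameters, each of shape [2 x n]
c = np.sqrt(1 / n / lam)
S = np.vstack([-c * np.ones(n), c * np.ones(n)])
T = np.vstack([c * y, -c * y])
Tau = kappa * c * np.ones((2, n))

# Loss term in ReHLine form: sum over h and i of ReHU_tau(s_hi * x_i^T beta + t_hi)
score = X @ beta
rehu_loss = rehu(S * score + T, Tau).sum()

# Loss term of the original problem after dividing the objective by lam
huber_loss = huber(y - score, kappa).sum() / (n * lam)

assert np.isclose(rehu_loss, huber_loss)
```

The same pattern (evaluate both forms of the loss at a random `beta` and compare) applies to the SVM and smoothed SVM conversions, with ReLU or a single ReHU row in place of the two rows used here.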