rehline

Package Contents

Classes

ReHLoss

A ReHLine loss function composed of one or multiple ReLU and ReHU components.

ReHLine

(main class) ReHLine Minimization. (draft version v1.0)

Functions

relu(x)

Evaluation of ReLU given a vector.

rehu(x[, cut])

Evaluation of ReHU given a vector.

make_fair_classification([n_samples, n_features, ...])

Generate a random binary fair classification problem.

class rehline.ReHLoss(relu_coef, relu_intercept, rehu_coef=np.empty(shape=(0, 0)), rehu_intercept=np.empty(shape=(0, 0)), rehu_cut=1)

Bases: object

A ReHLine loss function composed of one or multiple ReLU and ReHU components.

Parameters:
relu_coef: {array-like} of shape (n_relu, n_samples)

ReLU coefficient matrix, where n_relu is the number of ReLU components and n_samples is the number of samples.

relu_intercept: {array-like} of shape (n_relu, n_samples)

ReLU intercept matrix, where n_relu is the number of ReLU components and n_samples is the number of samples.

rehu_coef: {array-like} of shape (n_rehu, n_samples)

ReHU coefficient matrix, where n_rehu is the number of ReHU components and n_samples is the number of samples.

rehu_intercept: {array-like} of shape (n_rehu, n_samples)

ReHU intercept matrix, where n_rehu is the number of ReHU components and n_samples is the number of samples.

rehu_cut: {array-like} of shape (n_rehu, n_samples)

ReHU cutpoint matrix, where n_rehu is the number of ReHU components and n_samples is the number of samples.

__call__(x)

Evaluate the ReHLoss at a given vector.

x: {array-like} of shape (n_samples, )

Training vector, where n_samples is the number of samples
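The composition described above can be illustrated with a minimal NumPy sketch. It does not call the package; `rehloss_sketch` is a hypothetical helper, the ReHU formula is the standard piecewise definition (zero, then quadratic, then linear), and returning one value per sample is an assumption about how the components are aggregated.

```python
import numpy as np

def relu(z):
    # ReLU: negative entries become 0
    return np.maximum(z, 0.0)

def rehu(z, cut=1.0):
    # ReHU: 0 for z <= 0, z^2/2 for 0 < z <= cut, cut*(z - cut/2) for z > cut
    z = np.asarray(z, dtype=float)
    quad = 0.5 * np.maximum(z, 0.0) ** 2
    lin = cut * (z - 0.5 * cut)
    return np.where(z <= cut, quad, lin)

def rehloss_sketch(x, relu_coef, relu_intercept, rehu_coef, rehu_intercept, rehu_cut=1.0):
    # Hypothetical helper: sum all ReLU and ReHU components for each sample.
    x = np.asarray(x, dtype=float)
    relu_part = relu(relu_coef * x + relu_intercept).sum(axis=0)
    rehu_part = rehu(rehu_coef * x + rehu_intercept, rehu_cut).sum(axis=0)
    return relu_part + rehu_part

x = np.array([2.0, 0.5, -1.0])
y = np.array([1.0, 1.0, -1.0])
# One hinge-like ReLU row and one ReHU row with unit coefficient
relu_coef, relu_intercept = -y.reshape(1, -1), np.ones((1, 3))
rehu_coef, rehu_intercept = np.ones((1, 3)), np.zeros((1, 3))
values = rehloss_sketch(x, relu_coef, relu_intercept, rehu_coef, rehu_intercept)
```

Each row of the coefficient and intercept matrices contributes one component, and broadcasting applies sample i's coefficients to x[i].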

class rehline.ReHLine(loss={'name': 'QR', 'qt': [0.25, 0.75]}, C=1.0, U=np.empty(shape=(0, 0)), V=np.empty(shape=(0, 0)), Tau=np.empty(shape=(0, 0)), S=np.empty(shape=(0, 0)), T=np.empty(shape=(0, 0)), A=np.empty(shape=(0, 0)), b=np.empty(shape=0), max_iter=1000, tol=0.0001, shrink=1, verbose=0, trace_freq=100)

Bases: sklearn.base.BaseEstimator

(main class) ReHLine Minimization. (draft version v1.0)

\[\begin{split}\min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n \sum_{l=1}^L \text{ReLU}( u_{li} \mathbf{x}_i^\intercal \mathbf{\beta} + v_{li}) + \sum_{i=1}^n \sum_{h=1}^H {\text{ReHU}}_{\tau_{hi}}( s_{hi} \mathbf{x}_i^\intercal \mathbf{\beta} + t_{hi}) + \frac{1}{2} \| \mathbf{\beta} \|_2^2, \\ \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0},\end{split}\]

where \(\mathbf{U} = (u_{li}),\mathbf{V} = (v_{li}) \in \mathbb{R}^{L \times n}\) and \(\mathbf{S} = (s_{hi}),\mathbf{T} = (t_{hi}),\mathbf{\tau} = (\tau_{hi}) \in \mathbb{R}^{H \times n}\) are the ReLU-ReHU loss parameters, and \((\mathbf{A},\mathbf{b})\) are the constraint parameters.

Parameters:
C: float, default=1.0

Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. C is absorbed into the ReHLine parameters when self.make_ReLHLoss is called.

verbose: int, default=0

Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in liblinear that, if enabled, may not work properly in a multithreaded context.

max_iter: int, default=1000

The maximum number of iterations to be run.

U, V: array of shape (L, n_samples), default=np.empty(shape=(0, 0))

The parameters pertaining to the ReLU part in the loss function.

Tau, S, T: array of shape (H, n_samples), default=np.empty(shape=(0, 0))

The parameters pertaining to the ReHU part in the loss function.

A: array of shape (K, n_features), default=np.empty(shape=(0, 0))

The coefficient matrix in the linear constraint.

b: array of shape (K, ), default=np.empty(shape=0)

The intercept vector in the linear constraint.

Examples

>>> ## test SVM on simulated dataset
>>> import numpy as np
>>> from rehline import ReHLine
>>> # simulate classification dataset
>>> n, d, C = 1000, 3, 0.5
>>> np.random.seed(1024)
>>> X = np.random.randn(n, d)
>>> beta0 = np.random.randn(d)
>>> y = np.sign(X.dot(beta0) + np.random.randn(n))
>>> # Usage 1: built-in loss
>>> clf = ReHLine(loss={'name': 'svm'}, C=C)
>>> clf.make_ReLHLoss(X=X, y=y, loss={'name': 'svm'})
>>> clf.fit(X=X)
>>> print('sol provided by rehline: %s' %clf.coef_)
sol provided by rehline: [ 0.74104604 -0.00622664  2.66991198]
>>> print(clf.decision_function([[.1,.2,.3]]))
[0.87383287]
>>> # Usage 2: manually specify params
>>> n, d = X.shape
>>> U = -(C*y).reshape(1,-1)
>>> L = U.shape[0]
>>> V = (C*np.array(np.ones(n))).reshape(1,-1)
>>> clf = ReHLine(loss={'name': 'svm'}, C=C)
>>> clf.U, clf.V = U, V
>>> clf.fit(X=X)
>>> print('sol provided by rehline: %s' %clf.coef_)
sol provided by rehline: [ 0.7410154  -0.00615574  2.66990408]
>>> print(clf.decision_function([[.1,.2,.3]]))
[0.87384162]
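The manual parameters in Usage 2 can be checked against the usual hinge loss without fitting anything. The sketch below uses only NumPy and synthetic data (the sizes, seed, and `beta` are arbitrary assumptions); it confirms that U = -C*y and V = C reproduce C times the hinge loss for any coefficient vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, C = 50, 3, 0.5
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))
beta = rng.standard_normal(d)          # arbitrary coefficient vector

# ReLU parameters as in Usage 2: a single ReLU row (L = 1)
U = -(C * y).reshape(1, -1)
V = np.full((1, n), C)

score = X @ beta
relu_form = np.maximum(U * score + V, 0.0).sum()
hinge_form = C * np.maximum(1.0 - y * score, 0.0).sum()
```

Because C is positive, ReLU(C(1 - y s)) equals C ReLU(1 - y s), so the two totals agree; this is one way to read the statement that C is absorbed into the loss parameters.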
Attributes:
coef_: array of shape (n_features,)

Weights assigned to the features (coefficients in the primal problem).

n_iter_: int

Maximum number of iterations run across all classes.

make_ReLHLoss(X, y, loss={})

The make_ReLHLoss function generates the parameters of the ReHLoss from the provided training data.

The loss name given in self.loss is matched against the supported losses: ‘hinge’, ‘svm’, ‘SVM’, ‘check’, ‘quantile’, ‘quantile regression’, ‘QR’, ‘sSVM’, ‘smooth SVM’, ‘smooth hinge’, ‘TV’, ‘huber’, and ‘custom’.
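As an illustration of how a named loss maps onto ReLU parameters, the sketch below writes the check (quantile) loss for a single quantile level as two ReLU rows. It is a NumPy-only sketch under stated assumptions: `kappa`, the synthetic data, and the row layout are illustrative, and the parametrization the package builds internally (for example the scaling by C or the handling of several quantiles) may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
n, kappa = 40, 0.25
y = rng.standard_normal(n)
score = rng.standard_normal(n)         # stands in for x_i^T beta

# Two ReLU rows: kappa*ReLU(y - s) + (1 - kappa)*ReLU(s - y)
U = np.vstack([-kappa * np.ones(n), (1 - kappa) * np.ones(n)])
V = np.vstack([kappa * y, -(1 - kappa) * y])
relu_form = np.maximum(U * score + V, 0.0).sum(axis=0)

# Direct definition of the check loss on the residual r = y - s
r = y - score
check = np.where(r >= 0, kappa * r, (kappa - 1) * r)
```

The same pattern (one row per linear piece) applies to other piecewise-linear losses; smooth losses such as Huber need ReHU rows instead.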

Parameters:
X: ndarray of shape (n_samples, n_features)

The training data matrix.

y: ndarray of shape (n_samples,)

The +/- labels for class membership of each sample.

loss: dictionary

A dictionary that provides the loss function type and properties (optional).

append_l1(X, l1_pen=1.0)

This function appends the l1 penalty to the ReHLine problem. The formulation becomes:

\[\begin{split}\min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n \sum_{l=1}^L \text{ReLU}( u_{li} \mathbf{x}_i^\intercal \mathbf{\beta} + v_{li}) + \sum_{i=1}^n \sum_{h=1}^H {\text{ReHU}}_{\tau_{hi}}( s_{hi} \mathbf{x}_i^\intercal \mathbf{\beta} + t_{hi}) + \frac{1}{2} \| \mathbf{\beta} \|_2^2 + \lambda_1 \| \mathbf{\beta} \|_1, \\ \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0},\end{split}\]

where \(\lambda_1\) is associated with l1_pen.

Parameters:
X: ndarray of shape (n_samples, n_features)

The training data matrix.

l1_pen: float, default=1.0

The l1 penalty level, which controls the complexity or sparsity of the resulting model.

Returns:
X_fake: ndarray of shape (n_samples+n_features, n_features)

The manipulated data matrix. It is padded with an identity matrix so that it has the structure expected by self.fit and other modelling steps.
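One way to see why padding with an identity matrix can encode an l1 penalty: each padded row e_j gives the score beta_j, and lam1*|beta_j| equals ReLU(lam1*beta_j) + ReLU(-lam1*beta_j). The NumPy sketch below checks this identity on synthetic data. It is an assumption about the idea behind append_l1, not a reproduction of its internals, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam1 = 20, 4, 1.0
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)

# Pad the data with d identity rows: shape (n + d, d)
X_fake = np.vstack([X, np.eye(d)])

# Scores of the padded rows are exactly the coefficients
pad_score = X_fake[n:] @ beta

# Two ReLU terms per padded row recover the l1 penalty
l1_via_relu = (np.maximum(lam1 * pad_score, 0.0)
               + np.maximum(-lam1 * pad_score, 0.0)).sum()
l1_direct = lam1 * np.abs(beta).sum()
```

Under this reading, the penalty becomes part of the ReLU sum, which is why the fit is run on the padded matrix rather than on X.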

Examples

>>> import numpy as np
>>> from rehline import ReHLine
>>> # simulate classification dataset
>>> n, d, C, lam1 = 1000, 3, 0.5, 1.0
>>> np.random.seed(1024)
>>> X = np.random.randn(n, d)
>>> beta0 = np.random.randn(d)
>>> y = np.sign(X.dot(beta0) + np.random.randn(n))
>>> clf = ReHLine(loss={'name': 'svm'}, C=C)
>>> clf.make_ReLHLoss(X=X, y=y, loss={'name': 'svm'})
>>> # save and fit with the manipulated data matrix
>>> X_fake = clf.append_l1(X, l1_pen=lam1)
>>> clf.fit(X=X_fake)
>>> print('sol provided by rehline: %s' %clf.coef_)
sol provided by rehline: [ 7.17796629e-01 -1.87075728e-06  2.61965622e+00]
>>> # sparse solution: the second coefficient is close to zero
>>> print(clf.decision_function([[.1,.2,.3]]))
[0.85767616]
auto_shape()

Automatically generate the shape of the parameters of the ReHLine loss function.

call_ReLHLoss(score)

Return the value of the ReHLine loss of the score.

Parameters:
score: ndarray of shape (n_samples, )

The input score that will be evaluated through the ReHLine loss.

Returns:
float

ReHLine loss evaluation of the given score.

fit(X, sample_weight=None)

Fit the model based on the given training data.

Parameters:
X: {array-like} of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

sample_weight: array-like of shape (n_samples,), default=None

Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns:
self: object

An instance of the estimator.

decision_function(X)

The decision function evaluated on the given dataset.

Parameters:
X: array-like of shape (n_samples, n_features)

The data matrix.

Returns:
ndarray of shape (n_samples, )

Returns the decision function of the samples.
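The numbers printed in the SVM example of this class are consistent with a plain linear score, X times coef_, with no intercept term. The sketch below reproduces that arithmetic with NumPy only; treating the decision function as exactly this product is an inference from the printed output, not a statement about the implementation.

```python
import numpy as np

# Coefficients printed in the SVM example (Usage 1)
coef = np.array([0.74104604, -0.00622664, 2.66991198])
X_new = np.array([[0.1, 0.2, 0.3]])

# Linear score without an intercept
score = X_new @ coef
```

The result agrees with the value 0.87383287 shown in the example to the printed precision.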

rehline.relu(x)

Evaluation of ReLU given a vector.

Parameters:
x: {array-like} of shape (n_samples, )

Training vector, where n_samples is the number of samples

Returns:
array of shape (n_samples, )

An array with ReLU applied, i.e., all negative values are replaced with 0.

rehline.rehu(x, cut=1)

Evaluation of ReHU given a vector.

Parameters:
x: {array-like} of shape (n_samples, )

Training vector, where n_samples is the number of samples

cut: {array-like} of shape (n_samples, )

Cutpoints of ReHU, where n_samples is the number of samples

Returns:
array of shape (n_samples, )

The result of the ReHU function.
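For reference, ReHU with cutpoint cut is commonly defined piecewise: 0 for z <= 0, z^2/2 for 0 < z <= cut, and cut*(z - cut/2) for z > cut. The NumPy sketch below implements that definition under the assumption that the package follows it (`rehu_sketch` and `huber` are hypothetical helpers), and checks that a two-sided sum of ReHU terms gives the Huber loss.

```python
import numpy as np

def rehu_sketch(z, cut=1.0):
    # Piecewise definition: zero, then quadratic, then linear
    z = np.asarray(z, dtype=float)
    quad = 0.5 * np.maximum(z, 0.0) ** 2
    lin = cut * (z - 0.5 * cut)
    return np.where(z <= cut, quad, lin)

def huber(r, cut=1.0):
    # Standard Huber loss with threshold `cut`
    a = np.abs(r)
    return np.where(a <= cut, 0.5 * r ** 2, cut * (a - 0.5 * cut))

r = np.linspace(-3.0, 3.0, 13)
two_sided = rehu_sketch(r) + rehu_sketch(-r)
sample = rehu_sketch(np.array([-1.0, 0.5, 2.0]))
```

Since only one of the two terms is nonzero for any residual, the sum reproduces both the quadratic and the linear regions of the Huber loss.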

rehline.make_fair_classification(n_samples=100, n_features=5, ind_sensitive=0)

Generate a random binary fair classification problem.

Parameters:
n_samples: int, default=100

The number of samples.

n_features: int, default=5

The total number of features.

ind_sensitive: int, default=0

The index of the sensitive feature.

Returns:
X: ndarray of shape (n_samples, n_features)

The generated samples.

y: ndarray of shape (n_samples,)

The +/- labels for class membership of each sample.

X_sen: ndarray of shape (n_samples,)

The centered samples of the sensitive feature.
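A centered sensitive feature such as X_sen is typically used to build a linear fairness constraint of the form A beta + b >= 0, matching the constraint parameters of ReHLine. The sketch below is a NumPy-only illustration under assumptions: the data are synthetic rather than produced by make_fair_classification, and `rho` (the tolerated covariance between the sensitive feature and the score) and `is_feasible` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, rho = 100, 5, 0.1
X = rng.standard_normal((n, d))
X_sen = X[:, 0] - X[:, 0].mean()       # centered sensitive feature

# |(1/n) * sum_i X_sen[i] * x_i^T beta| <= rho as two linear inequalities
c = X_sen @ X / n
A = np.vstack([c, -c])                 # shape (2, n_features)
b = np.array([rho, rho])

def is_feasible(beta):
    # True when A beta + b >= 0 holds row by row
    return bool(np.all(A @ beta + b >= 0))

beta_zero = np.zeros(d)
beta_bad = c * (2 * rho / (c @ c))     # makes c^T beta equal to 2*rho
```

The first row enforces c^T beta >= -rho and the second c^T beta <= rho, so K = 2 constraints bound the covariance on both sides.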