{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "l4fAl3V0MysB" }, "source": [ "# Huber Regression\n", "\n", "[ReHLine documentation](https://rehline-python.readthedocs.io/en/latest/)\n", "\n", "Regularized Huber regression solves the following optimization problem:\n", "\n", "$$\n", "\\min_{\\beta \\in \\mathbb{R}^d}\n", "C \\sum_{i=1}^n H_{\\tau}(y_i - x_i^\\top \\beta)\n", "+ \\frac{\\lambda}{2}\\|\\beta\\|_2^2,\n", "$$\n", "\n", "where $H_{\\tau}(\\cdot)$ is the Huber loss with parameter $\\tau$:\n", "\n", "$$\n", "H_{\\tau}(z)=\n", "\\begin{cases}\n", "\\frac{z^2}{2}, & |z| \\le \\tau, \\\\\n", "\\tau\\left(|z|-\\frac{\\tau}{2}\\right), & |z| > \\tau.\n", "\\end{cases}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "id": "0zndTa8DOfT6" }, "source": [ "> **Note.** Since the Huber loss is a piecewise linear-quadratic (PLQ) function, we can optimize it using `rehline.plq_Ridge_Regressor`.\n", "> This wrapper adapts `plqERM_Ridge` into a regressor compatible with the scikit-learn API." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8MeXCyHX0hSB" }, "outputs": [], "source": [ "# Install rehline\n", "%pip install rehline -q" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "l1yoptXqUxlc" }, "outputs": [], "source": [ "import warnings\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "from sklearn.datasets import make_regression\n", "from sklearn.preprocessing import StandardScaler" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "96X3ttm4Em6O" }, "outputs": [], "source": [ "# Simulate data\n", "np.random.seed(42)\n", "scaler_huber = StandardScaler()\n", "\n", "n, d = 10000, 5\n", "X, y = make_regression(n_samples=n, n_features=d, noise=10.0)\n", "# Standardize the features and rescale the response to unit standard deviation\n", "X = scaler_huber.fit_transform(X)\n", "y = y / y.std()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 80 }, "id": 
"vSCgN97C0ei6", "outputId": "6acc8a4d-52b5-4017-b6b7-1d7356c2318a" }, "outputs": [ { "data": { "text/html": [ "
plq_Ridge_Regressor(C=0.001, loss={'name': 'huber', 'tau': 5.0})\n",
"Pipeline(steps=[('scaler', StandardScaler()),\n",
"                ('reg',\n",
"                 plq_Ridge_Regressor(C=0.001,\n",
"                                     loss={'name': 'huber', 'tau': 5.0}))])