Supervised learning with Scikit-Learn Library
Generalized Linear Models
The following are a set of methods intended for regression in which the target value is expected to be a linear combination of the input variables. In mathematical notation, if ŷ is the predicted value, then

ŷ(w, x) = w0 + w1·x1 + ... + wp·xp
Across the module, we designate the vector w = (w1, ..., wp) as coef_ and w0 as intercept_.
Ordinary Least Squares
LinearRegression fits a linear model with coefficients w = (w1, ..., wp) to minimize the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation. Mathematically it solves a problem of the form:

min_w ||Xw - y||^2_2
LinearRegression will take in its fit method arrays X, y and will store the coefficients w of the linear model in its coef_ member.
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
reg.coef_
# array([0.5, 0.5])
However, coefficient estimates for Ordinary Least Squares rely on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximately linear dependence, the design matrix becomes close to singular and, as a result, the least-squares estimate becomes highly sensitive to random errors in the observed response, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.
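The sensitivity described above can be demonstrated directly. The sketch below (illustrative data of my own, not from the text) builds a design matrix whose two columns are nearly identical; tiny perturbations of the response then swing the individual OLS coefficients wildly, even though their sum stays stable:

```python
import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)
x = rng.rand(50)
# Two nearly collinear columns: they differ only by ~1e-6 noise.
X = np.column_stack([x, x + 1e-6 * rng.randn(50)])
y_true = X.sum(axis=1)

coefs = []
for _ in range(5):
    y = y_true + 1e-4 * rng.randn(50)  # tiny noise in the response
    reg = linear_model.LinearRegression().fit(X, y)
    coefs.append(reg.coef_)
coefs = np.array(coefs)

print(coefs.std(axis=0))        # individual coefficients vary greatly
print(coefs.sum(axis=1).std())  # their sum is comparatively stable
```

The ill-conditioned direction (w1 - w2) is pinned down only by the tiny difference between the columns, so noise is amplified enormously there, while the well-conditioned direction (w1 + w2) is barely affected.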
Ridge Regression
Ridge regression addresses some of the problems of Ordinary Least Squares by imposing a penalty on the size of the coefficients. The ridge coefficients minimize a penalized residual sum of squares:

min_w ||Xw - y||^2_2 + α||w||^2_2
Here, α ≥ 0 is a complexity parameter that controls the amount of shrinkage: the larger the value of α, the greater the amount of shrinkage, and thus the coefficients become more robust to collinearity.
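The shrinkage effect can be checked numerically. This short sketch (using the same toy data as the Ridge example below) fits the model for increasing α and shows that the norm of the coefficient vector decreases:

```python
import numpy as np
from sklearn import linear_model

X = [[0, 0], [0, 0], [1, 1]]
y = [0, .1, 1]

norms = []
for alpha in (0.1, 1.0, 10.0):
    reg = linear_model.Ridge(alpha=alpha).fit(X, y)
    norms.append(np.linalg.norm(reg.coef_))

print(norms)  # norm of coef_ shrinks as alpha grows
```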
As with other linear models, Ridge will take in its fit method arrays X,y and will store the coefficients w of the linear model in its coef_ member:
from sklearn import linear_model
reg = linear_model.Ridge(alpha=0.5)
reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
reg.coef_
# array([0.34545455, 0.34545455])
reg.intercept_
# 0.13636...
Lasso
The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts due to its tendency to prefer solutions with fewer parameter values, effectively reducing the number of variables upon which the given solution is dependent. For this reason, the Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero weights.
Mathematically, it consists of a linear model trained with an ℓ1 prior as regularizer. The objective function to minimize is:

min_w (1 / (2 * n_samples)) * ||Xw - y||^2_2 + α||w||_1
The implementation in the class Lasso uses coordinate descent as the algorithm to fit the coefficients.
from sklearn import linear_model
reg = linear_model.Lasso(alpha=0.1)
reg.fit([[0, 0], [1, 1]], [0, 1])
reg.predict([[1, 1]])
# array([0.8])
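The sparsity claim above is easier to see with more features. In this sketch (illustrative synthetic data of my own), only the first two of twenty features actually drive the response; the Lasso sets most of the remaining coefficients exactly to zero:

```python
import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)
X = rng.randn(100, 20)
# Only features 0 and 1 are informative; the other 18 are pure noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.01 * rng.randn(100)

reg = linear_model.Lasso(alpha=0.1).fit(X, y)
print((reg.coef_ != 0).sum())  # far fewer than 20 non-zero coefficients
print(reg.coef_[:2])           # informative features survive, slightly shrunk
```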
Also useful for lower-level tasks is the function lasso_path, which computes the coefficients along the full path of possible values.
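A minimal usage sketch (on illustrative data of my own): lasso_path returns the grid of α values it used along with the coefficients computed at each point on the path, with coefficients arranged as (n_features, n_alphas):

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = X[:, 0] + 0.1 * rng.randn(50)

# Compute the regularization path on a grid of 5 alpha values.
alphas, coefs, _ = lasso_path(X, y, n_alphas=5)
print(alphas.shape)  # one entry per alpha on the grid
print(coefs.shape)   # (n_features, n_alphas)
```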