The R-squared score of linear regression
While practicing linear regression today, I found that one model's score after predict was -1.487, which was puzzling, because the sklearn documentation defines the return value of reg.score() as:
Returns the coefficient of determination R^2 of the prediction.
Let's look at the official documentation at http://scikit-learn.org:
sklearn.linear_model.LinearRegression
score(X, y, sample_weight=None)
Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
So in fact:

$$ \mathrm{score} = 1 - \frac{u}{v} $$

$$ u = \sum_i (y_{\mathrm{true},i} - y_{\mathrm{pred},i})^2 $$

$$ v = \sum_i (y_{\mathrm{true},i} - \bar{y}_{\mathrm{true}})^2 $$
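To confirm this, we can compute u and v by hand and check that the result matches reg.score(). This is a minimal sketch on made-up toy data (the array shapes and noise level are just illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data with an obvious linear trend plus a little noise
rng = np.random.RandomState(0)
X = rng.rand(50, 1)
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=50)

reg = LinearRegression().fit(X, y)
y_pred = reg.predict(X)

# R^2 computed by hand, following the formula above
u = ((y - y_pred) ** 2).sum()      # residual sum of squares
v = ((y - y.mean()) ** 2).sum()    # total sum of squares
r2_manual = 1 - u / v

print(np.isclose(reg.score(X, y), r2_manual))  # True
```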
When the predictions are perfect, u is 0, so the best possible score is 1. But when the model performs badly, u/v can become very large, so 1 - u/v can indeed be negative; only in the normal case does the score fall in the range 0 ≤ R² ≤ 1.
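To see concretely how the score goes negative, compare a deliberately bad constant prediction against the mean predictor. This sketch uses a small made-up array and needs only numpy:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Total sum of squares v; mean of y_true is 3.0, so v = 4+1+0+1+4 = 10.0
v = ((y_true - y_true.mean()) ** 2).sum()

# Always predicting the mean gives u == v, hence R^2 == 0
y_mean = np.full_like(y_true, y_true.mean())
r2_mean = 1 - ((y_true - y_mean) ** 2).sum() / v
print(r2_mean)  # 0.0

# A wildly wrong constant makes u much larger than v, so R^2 goes negative
y_bad = np.full_like(y_true, 100.0)
r2_bad = 1 - ((y_true - y_bad) ** 2).sum() / v
print(r2_bad)  # a large negative number
```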