The R-squared score of linear regression
While practicing linear regression today, I found that one model's score after predict was -1.487, which was puzzling, because the sklearn documentation defines the return value of reg.score() as:
Returns the coefficient of determination R^2 of the prediction.
Let's look at the official documentation at http://scikit-learn.org:
sklearn.linear_model.LinearRegression
score(X, y, sample_weight=None)
Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
So in fact:

$$ \mathrm{score} = 1 - \frac{u}{v} $$

$$ u = \sum_i (y_{\mathrm{true},i} - y_{\mathrm{pred},i})^2 $$

$$ v = \sum_i (y_{\mathrm{true},i} - \bar{y}_{\mathrm{true}})^2 $$
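To confirm this, we can compute u and v by hand and check that the result matches reg.score(). This is a minimal sketch on made-up toy data (the array shapes and noise level are just illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data with an obvious linear trend plus a little noise
rng = np.random.RandomState(0)
X = rng.rand(50, 1)
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=50)

reg = LinearRegression().fit(X, y)
y_pred = reg.predict(X)

# R^2 computed by hand, following the formula above
u = ((y - y_pred) ** 2).sum()      # residual sum of squares
v = ((y - y.mean()) ** 2).sum()    # total sum of squares
r2_manual = 1 - u / v

print(np.isclose(reg.score(X, y), r2_manual))  # True
```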
When the predictions are perfect, u is 0, so the best possible score is 1. But when the model performs badly, u/v can become very large, so 1 - u/v can indeed be negative; only in the normal case does the score fall in the range 0 ≤ R² ≤ 1.
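To see concretely how the score goes negative, compare a deliberately bad constant prediction against the mean predictor. This sketch uses a small made-up array and needs only numpy:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Total sum of squares v; mean of y_true is 3.0, so v = 4+1+0+1+4 = 10.0
v = ((y_true - y_true.mean()) ** 2).sum()

# Always predicting the mean gives u == v, hence R^2 == 0
y_mean = np.full_like(y_true, y_true.mean())
r2_mean = 1 - ((y_true - y_mean) ** 2).sum() / v
print(r2_mean)  # 0.0

# A wildly wrong constant makes u much larger than v, so R^2 goes negative
y_bad = np.full_like(y_true, 100.0)
r2_bad = 1 - ((y_true - y_bad) ** 2).sum() / v
print(r2_bad)  # a large negative number
```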