Metrics for Regression Models

Prachi Agrawal · Published in Analytics Vidhya · Mar 16, 2021 · 4 min read

Evaluating the performance of a machine learning model is an essential part of any project. There are many different evaluation metrics for regression and classification models. In this blog, we will learn about metrics for regression models.

There are many popular metrics for evaluating the performance of a regression model. The metrics for any model should be chosen based on the project's objective, domain, and goals. Below are some common metrics for evaluating the performance of a regression model:

  • Mean Absolute Error (MAE): The error for a single sample is the absolute difference between the actual value and the value predicted by the model. The mean absolute error, or MAE, is the average of this absolute error over all samples in the dataset.

This can be implemented using the sklearn library:

from sklearn.metrics import mean_absolute_error
mean_absolute_error(actual_values, predicted_values)

The MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. On its own, MAE may not tell the whole story, since it gives no direct indication of whether the model is underfitting or overfitting. It is less sensitive to outliers than MSE, since it doesn’t punish huge errors disproportionately. It is usually used when performance is measured on continuous variable data. It is a linear score, meaning all individual differences are weighted equally in the average. The lower the value, the better the model’s performance.
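
As a quick sanity check, here is a minimal sketch (the numbers are made up purely for illustration) showing that sklearn’s mean_absolute_error matches the average of the absolute differences computed by hand:

import numpy as np
from sklearn.metrics import mean_absolute_error

# hypothetical actual and predicted values, for illustration only
actual_values = np.array([3.0, -0.5, 2.0, 7.0])
predicted_values = np.array([2.5, 0.0, 2.0, 8.0])

# MAE via sklearn
print(mean_absolute_error(actual_values, predicted_values))  # 0.5

# MAE by hand: average of the absolute differences
print(np.mean(np.abs(actual_values - predicted_values)))  # 0.5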

  • Mean Squared Error (MSE): MSE is calculated by taking the average of the squared differences between the actual and predicted values.

This can be implemented using the sklearn library:

from sklearn.metrics import mean_squared_error
mean_squared_error(actual_values, predicted_values)

The MSE tells you how close a regression line is to a set of points. The smaller the mean squared error, the closer you are to finding the line of best fit. Depending on your data, it may be impossible to get a very small value for the mean squared error. MSE is one of the most commonly used metrics, but because squaring amplifies large errors, it is least useful when the dataset contains a lot of noise, since a single bad prediction can dominate the score. Conversely, it is most useful when large errors are particularly undesirable and you want the metric to punish outliers or unexpected values (too high or too low) heavily.
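
To see this sensitivity in action, here is a minimal sketch (again with made-up numbers) comparing how MAE and MSE react when one prediction goes badly wrong:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

actual_values = np.array([3.0, -0.5, 2.0, 7.0])
predicted_values = np.array([2.5, 0.0, 2.0, 8.0])
print(mean_squared_error(actual_values, predicted_values))  # 0.375

# one large error dominates the MSE far more than the MAE
predicted_with_outlier = np.array([2.5, 0.0, 2.0, 17.0])
print(mean_absolute_error(actual_values, predicted_with_outlier))  # 2.75
print(mean_squared_error(actual_values, predicted_with_outlier))  # 25.125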

  • Root Mean Squared Error (RMSE): RMSE is the standard deviation of the residuals, i.e. the errors that occur when predictions are made on a dataset. It is simply the square root of the MSE (Mean Squared Error).

This can be implemented using the sklearn library:

from sklearn.metrics import mean_squared_error
from math import sqrt

mse = mean_squared_error(actual_values, predicted_values)
# taking the square root of the mean squared error
root_mean_squared_error = sqrt(mse)

When standardized observations and forecasts are used as RMSE inputs, there is a direct relationship with the correlation coefficient. For example, if the correlation coefficient is 1, the RMSE will be 0, because all of the points lie on the regression line (and therefore there are no errors). RMSE is a measure of how spread out these residuals (prediction errors) are. In other words, it tells you how concentrated the data is around the line of best fit. Root mean squared error is commonly used in forecasting and regression analysis to verify experimental results.
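
For completeness, here is a minimal sketch with the same made-up numbers as above. Note that recent sklearn versions can also return the RMSE directly, which is worth checking against your installed version:

import numpy as np
from math import sqrt
from sklearn.metrics import mean_squared_error

actual_values = np.array([3.0, -0.5, 2.0, 7.0])
predicted_values = np.array([2.5, 0.0, 2.0, 8.0])

rmse = sqrt(mean_squared_error(actual_values, predicted_values))
print(rmse)  # about 0.612, i.e. sqrt(0.375)

# equivalent shortcut on sklearn 0.22 and later
# (the newest versions provide a root_mean_squared_error function instead)
print(mean_squared_error(actual_values, predicted_values, squared=False))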

  • R squared: R squared is also known as the coefficient of determination. The R squared value usually lies between 0 and 1 (it can be negative for models that fit worse than a horizontal line), where 0 indicates that the model doesn’t explain any of the variance in the given data and 1 indicates that the model fits the provided dataset perfectly.

This can be implemented using the sklearn library:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
scores = cross_val_score(LinearRegression(), X, y, scoring='r2')

R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively.
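
As a concrete illustration, here is a minimal sketch (same made-up numbers as before) showing that sklearn’s r2_score matches the textbook formula, 1 minus the ratio of the residual sum of squares to the total sum of squares:

import numpy as np
from sklearn.metrics import r2_score

actual_values = np.array([3.0, -0.5, 2.0, 7.0])
predicted_values = np.array([2.5, 0.0, 2.0, 8.0])

# R squared = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((actual_values - predicted_values) ** 2)
ss_tot = np.sum((actual_values - np.mean(actual_values)) ** 2)
print(1 - ss_res / ss_tot)  # about 0.948
print(r2_score(actual_values, predicted_values))  # same value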

  • Adjusted R-squared: The adjusted R-squared is a modified version of R-squared that accounts for the number of predictors in the model. The adjusted R-squared increases only if a new term improves the model more than would be expected by chance, and decreases when a predictor improves the model by less than expected by chance. The adjusted R-squared can be negative, but it usually isn’t, and it is always less than or equal to the R-squared.
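
sklearn doesn’t ship an adjusted R-squared metric, so a small helper is needed. Below is a minimal sketch applying the standard formula, with illustrative values for R squared, the number of samples n, and the number of predictors p:

def adjusted_r2(r2, n, p):
    # standard formula: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# e.g. R squared of 0.948 from 4 samples and 1 predictor (illustrative only)
print(adjusted_r2(0.948, 4, 1))  # 0.922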

Conclusion: In this article, we have seen different types of metrics used for evaluating regression models, along with their use cases.

Hope you find this article helpful. Keep learning!
