Several independent variables are always connected to the dependent variable (y) of concern. If this correlation can be calculated, it could allow us to develop more correct estimates of the dependent variable than would be achievable using linear regression.1
Multiple linear regression is used when there is more than one independent variable to predict.
Definition: Multiple linear regression
Statisticians use multiple linear regression to predict one variable’s value from several other variables. Multiple regression refers to an expansion of linear regression.
Dependent variables are the target of our prediction efforts, while independent or explanatory variables are the variables we utilize to estimate that target’s value.
A linear connection between the dependent variable and two or more independent variables is considered present for multiple regression.
Non-linearity occurs when the relationship between the dependent and independent variables is curvy rather than straight.
We are using two or more variables, both linear and non-linear regression to plot the relationship between the answer and those factors.
However, non-linear regression is notably hard to execute because it relies on predictions developed through trial and error.
Multi-linear regression: Assumptions
The assumptions of multiple linear regression are identical to those of basic linear regression.
- Independence of Observation: All data points in the multiple linear regression set were acquired using statistically sound sampling techniques, and there are no ambiguous connections between the variables.
- Normality: The data are standard.
- Homogeneity of Variance: As the independent variable is adjusted, the extent of the error in our estimate remains constant.
- Linearity: The ideal line representing the relationship between the data points is straight, not a curve or other aggregating feature.
If you are using multiple linear regression, it’s a good idea to double-check for any potential correlations between the independent variables initially.2
If the relationship between two independent variables is too strong (r2 > 0.6), then only one should be included in the regression analysis.
Calculating a Multiple Linear Regression
The equation for multiple linear regression is as follows:
|Y-intercept (constant term)|
||Slope coefficients for each explanatory variable|
|The model’s error term|
The three things that multiple linear regression computes to determine the best-fit line for every independent variable are:
- The optimal regression coefficients minimize the total error in the equation.
- The model-wide t-statistic.
- If there is no correlation between the independent and dependent variables, the corresponding p-value indicates how probable it is that the t-statistic happened by chance.
Multiple Linear Regression in R
Multiple linear regression may be done manually. However, nowadays, most researchers use statistical programs.
We will use R as our example program since it’s open-source, robust, and convenient.
Multiple Linear Regression Results
The result of a regression model can look like this:
In this example of multiple linear regression, Y is correlated with X1 () and X2 () to help explain the former.
Multiple linear regression: Reporting the results
Include the p-value, standard deviation of the estimate, and anticipated impact in your presentation.3 And do not forget to explain the significance of the regression coefficient to your audience by interpreting the data you provide.
Illustrating the results in a graph
Displaying your multiple linear regression findings on a graph might also be helpful. As the number of parameters in multiple linear regression exceeds what can be shown in a two-dimensional graphic, it is more difficult to implement than basic linear regression.
While one independent variable may be shown on the x-axis, there are other methods to present your findings, including the impact of several independent factors on the dependent variable.
Here, we have computed the projected values of the dependent variable (heart disease) throughout the whole range of actual values for the proportion of persons who bike to work. These projected values were determined by adjusting the independent variable for the impact of smoking at the lowest, medium, and highest recorded smoking rates.
A multiple linear regression model is a predictive method used to predict the linear connection between a dependent variable and one or more independent variables.
If the connection between the independent factors and the dependent variables is not linear, then multiple regression will not provide a satisfactory explanation.
The following are the presumptions upon which the multiple linear regression model rests: The correlation between the two sets of numbers is linear.
There is reasonable independence between the independent variables. The yi samples are taken at random from the whole population.
Researchers often use complex correlational research to investigate interrelationships among many study variables.
Advanced research findings may explore potential causal connections between variables using multiple regression methods.
1 Gallo, Amy. “A Refresher on Regression Analysis.” Harvard Business Review. November 4, 2015. https://hbr.org/2015/11/a-refresher-on-regression-analysis.
2 CFI Team. “Regression Analysis.” CFI. November 24, 2022. https://corporatefinanceinstitute.com/resources/data-science/regression-analysis/.
3 Laerd Statistics. “Multiple Regression Analysis using SPSS Statistics.” Accessed February 21, 2023. https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php.