Multicollinearity Essay

One problem that can arise in multiple regression analysis is multicollinearity. Multicollinearity occurs when two or more of the independent variables of a multiple regression model are highly correlated. Technically, if two independent variables are correlated, we have collinearity; when three or more are correlated, we have multicollinearity. In practice, however, the two terms are used interchangeably. The reality of business research is that some correlation between predictors (independent variables) will be present most of the time. Multicollinearity becomes a problem when the inter-correlation among the predictor variables is high, and it creates several difficulties, particularly in the interpretation of the analysis.

1. It is difficult, if not impossible, to interpret the estimates of the regression coefficients.
2. Inordinately small t values for the regression coefficients may result.
3. The standard deviations of the regression coefficients are overestimated.
4. The algebraic sign of an estimated regression coefficient may be the opposite of what would be expected for a particular predictor variable.
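The second and third of these effects can be seen directly in a small simulation. The sketch below (simulated data, invented purely for illustration) fits the same ordinary least squares model twice: once with an independent second predictor and once with a second predictor that is nearly a copy of the first. The coefficient standard errors balloon in the collinear case.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)

def fit_ols(x2):
    """Fit y = b0 + b1*x1 + b2*x2 + e by least squares and return
    the coefficient estimates and their standard errors."""
    y = 2 * x1 + 1 * x2 + rng.normal(size=n)   # true model
    X = np.column_stack([np.ones(n), x1, x2])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])  # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(XtX_inv))    # coefficient standard errors
    return beta, se

# Case 1: x2 is independent of x1 (little collinearity)
beta_ind, se_ind = fit_ols(rng.normal(size=n))

# Case 2: x2 is almost a copy of x1 (severe collinearity)
beta_col, se_col = fit_ols(x1 + 0.05 * rng.normal(size=n))

print("independent x2: slope SEs =", se_ind[1:])
print("collinear   x2: slope SEs =", se_col[1:])  # far larger
```

With the collinear predictor, the slope standard errors are many times larger, which in turn shrinks the t values and can even flip the sign of an individual coefficient estimate, even though the fitted values themselves remain reasonable.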

The problem of multicollinearity can arise in regression analysis in a variety of business research situations. For example, suppose a model is being developed to predict salaries in a given industry. Independent variables such as years of education, age, years in management, experience on the job, and years of tenure with the firm might be considered as predictors. Several of these variables are clearly correlated (virtually all of them have something to do with the number of years, or time) and yield redundant information. Or suppose a financial regression model is being developed to predict bond market rates from such independent variables as the Dow Jones average, the prime interest rate, GNP, the producer price index, and the consumer price index.

Several of these predictors are likely to be inter-correlated. Multicollinearity can also distort the t values that are used to evaluate the regression coefficients. Because multicollinearity among the predictors inflates the estimated standard deviations of the regression coefficients, the t values tend to be understated when multicollinearity is present. In some regression models containing multicollinearity, all of the t values are non-significant even though the overall F value for the model is highly significant.
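That last pattern, individually weak t statistics alongside a strong overall F statistic, can be reproduced with a toy example. In this sketch (simulated, illustrative data) two predictors are near-duplicates, so the model as a whole explains y well while neither slope stands out on its own.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # near-duplicate predictor
y = x1 + x2 + rng.normal(size=n)             # y genuinely depends on both

X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - 3)
se = np.sqrt(sigma2 * np.diag(XtX_inv))
t_stats = beta[1:] / se[1:]                  # t statistics for the two slopes

ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - (resid @ resid) / ss_tot
F = (r2 / 2) / ((1 - r2) / (n - 3))          # overall F statistic

print("slope t statistics:", t_stats)        # typically small in magnitude
print("overall F statistic:", F)             # large: the model fits well
```

The inflated slope standard errors drag the individual t statistics toward zero, while the F test, which asks whether the predictors are jointly useful, remains highly significant.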


Many of the problems created by multicollinearity are interpretation problems. The business researcher should be alert to the potential for multicollinearity among the predictors in the model and view the model's results in light of that potential. The problem of multicollinearity is not a simple one to overcome, but several methods offer an approach to it. Stepwise regression is one way to guard against it: the search process enters the variables one at a time and compares each new variable to those already in the solution.

If a new variable is entered and the t values on the old variables become non-significant, the old variables are dropped from the solution. In this manner, it is more difficult for multicollinearity to affect the regression analysis. Of course, because of multicollinearity, some important predictors may never enter into the analysis. Other techniques are available to help control for multicollinearity. One is the variance inflation factor (VIF), in which a regression analysis is conducted to predict each independent variable from the other independent variables.
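The VIF idea can be sketched in a few lines: regress each predictor on the others, take that regression's R², and report VIF = 1 / (1 - R²). The example below (simulated data, invented for illustration) flags the two near-duplicate columns; a common rule of thumb treats VIF values above about 10 as a sign of serious multicollinearity.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns (with an intercept)."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # nearly a copy of x1
x3 = rng.normal(size=200)              # unrelated to the others
X = np.column_stack([x1, x2, x3])
print(vif(X))  # first two VIFs are large; the third is near 1
```

A large VIF for a predictor means the other predictors already carry most of its information, which is exactly the redundancy that makes its coefficient hard to estimate and interpret.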
