Verifying the Assumptions Again

From the normal probability plot and the histogram, we observe that the normality assumption is still valid. Because we have removed some variables from our analysis, we need to verify that the assumptions for regression analysis still hold. The residual plots all indicate that the residuals are normally distributed (see Appendix VIII). However, the residual plot for opponent 3-point per game still shows some outliers. In a further attempt to improve this model, we analyzed the data again, omitting the perceived outliers identified in the residual plots.

6.4.2 Further Improvement

Performing the analysis without the observed outliers does not make our model any better at prediction. In fact, the reverse is the case: the R² value moves from 77.7% to 49.8%, and the S value moves from 0.07979 to 0.118905. This suggests that the first model is still better at predicting the winning percentage of a team. The regression equation for this model is:

Winning percentage = 0.487 + 0.0184 Free throws per game + 0.0240 Opponent Turn-over,pg + 0.0188 Home rebound per game – 0.0303 Oppnt rebound per game – 0.0243 Opp 3-point per game

S = 0.118905   R-Sq = 49.8%   R-Sq(adj) = 45.7%

More details of this model are presented in Appendix IX. Since we have not yet improved on our first model, we continue the search.

6.4.3 A Third Model

We now choose another combination of variables from the Best Subset Regression Analysis. This one has 6 variables, and the regression model is presented below (more details in Appendix X).

Winning percentage = 0.565 + 0.0239 3-point per game + 0.0163 Free throws per game – 0.0630 Turn-over, pg + 0.0436 Opponent Turn-over,pg + 0.0265 Home rebound per game – 0.0310 Oppnt rebound per game

S = 0.0755690   R-Sq = 80.3%   R-Sq(adj) = 78.4%

This model appears to be close to the first one, in which all seven variables were used. The model interpretation is as we have explained before (see p 7).

Moreover, this model accounts for 80.3% of the variability in winning percentage, and its standard error is 0.0755690, which is not far from that of the seven-variable model. It is still smaller than the standard deviation of the explained variable (0.1625). On inspecting the residual plots for each variable, we found an outlier, team number 7 (Notre Dame), in the plot for Team’s Turnover per Game.

The rest of the plots do not have obvious outliers. Also, the assumption of normality is not violated, since the histogram shows a normal distribution and so does the normal probability plot.

6.5 The Final Model

When we carried out the multiple regression once again without the outlier we identified, we obtained a still better model. The regression equation is given below (the details are presented in Appendix XII):

Winning percentage = 0.604 + 0.0226 3-point per game + 0.0167 Free throws per game – 0.0660 Turn-over, pg + 0.0420 Opponent Turn-over,pg + 0.0256 Home rebound per game – 0.0292 Oppnt rebound per game

S = 0.0739739   R-Sq = 80.8%   R-Sq(adj) = 78.8%

The interpretation of this model is the same as we have given for the previous models (see p 7).
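To illustrate how the fitted equation is applied, the following minimal Python sketch evaluates the final model for one team's per-game statistics. The coefficients are copied from the regression equation for the final model. The dictionary keys and the function names `predict_win_pct` and `effect_of_change` are hypothetical labels chosen for this example and do not come from the original analysis. Inputs are assumed to be in the same units as the original data, which this section does not specify.

```python
# Coefficients of the final model, copied from the regression equation.
# The key names are hypothetical labels chosen for this sketch.
INTERCEPT = 0.604
COEFFICIENTS = {
    "three_point_pg": 0.0226,
    "free_throws_pg": 0.0167,
    "turnover_pg": -0.0660,
    "opp_turnover_pg": 0.0420,
    "home_rebound_pg": 0.0256,
    "opp_rebound_pg": -0.0292,
}


def predict_win_pct(stats):
    """Return the fitted winning percentage (as a fraction) for one team.

    `stats` maps each key in COEFFICIENTS to that team's per-game value.
    """
    return INTERCEPT + sum(
        coef * stats[name] for name, coef in COEFFICIENTS.items()
    )


def effect_of_change(name, delta=1.0):
    """Expected change in winning percentage when one variable moves by
    `delta` and all the other variables are held fixed."""
    return COEFFICIENTS[name] * delta
```

For instance, `effect_of_change("turnover_pg")` returns the turnover coefficient, which is the expected drop in winning percentage for one extra turnover per game with everything else unchanged.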

We only state that the model shows us that for each extra turnover per game, the winning percentage should be expected to fall, and so it is for opponent rebound per game. On the other hand, 3-point per game, free throw per game, opponent turnover per game and team’s rebound per game should all be expected to increase the winning chance of a team in this group. We take this model to be best because, even though the R² value is 80.8% (less than it was when we included all seven variables), the adjusted R² value is 78.8%, the same as that of the first model.
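The comparison above relies on adjusted R², which penalizes R² for the number of predictors. A minimal sketch of the standard formula follows. The function name `adjusted_r2` is an assumption for this example, and the number of observations `n` is left as a parameter because this section does not state it.

```python
def adjusted_r2(r2, n, p):
    """Standard adjusted R-squared.

    r2: ordinary R-squared as a fraction (e.g. 0.808 for 80.8%)
    n:  number of observations (not stated in this section)
    p:  number of predictors, excluding the intercept
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

Because the penalty grows with `p`, a model with fewer variables can match the adjusted R² of a larger model even when its plain R² is lower, which is the situation described for the six-variable and seven-variable models.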

We believe that this takes us closer to perfection than the first model. The second consideration is the value of s. For this model we obtain s = 0.0739739. This value implies that our predictions should fall within about ±2 × 0.0739739 = ±0.1479 of the actual winning percentage. None of the other models took us this close, which further convinces us that this model is the best. The third consideration is that, with the outlier excluded, the standard deviation of the predicted variable is smaller and the mean is higher.
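The two-standard-error rule used here can be sketched in a few lines of Python. This treats ±2s as a rough band around a fitted value, as the text does; a formal prediction interval would also account for uncertainty in the estimated coefficients. The function name `approx_band` is hypothetical.

```python
S_FINAL = 0.0739739  # standard error of the final model, from the output above


def approx_band(prediction, s=S_FINAL, k=2.0):
    """Return (low, high) for a rough band of +/- k standard errors
    around a fitted winning percentage."""
    half_width = k * s
    return prediction - half_width, prediction + half_width


# Half-width of the band for the final model, rounded to four decimals.
half_width = round(2.0 * S_FINAL, 4)
```

The same function can be called with the S value of any earlier model to compare band widths, for example `approx_band(0.6, s=0.118905)` for the model in section 6.4.2.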

The standard deviation was 0.1625 (with mean = 0.5946), but now it is 0.1608 (with mean = 0.5984). We interpret this as an improvement. Since we have excluded one observation from the data, we have a somewhat new data set, so we examine the residual plots again. The details are presented in Appendix XII (b). Here we do not have apparent outliers, and the normal probability plot and the histogram both exhibit the normality property. Thus the normality assumption is satisfied.

The predictive power of the model is now indicated by the F-value = 41.99. Before the exclusion of the outlier it was 41.50. Our first model yielded 36.68, and even though the 5-variable model gave us F = 43.21, that model had other setbacks. Also, the high t-statistic values and their associated P values all support our conclusion. The residual plots do not show any definite pattern that we can discern. We therefore believe that the assumption of homoskedasticity has also been satisfied. We thus settle on this model as our best for predicting the winning percentage of a basketball team in this group of teams, for this particular season.
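The F-values compared above are tied to R² through a standard identity for a regression with an intercept. The sketch below shows that relationship. The function name `f_statistic` is an assumption for this example, and the number of observations `n` is left as a parameter because this section does not state it.

```python
def f_statistic(r2, n, p):
    """Overall F statistic for a regression with an intercept.

    r2: R-squared as a fraction
    n:  number of observations (not stated in this section)
    p:  number of predictors, excluding the intercept
    """
    df_model = p
    df_error = n - p - 1
    if df_error <= 0:
        raise ValueError("need more observations than parameters")
    return (r2 / df_model) / ((1.0 - r2) / df_error)
```

The identity shows why a model with fewer predictors can have a larger F even with a lower R²: dividing by a smaller `p` raises the numerator, which is consistent with the 5-variable model reporting the highest F-value despite its other drawbacks.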
