We now look at how to detect potential outliers that have an undue influence on the multiple regression model. Keep in mind that since we are dealing with a multi-dimensional model, there may be data points that look perfectly fine in any single dimension but are multivariate outliers. E.g. for the general population, there is nothing unusual about a 6-foot man or a 125-pound man, but a 6-foot man who weighs 125 pounds is unusual.

Leverage

Definition 1: The following parameters are indicators that a sample point $(x_{i1}, \ldots, x_{ik}, y_i)$ is an outlier:

Distance – the residual $e_i = y_i - \hat{y}_i$ is the measure of the distance of the ith sample point from the regression line. Points with large residuals are potential outliers.

Leverage – By Property 1 of Method of Least Squares for Multiple Regression, $\hat{Y} = HY$ where $H = [h_{ij}]$ is the n × n hat matrix. Thus, for the ith point in the sample,

$$\hat{y}_i = \sum_{j=1}^{n} h_{ij} y_j$$

where each $h_{ij}$ only depends on the x values in the sample. Thus the strength of the contribution of sample value $y_i$ on the predicted value $\hat{y}_i$ is determined by the coefficient $h_{ii}$, which is called the leverage and is usually abbreviated as $h_i$. Where there is only one independent variable, we have

$$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}$$

Leverage measures how far away the data point is from the mean value. Where there are k independent variables in the model, the mean value for leverage is (k+1)/n. A rule of thumb (Stevens') is that values 3 times this mean value are considered large.

As we saw in Residuals, the standard error of the residual $e_i$ is

$$\text{s.e.}(e_i) = \sqrt{MS_E (1 - h_i)}$$

where $MS_E$ is the mean squared error of the regression, and so the studentized residuals $s_i$ are given by

$$s_i = \frac{e_i}{\sqrt{MS_E (1 - h_i)}}$$

We will use this measure when we define Cook's distance below. For our purposes now, we need to look at the version of the studentized residual when the ith observation is removed from the model, i.e.

$$t_i = \frac{e_i}{\sqrt{MS_{E(i)} (1 - h_i)}}$$

where $MS_{E(i)}$ is the mean squared error of the regression fitted without the ith observation. This statistic follows a t distribution with n − k − 2 degrees of freedom, which is the basis of the t-test used in the example below.

Definition 2: If we remove a point from the sample, then the equation for the regression line changes. Points that have the most influence produce the largest change in the equation of the regression line. A measure of this influence is called Cook's distance. For the ith point in the sample, Cook's distance is defined as

$$D_i = \frac{\sum_{j=1}^{n} (\hat{y}_j - \hat{y}_{j(i)})^2}{(k+1)\, MS_E}$$

where $\hat{y}_{j(i)}$ is the prediction of $y_j$ by the revised regression model when the point $(x_{i1}, \ldots, x_{ik}, y_i)$ is removed from the sample.

Another measure of influence is DFFITS, which is defined by the formula

$$DFFITS_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{\sqrt{MS_{E(i)}\, h_i}}$$

Whereas Cook's distance is a measure of the change in the mean vector when the ith point is removed, DFFITS is a measure of the change in the ith mean when the ith point is removed.

Property 1: Cook's distance can be given by the following equation:

$$D_i = \frac{e_i^2}{(k+1)\, MS_E} \cdot \frac{h_i}{(1 - h_i)^2}$$

Property 1 means that we don't need to perform repeated regressions to obtain Cook's distance. Furthermore, Cook's distance combines the effects of distance and leverage to obtain one metric. This definition of Cook's distance is equivalent to

$$D_i = \frac{s_i^2}{k+1} \cdot \frac{h_i}{1 - h_i}$$

Values of Cook's distance of 1 or greater are generally viewed as high.

Similarly, DFFITS can be calculated without repeated regressions as shown by Property 2.

Property 2: DFFITS can be given by the following equation:

$$DFFITS_i = t_i \sqrt{\frac{h_i}{1 - h_i}}$$

Values of |DFFITS| > 1 are potential problems in small to medium samples and values of |DFFITS| > $2\sqrt{(k+1)/n}$ are potential problems in large samples.

Linear regression example

Example 1: Find any potential outliers or influencers for the data in Example 1 of Regression Analysis. What happens if we change the Life Expectancy measure for the fourth data element from 53 to 83?

Figure 1 displays the various statistics described above for the data in Example 1.

Figure 1 – Test for potential outliers and influencers

Representative formulas in the worksheet in Figure 1 are displayed in Figure 2. The formulas for the first observation in the table (row 6) of Figure 1 are displayed in Figure 3.

Figure 3 – Formulas for observation 1 in Figure 1

The formulas for the other observations are similar.

Analysis

As you can see, no data element in Figure 1 has a significant t-test, a high Cook's distance (> 1) or a high DFFITS (> 1).

We now repeat the same analysis where the Life Expectancy measure for the fourth data element (cell B9) is changed from 53 to 83 (Figure 4).

Figure 4 – Test for outliers and influencers for revised data
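For readers working outside a spreadsheet, the measures discussed in this article (leverage, the two studentized residuals, Cook's distance and DFFITS) can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the worksheet used for the figures: the function names `influence_measures` and `leave_one_out`, and the small data set `x_demo`, `y_demo`, are hypothetical and are not taken from Example 1. The first function uses the closed-form expressions that avoid repeated regressions; the second refits the model without point i so that the two approaches can be compared.

```python
import numpy as np

def _design(x):
    """Return the design matrix with an intercept column, plus n and k."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, k = x.shape
    return np.column_stack([np.ones(n), x]), n, k

def influence_measures(x, y):
    """Closed-form outlier and influence statistics (no refitting)."""
    X, n, k = _design(x)
    y = np.asarray(y, dtype=float)
    H = X @ np.linalg.pinv(X)              # hat matrix, assuming full column rank
    h = np.diag(H)                         # leverage h_i
    e = y - H @ y                          # residuals e_i
    df = n - k - 1
    mse = e @ e / df                       # MS_E of the full model
    s = e / np.sqrt(mse * (1 - h))         # studentized residual s_i
    # MS_E of the model without point i, obtained without refitting
    mse_i = (df * mse - e**2 / (1 - h)) / (df - 1)
    t = e / np.sqrt(mse_i * (1 - h))       # studentized residual, point i removed
    return {
        "leverage": h,
        "s": s,
        "t": t,
        "cooks_d": s**2 / (k + 1) * h / (1 - h),
        "dffits": t * np.sqrt(h / (1 - h)),
    }

def leave_one_out(x, y, i):
    """Cook's distance and DFFITS for point i from their definitions (refit)."""
    X, n, k = _design(x)
    y = np.asarray(y, dtype=float)
    keep = np.arange(n) != i
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    yhat, yhat_i = X @ beta, X @ beta_i    # predictions for all n points
    mse = np.sum((y - yhat) ** 2) / (n - k - 1)
    mse_i = np.sum((y[keep] - yhat_i[keep]) ** 2) / (n - k - 2)
    h_i = (X @ np.linalg.pinv(X))[i, i]
    cooks_d = np.sum((yhat - yhat_i) ** 2) / ((k + 1) * mse)
    dffits = (yhat[i] - yhat_i[i]) / np.sqrt(mse_i * h_i)
    return cooks_d, dffits

# Hypothetical data for illustration only (not the data from Example 1)
x_demo = [2, 4, 5, 7, 8, 10, 12, 15]
y_demo = [3.1, 4.9, 6.2, 7.8, 9.1, 10.7, 13.2, 25.0]

if __name__ == "__main__":
    m = influence_measures(x_demo, y_demo)
    for i in range(len(y_demo)):
        print(i + 1, m["leverage"][i], m["t"][i], m["cooks_d"][i], m["dffits"][i])
```

The closed-form route is usually preferable because it needs only one regression; the refitting version is included purely as a cross-check of the definitions.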