Slide No.  Text
1 

Assumption of linearity
Assumption of linearity; Strategy for solving problems; Producing outputs for evaluating linearity; Assumption of linearity script; Sample problems
SW388R7 Data Analysis & Computers II Slide 1
2 

Assumption of linearity
The statistics that we will study this semester generally assume that the relationship between variables is linear, or they perform better if the relationship is linear. If a relationship is nonlinear, statistics that assume linearity will underestimate the strength of the relationship or fail to detect that a relationship exists.
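A minimal sketch, in Python with made-up data, of why a linear statistic can miss a nonlinear relationship. The function name pearson_r and the data values are illustrative assumptions, not part of the course materials.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: both y variables are completely determined by x,
# but only the first one follows a straight line.
x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]
y_curved = [v ** 2 for v in x]

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print("r for the straight-line data:", round(r_linear, 3))
print("r for the curved data:", round(r_curved, 3))
```

In this constructed example the curved data has a linear correlation of zero even though y is fully predictable from x, which is the kind of underestimate the slide describes.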
3 

Linearity
Linearity means that the amount of change, or rate of change, between scores on two variables is constant for the entire range of scores for the variables. There are relationships that are not linear. The relationship between learning and time may not be linear. Learning a new subject shows rapid gains at first, then the pace slows down over time. This is often referred to as a learning curve. Population growth may not be linear. The pattern often shows growth at increasing rates over time.
4 

Population growth in Texas
The increase in population for the ten years from 1860 to 1870 (a difference of 214,364) is relatively small compared to the increase in population for the ten years from 1960 to 1970 (a difference of 1,617,053).
5 

Evaluating linearity
There are both graphical and statistical methods for evaluating linearity. Graphical methods include the examination of scatterplots, often overlaid with a trendline. While commonly recommended, this strategy is difficult to implement. Statistical methods include diagnostic hypothesis tests for linearity, a rule of thumb that says a relationship is linear if the difference between the linear correlation coefficient (r) and the nonlinear correlation coefficient (eta) is small, and examining patterns of correlation coefficients.
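The r versus eta rule of thumb can be illustrated with a short Python sketch. It assumes eta is computed as the correlation ratio of the dependent variable on the distinct values of the independent variable (square root of between-group sum of squares over total sum of squares); the function names and data are hypothetical, and the slides do not say how small the difference has to be, so no cutoff is applied here.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def eta(xs, ys):
    """Correlation ratio of y on x, grouping the y scores by each distinct x value."""
    grand_mean = sum(ys) / len(ys)
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_total = sum((y - grand_mean) ** 2 for y in ys)
    return math.sqrt(ss_between / ss_total)

# Hypothetical data with repeated x values and an inverted-U pattern in y.
x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y = [1, 2, 4, 5, 6, 7, 4, 5, 1, 2]

r_value = pearson_r(x, y)
eta_value = eta(x, y)
print("r =", round(r_value, 3), " eta =", round(eta_value, 3))
print("eta - |r| =", round(eta_value - abs(r_value), 3))
```

Eta needs several cases at each value of the independent variable (or a grouped version of it), which is why the example repeats each x value.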
6 

Interpreting scatterplots
The advice for interpreting linearity is often phrased as looking for a cigar-shaped band, which is very evident in this plot.
7 

Interpreting scatterplots
Sometimes, a scatterplot shows a clearly nonlinear pattern that requires transformation, like the one shown in the scatterplot.
8 

Scatterplots that are difficult to interpret
The correlations for both of these relationships are low. The linearity of the relationship on the right can be improved with a transformation; that of the relationship on the left cannot. However, this is not necessarily obvious from the scatterplots.
9 

Using correlation matrices
Creating a correlation matrix for the dependent variable and the original and transformed variations of the independent variable provides us with a pattern that is easier to interpret. The information that we need is in the first column of the matrix, which shows the correlation and significance for the dependent variable and all forms of the independent variable.
10 

The pattern of correlations for no relationship
The correlation between the two variables is very weak and statistically nonsignificant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables. Moreover, none of the significance tests for the correlations with the transformed dependent variable are statistically significant. There is no relationship between these variables; it is not a problem with nonlinearity.
11 

Correlation pattern suggesting transformation
The correlation between the two variables is very weak and statistically nonsignificant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables. However, the probability associated with the larger correlation for the square transformation is statistically significant, suggesting that this is a transformation we might want to use in our analysis.
12 

Transformations
When a relationship is not linear, we can transform one or both variables to achieve a relationship that is linear. Four common transformations to induce linearity are: the logarithmic transformation, the square root transformation, the inverse transformation and the square transformation. All of these transformations produce a new variable that is mathematically equivalent to the original variable, but expressed in different measurement units, e.g. logarithmic units instead of decimal units.
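As a rough illustration, the four transformations and the resulting pattern of correlations with the dependent variable might be sketched in Python as follows. The data, the names TRANSFORMS and first_column, and the choice of a base-10 logarithm are assumptions made for this example; the logarithm, square root and inverse as written here also assume that every value of the independent variable is greater than zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# The original variable plus the four transformations named on the slide.
TRANSFORMS = {
    "original": lambda v: v,
    "logarithm": lambda v: math.log10(v),   # base 10 is an assumption
    "square root": lambda v: math.sqrt(v),
    "inverse": lambda v: 1.0 / v,
    "square": lambda v: v ** 2,
}

# Hypothetical data in which y depends on the square of x.
x = list(range(1, 11))
y = [2 * v * v + 5 for v in x]

# Correlation of the dependent variable with each form of the independent
# variable, i.e. the first column of the correlation matrix.
first_column = {
    name: pearson_r([f(v) for v in x], y) for name, f in TRANSFORMS.items()
}
for name, r in first_column.items():
    print(f"{name:12s} r = {r: .4f}")

best = max(first_column, key=lambda name: abs(first_column[name]))
print("largest absolute correlation:", best)
```

For this constructed data the square transformation gives the largest correlation, because y was built from the square of x. The base of the logarithm does not affect the correlation.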
13 

When transformations do not work
When none of the transformations induces linearity in a relationship, our statistical analysis will underestimate the presence and strength of the relationship, i.e. we lose power. We do have the option of changing the way the information in the variables is represented, e.g. substituting several dichotomous variables for a single metric variable. This bypasses the assumption of linearity while still attempting to incorporate the information about the relationship in the analysis.
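One way to picture the substitution of dichotomous variables for a metric variable is the Python sketch below. The function name dummy_code, the cut points and the data values are hypothetical; the slide does not say how the categories should be chosen.

```python
def dummy_code(values, cut_points):
    """Replace a metric variable with 0/1 indicator variables.

    cut_points must be sorted in ascending order. Values below the first
    cut point form the reference category and get zeros on every indicator;
    each remaining band gets its own indicator.
    """
    rows = []
    for v in values:
        band = sum(v >= c for c in cut_points)  # index of the band v falls into
        rows.append([1 if band == k else 0 for k in range(1, len(cut_points) + 1)])
    return rows

# Hypothetical metric variable and hypothetical bands: under 2, 2 to under 4, 4 or more.
hours = [0, 1, 2, 3, 5, 8]
cuts = [2, 4]
indicators = dummy_code(hours, cuts)

for value, row in zip(hours, indicators):
    print(value, row)
```

Each indicator can then enter the analysis in place of the single metric variable, so no straight-line relationship across the full range of scores has to be assumed.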
14 

Strategy for solving problems - 1
Our strategy for determining whether or not a relationship is linear will be based on significance tests for the Pearson r correlation coefficient. If the correlation coefficient between an independent variable and a dependent variable is statistically significant (its probability is less than or equal to a specified level of significance), we will conclude that the relationship is linear.
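The significance test for r can be sketched in Python under the usual assumption that t = r * sqrt((n - 2) / (1 - r²)) follows a t distribution with n - 2 degrees of freedom when there is no linear relationship. SPSS reports this probability directly; the code below only approximates it by numerical integration, and the function names (t_pdf, two_tailed_p, is_linear) are made up for this example.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (
        math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + t * t / df))

def two_tailed_p(r, n):
    """Approximate two-tailed probability for a Pearson r from a sample of size n."""
    if n <= 2:
        raise ValueError("need more than two cases")
    if abs(r) >= 1:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the t density between 0 and |t|.
    steps = 2 * max(1000, int(10 * t) + 1)  # even number of intervals
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return max(0.0, 1 - 2 * area)

def is_linear(r, n, alpha):
    """Decision rule from the slide: linear if the probability is <= alpha."""
    return two_tailed_p(r, n) <= alpha

# Hypothetical correlations from a sample of 12 cases, tested at alpha = 0.01.
for r in (0.2, 0.576, 0.9):
    p = two_tailed_p(r, 12)
    print(f"r = {r:.3f}  p = {p:.4f}  linear at 0.01: {is_linear(r, 12, 0.01)}")
```

The middle value, r = 0.576 with 12 cases, sits at the familiar two-tailed 0.05 critical value, which gives a quick check on the approximation.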
15 

Strategy for solving problems - 2
If linearity cannot be supported for the untransformed independent and dependent variables, we will examine the transformations for the variables. If any of the transformations for the independent or dependent variable are statistically significant when the untransformed relationship is not statistically significant, we will conclude that the problem is nonlinearity, and can be remedied by substituting the transformed variable in the analysis. If none of the transformations are statistically significant, we will conclude that there is no relationship between the variables.
16 

Strategy for solving problems - 3
Even when a relationship is linear, the analysis might still be enhanced by the inclusion of a transformed version of the independent variable in the analysis, e.g. including the square of the independent variable in a regression. If the correlation coefficient for a statistically significant transformation is substantially larger than the correlation coefficient for a statistically significant correlation between the untransformed variables, we will suggest that the transformed variable be included in the analysis, as well as the original form of the variables.
17 

Problem 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance.
Based on a diagnostic hypothesis test of the correlation coefficient, the relationship between "hours per day watching TV" and "total hours spent on the Internet" is not linear. However, the square transformation of the independent variable "hours per day watching TV" does result in a relationship that is linear.
1. True
2. True with caution
3. False
4. Incorrect application of a statistic
18 

Creating the scatterplot
The most commonly recommended strategy for evaluating linearity is visual examination of a scatterplot. To obtain a scatterplot in SPSS, select the Scatter… command from the Graphs menu.
19 

Selecting the type of scatterplot
First, click on the thumbnail sketch of a simple scatterplot to highlight it. Second, click on the Define button to specify the variables to be included in the scatterplot.
20 

Selecting the variables
First, move the dependent variable netime to the Y Axis text box. Second, move the independent variable tvhours to the X Axis text box. Third, click on the OK button to complete the specifications for the scatterplot. If a problem statement mentions a relationship between two variables without clearly indicating which is the independent variable and which is the dependent variable, the first mentioned variable is taken to be the independent variable.
21 

The scatterplot
The scatterplot is produced in the SPSS output viewer. The points in a scatterplot are considered linear if they form a cigar-shaped elliptical band. The pattern in this scatterplot is not really clear.
22 

Adding a trendline
To try to determine if the relationship is linear, we can add a trendline to the chart. To add a trendline to the chart, we need to open the chart for editing. To open the chart for editing, double click on it.
23 

The scatterplot in the SPSS Chart Editor
The chart that we double clicked on is opened for editing in the SPSS Chart Editor. To add the trendline, select the Options… command from the Chart menu.
24 

Requesting the fit line
In the Scatterplot Options dialog box, we click on the Total checkbox in the Fit Line panel in order to request the trendline. Click on the Fit Options… button to request the r² coefficient of determination as a measure of the strength of the relationship.
25 

Requesting r²
First, the Linear regression thumbnail sketch should be highlighted as the type of fit line to be added to the chart. Second, in the Fit Options dialog, click on the Display R-square in Legend checkbox to add this item to our output. Third, click on the Continue button to complete the options request.
26 

Completing the request for the fit line
Click on the OK button to complete the request for the fit line.
27 

The fit line and r²
The red fit line is added to the chart. The value of r² (0.0460) suggests that the relationship is weak.
28 

Changing the shape of the fit line
We can try a trendline with a curved shape to see if it does a better job of fitting the data. To change the trendline, select the Options… command from the Chart menu.
29 

Accessing the fit line options
Click on the Fit Options… button to open up the dialog for specifying the characteristics of the fit line.
30 

Specifying a quadratic fit line
First, click on the Quadratic regression thumbnail in the Fit Method panel. This will fit a trendline that includes a square term in the equation (x²). Second, click on the Continue button to close the fit line options dialog.
31 

Completing the request for the fit line
Click on the OK button to complete the request for the fit line.
32 

The quadratic fit line and r²
The red fit line curves to reduce the discrepancies between the line and the data points. The value of r² (0.1591) falls at the top of the weak range, indicating a stronger relationship than the one represented by the linear fit line. This result hints that a squared transformation of the independent variable may be needed.
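The comparison between the linear and quadratic fit lines can be imitated outside SPSS. The following Python sketch fits both by ordinary least squares and reports r² for each; it uses hypothetical data rather than the GSS2000.sav variables, so it will not reproduce the 0.0460 and 0.1591 values, and the helper names solve and r_squared are invented for the example.

```python
def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    coef = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * coef[c] for c in range(row + 1, n))
        coef[row] = (m[row][n] - tail) / m[row][row]
    return coef

def r_squared(xs, ys, degree):
    """r² of a least-squares polynomial fit (degree 1 = linear, 2 = quadratic)."""
    k = degree + 1
    # Normal equations for the polynomial coefficients.
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    coef = solve(a, b)
    predicted = [sum(c * x ** p for p, c in enumerate(coef)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data that rises and then falls.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 4, 6, 9, 9, 8, 7, 3, 2]

r2_linear = r_squared(x, y, 1)
r2_quadratic = r_squared(x, y, 2)
print("linear r²:", round(r2_linear, 4))
print("quadratic r²:", round(r2_quadratic, 4))
```

Because the quadratic equation contains the linear one as a special case, its r² cannot be lower; what matters is how much higher it is.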
33 

Computing the transformations
There are four transformations that we can use to achieve or improve linearity. The compute dialogs for these four transformations for linearity are shown.
34 

Creating the scatterplot matrix
To create the scatterplot matrix, select the Scatter… command in the Graphs menu.
35 

Selecting type of scatterplot
First, click on the Matrix thumbnail sketch to indicate which type of scatterplot we want. Second, click on the Define button to select the variables for the scatterplot.
36 

Specifications for scatterplot matrix
First, move the dependent variable, the independent variable and all of the transformations to the Matrix Variables list box. Second, click on the OK button to produce the scatterplot.
37 

The scatterplot matrix
The scatterplot matrix shows a thumbnail sketch of scatterplots for each independent variable or transformation with the dependent variable. The scatterplot matrix may suggest which transformations might be useful.
38 

Creating the correlation matrix
To create the correlation matrix, select the Correlate > Bivariate… command in the Analyze menu.
39 

Specifications for correlation matrix
First, move the dependent variable, the independent variable and all of the transformations to the Variables list box. Second, click on the OK button to produce the correlation matrix.
40 

The correlation matrix
The answers to the problems are based on the correlation matrix. Before we answer the question in this problem, we will use a script to produce the output.
41 

The assumption of linearity script
An SPSS script to produce all of the output that we have produced manually is available on the course web site. After downloading the script, run it to test the assumption of linearity. Select Run Script… from the Utilities menu.
42 

Selecting the assumption of linearity script
First, navigate to the folder containing your scripts and highlight the LinearityAssumptionAndTransformations.SBS script. Second, click on the Run button to activate the script.
43 

Specifications for linearity script
First, move the dependent variable from the list of variables in the data set. Second, move the independent variable from the list of variables in the data set. The default output is transformations of the independent variable. To include transformations of the dependent variable, mark the checkboxes. Third, click on the OK button to run the script.
44 

The correlation matrix and the original problem
The output from the script can be used to answer the problem question. The probability associated with the correlation coefficient between the untransformed variables (0.079) is greater than the level of significance (0.01), suggesting either a weak or a nonlinear relationship. The probability associated with the correlation between the dependent variable and the square transformation (0.006) is less than the level of significance. The square transformation results in a relationship that can be treated as linear. The answer to the problem is true.
45 

Problem 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance.
Based on a diagnostic hypothesis test of the correlation coefficient, there is a linear relationship between "number of hours worked in the past week" and "total hours spent on the Internet".
1. True
2. True with caution
3. False
4. Incorrect application of a statistic
46 

The correlation matrix
The probability associated with the correlation coefficient between "number of hours worked in the past week" and "total hours spent on the Internet" (0.486) is greater than the level of significance. The assumption of linearity is not supported. The lack of statistical significance for all of the transformations suggests that there is no relationship between "number of hours worked in the past week" and "total hours spent on the Internet", and the lack of relationship is not attributable to nonlinearity. The answer to the problem is false.
47 

Problem 3
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance.
Based on a diagnostic hypothesis test of the correlation coefficient, there is a linear relationship between "highest academic degree" and "occupational prestige score".
1. True
2. True with caution
3. False
4. Incorrect application of a statistic
48 

The correlation matrix
The probability associated with the correlation coefficient between "highest academic degree" and "occupational prestige score" (<0.001) is less than or equal to the level of significance. The assumption of linearity is supported. Since highest academic degree is an ordinal level variable, the answer to the problem is true with caution.
49 

Other problems on assumption of linearity
A problem may ask about the assumption of linearity for a nominal level variable. The answer will be “Incorrect application of a statistic” since linearity does not apply to nominal variables. A problem may ask about the assumption of linearity for an ordinal level variable. If the variable or transformed variable is linear, the correct answer to the question is “True with caution” since we may be required to defend treating an ordinal variable as metric. Questions will specify a level of significance to use and the statistical evidence upon which you should base your answer.
50 

Steps in answering questions about the assumption of linearity - question 1
The following is a guide to the decision process for answering problems about linearity of the relationship:
1. Correlation for untransformed variables statistically significant? If no, the answer is False (not linear). If yes, go to step 2.
2. Either variable ordinal level? If yes, the answer is True with caution (linear). If no, the answer is True (linear).
51 

Steps in answering questions about the assumption of linearity - question 2
The following is a guide to the decision process for answering problems about the applicability of a transformation:
1. Correlation for untransformed variables statistically significant? If yes, the answer is False. If no, go to step 2.
2. Correlation with transformed variable statistically significant? If no, the answer is False. If yes, go to step 3.
3. Either variable ordinal level? If yes, the answer is True with caution. If no, the answer is True.
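The two decision guides can be written down as a small Python sketch that takes the probabilities read from the correlation matrix. The function names and arguments are hypothetical. The branch order for question 2 is a reading of the flowchart that assumes the statement is false whenever the untransformed relationship is already statistically significant, and the value 0.0005 below is only a stand-in for the "<0.001" reported for Problem 3.

```python
INCORRECT = "Incorrect application of a statistic"

def answer_linearity(p_untransformed, alpha, ordinal=False, nominal=False):
    """Question 1: is the relationship between the untransformed variables linear?"""
    if nominal:
        return INCORRECT  # linearity does not apply to nominal variables
    if p_untransformed > alpha:
        return "False"
    return "True with caution" if ordinal else "True"

def answer_transformation(p_untransformed, p_transformed, alpha,
                          ordinal=False, nominal=False):
    """Question 2: is the relationship nonlinear, and does the transformation make it linear?"""
    if nominal:
        return INCORRECT
    if p_untransformed <= alpha:
        return "False"  # assumed: the untransformed relationship is already linear
    if p_transformed > alpha:
        return "False"  # the transformation does not produce a linear relationship
    return "True with caution" if ordinal else "True"

# Probabilities reported on the slides for the three sample problems (alpha = 0.01).
problem_1 = answer_transformation(0.079, 0.006, 0.01)
problem_2 = answer_linearity(0.486, 0.01)
problem_3 = answer_linearity(0.0005, 0.01, ordinal=True)  # stand-in for "<0.001"

print("Problem 1:", problem_1)
print("Problem 2:", problem_2)
print("Problem 3:", problem_3)
```

Run with the probabilities from the slides, the sketch gives the same answers as the worked solutions for Problems 1, 2 and 3.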