In nearly every collection of data there is variability, and we are interested in identifying its sources. When we focus our attention on a single source, that source is called systematic variance -- the variability under investigation. All other sources of variability are lumped together into one indefinite mass called error variance. Despite its name, error variance has little to do with "error" (although variability due to errors can be part of it); it simply refers to whatever sources of variability we are not focusing on. Because systematic variance is due to the variables under investigation and error variance encompasses every other source of variability, the two together equal the total observed variance:
total variance = systematic variance + error variance
We are usually interested in quantifying what proportion or part of the total variability is due to the factor(s) under investigation. We can specify this proportion of variance accounted for as a percentage by simple division:
proportion of variance accounted for = (systematic variance / total variance) x 100
The tough part is calculating the systematic and total variance.
Total variance is easy - throw all of your numbers together (forgetting about the groups/conditions that originally divided the data) and calculate the variance of the data.
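This step can be sketched in Python; the GPA values below are hypothetical stand-ins, and the standard library's variance function computes a sample variance, like Excel's VAR:

```python
from statistics import variance  # sample variance (n - 1 denominator), like Excel's VAR

# Hypothetical GPAs -- illustrative stand-ins, not real data
attenders = [3.4, 3.6, 3.1, 3.8, 3.5]
nonattenders = [2.9, 3.0, 3.3, 2.7, 3.1]

# Total variance: throw every score into one list, ignoring group membership
all_scores = attenders + nonattenders
total_var = variance(all_scores)
```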
Systematic variance is difficult. However, finding the error variance is much easier, so we'll just calculate the error variance and obtain systematic variance by subtraction (since total = systematic + error, then systematic = total - error).
Recall that error variance measures all of the sources of variability that are not due to the factor under investigation. Let's say we're interested in determining whether attending class helps your GPA. We establish two groups - those who attend lectures and those who don't. We then measure everyone's GPA and observe considerable variability. If we want to estimate the variability that is not due to differences between the two groups (i.e., error variance), our task is simple -- we just measure the variability within each group (in this example, we get two variance values, one for lecture attenders and one for lecture non-attenders). Any variability among the attenders can't be due to whether they attend class, because all of these people do; thus, any variability observed within this group must be due to other factors - error variance. The same holds true for the variability in GPA among the non-attenders. So we are part way there - we have two estimates of error variance (one for attenders and one for non-attenders), but we only need one.
To derive one estimate from the two, we "pool" the estimates by taking a weighted average, so that if one group is larger than the other, its error variance estimate is given more weight. The formula for pooling these estimates was presented in class. Note that if the two groups are of identical size, pooling is easy - just average the two error variance estimates.
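The pooling formula itself was presented in class; the standard version weights each group's estimate by its degrees of freedom (n - 1). A sketch in Python under that assumption:

```python
from statistics import variance  # sample variance, like Excel's VAR

def pooled_variance(group1, group2):
    """Weighted average of the two within-group (error) variance estimates,
    weighting each by its degrees of freedom (n - 1).
    Assumes this matches the pooling formula presented in class."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = variance(group1), variance(group2)
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
```

When the two groups are the same size, the weights are equal and this reduces to the simple average of the two estimates, as noted above.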
The following excerpt from an Excel spreadsheet shows two columns of data and the formulas for calculating the various variance values and the proportion of variance accounted for (comparing GPAs across two class sections). Note that the systematic variance (top of the proportion formula) was found via subtraction: total - error.
The spreadsheet uses two built-in Excel functions: AVERAGE calculates an average, and VAR calculates the sample variance of a list of numbers (using an n - 1 denominator). The "*" symbol means multiply and the "/" means divide.
Because the two group sizes are identical (both groups have 10 values), the following example simply averages the two component variances rather than using the longer pooling formula presented in class for unequal groups.
The following shows the same data but with the formulas evaluated so that you can see the results.
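The whole chain of spreadsheet calculations can also be mirrored in Python. The GPA values below are hypothetical stand-ins (the actual spreadsheet data are not reproduced here); because the two sections are the same size, the error variance is the simple average of the two within-group variances:

```python
from statistics import variance  # sample variance, like Excel's VAR

# Hypothetical GPAs for two equal-sized class sections
section_1 = [3.2, 3.5, 3.8, 3.0, 3.6]
section_2 = [2.8, 3.1, 2.9, 3.3, 3.0]

# Total variance: all scores together, groups ignored
total_var = variance(section_1 + section_2)

# Error variance: simple average of the within-group variances (equal n)
error_var = (variance(section_1) + variance(section_2)) / 2

# Systematic variance by subtraction, then the percent of variance accounted for
systematic_var = total_var - error_var
percent_accounted = systematic_var / total_var * 100
```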