Key words: The Range, Mean Deviations, Variance, Standard and Median Deviations
Let’s say you earned $1 one week, $10 the following week and $100 the third week.
As shown in the previous article, Measures of Central Tendency,
the mean μ for this distribution is $37 per week. The mean reveals the center of the distribution, but it doesn’t reveal
the variability in the distribution: it provides no information about how far apart the earnings are spread.
To summarize a data set accurately and efficiently, measures of central tendency alone are not enough.
Measures of variability were created to quantify the degree to which the scores in a distribution differ from one another
and, in turn, how accurately a single value such as the mean represents them.
Let’s consider a set of data:
1. {1,1,1}
2. {1,1,2}
3. {1,2,3,4,5,100}
The Range is the difference between the maximum and minimum values in a distribution.
For the third dataset, the Range suggests that the scores differ from one another by 99, but this value does
not appear to be representative of the typical difference among scores in that distribution.
The Range does not make use of all the scores in the distribution: it
focuses only on the difference between the maximum and minimum values and fails to take into account
a measure of central tendency.
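As a minimal sketch, the Range of the three datasets above could be computed as follows. The helper name value_range is only an illustrative choice, not something defined in this article.

```python
# The three example datasets from the text.
datasets = [
    [1, 1, 1],
    [1, 1, 2],
    [1, 2, 3, 4, 5, 100],
]

def value_range(scores):
    """Return the Range: the maximum score minus the minimum score."""
    return max(scores) - min(scores)

ranges = [value_range(d) for d in datasets]
print(ranges)  # [0, 1, 99]
```

Only two scores per dataset influence the result, which is why the single outlier of 100 dominates the third value.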
Let’s consider two more distributions with the same mean:
1. {5,5,5} with μ=5
2. {6,7,1,6,5} with μ=5
The magnitude of the difference between each score and the mean, called the mean deviation, can be used to quantify how well the mean represents the scores.
1. (x – μ)={5-5, 5-5, 5-5} = {0,0,0}
2. (x – μ)={6-5, 7-5, 1-5, 6-5, 5-5} = {1,2,-4,1,0}
The maximum difference in the second distribution is 4 points, while the typical difference is 1 point.
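The mean deviations above can be reproduced with a short sketch. The function name mean_deviations is hypothetical, and the mean is computed with plain division, so the results are floats.

```python
def mean_deviations(scores):
    """Return the deviation of each score from the mean (x - mu)."""
    mu = sum(scores) / len(scores)
    return [x - mu for x in scores]

first = mean_deviations([5, 5, 5])
second = mean_deviations([6, 7, 1, 6, 5])

print(first)        # [0.0, 0.0, 0.0]
print(second)       # [1.0, 2.0, -4.0, 1.0, 0.0]
print(sum(second))  # 0.0
```

The last line illustrates why the raw deviations cannot simply be added up: the positive and negative values cancel.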
It is helpful to summarize the deviations into a single value. Taking into account that the sum of positive and negative mean deviations
will always equal zero, two possible solutions are available:
1. To sum absolute values. For the second distribution: Σ(|x – μ|)= 1+2+4+1+0 = 8
2. To square each mean deviation and then sum the resulting values. For the second distribution: Σ(x – μ)² = 1+4+16+1+0 = 22
Squaring is often preferred over summing absolute values.
Nevertheless, 22 is only the sum of squared deviations, and it reflects neither the spread of the scores in the distribution
nor the accuracy of the mean.
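Both ways of summarizing the deviations can be checked with a few lines of Python. This is only a sketch for the second distribution, and the variable names are illustrative.

```python
scores = [6, 7, 1, 6, 5]
mu = sum(scores) / len(scores)  # 5.0

# Option 1: sum of absolute mean deviations.
sum_abs = sum(abs(x - mu) for x in scores)

# Option 2: sum of squared mean deviations.
sum_sq = sum((x - mu) ** 2 for x in scores)

print(sum_abs)  # 8.0
print(sum_sq)   # 22.0
```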
Variance is the sum of squared deviations divided by the number of scores in the distribution and is symbolized by
the Greek letter sigma raised to an exponent of 2 (σ²).
From the previous example it follows:
Variance = 22/5 = 4.4, which indicates that the average squared difference between a score and the mean is 4.4 squared units.
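As a sketch, the same variance can be computed by hand and compared with Python's standard library. Note that statistics.pvariance divides by the number of scores, as this article does, whereas statistics.variance divides by one less than that and would give a different result.

```python
from statistics import pvariance

scores = [6, 7, 1, 6, 5]
mu = sum(scores) / len(scores)

# Sum of squared deviations divided by the number of scores.
manual = sum((x - mu) ** 2 for x in scores) / len(scores)

# Population variance from the standard library.
library = pvariance(scores)

print(manual)   # 4.4
print(library)  # 4.4
```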
Variance has some limitations as well. For the data set {4,6,8} with μ=6, the variance is 8/3 ≈ 2.67, which is
greater than the maximum difference of 2 between a score and the mean. Therefore, it is better to
transform variance so that it measures variability in original units rather than squared units.
Standard deviation is the square root of variance. The standard deviation shares all the desirable properties of variance
without resulting in an inflated value, and it is symbolized by sigma (σ).
The standard deviation for the above example is the square root of 2.67, which is 1.63.
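The step from variance to standard deviation for {4,6,8} can be sketched as follows, using only math.sqrt from the standard library. The variable names are illustrative.

```python
import math

scores = [4, 6, 8]
mu = sum(scores) / len(scores)  # 6.0

# Variance: sum of squared deviations divided by the number of scores.
variance = sum((x - mu) ** 2 for x in scores) / len(scores)

# Standard deviation: square root of the variance, back in original units.
std_dev = math.sqrt(variance)

print(round(variance, 2))  # 2.67
print(round(std_dev, 2))   # 1.63
```

Unlike the variance, the standard deviation here is smaller than the largest deviation of 2, so it reads as a plausible typical distance from the mean.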
Median deviation.
Consider the dataset {5,6,7} with a median value of 6.
Deviations from the median are called median deviations. Here they are {5-6, 6-6, 7-6} = {-1,0,1}.
A natural single value to describe the accuracy of the median is the median of the median deviations; for {-1,0,1} that median is zero.
However, the median of median deviations will always result in a value of zero. To overcome this, the absolute values are used.
Taking the absolute values of {-1,0,1} and sorting them gives {0,1,1}, so the median of the absolute median deviations is 1.
The median of the absolute median deviations is called the median absolute deviation or MAD.
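A minimal sketch of the MAD calculation, using statistics.median from the Python standard library, is shown here. The function name median_absolute_deviation is a hypothetical helper written for this example.

```python
from statistics import median

def median_absolute_deviation(scores):
    """Return the median of the absolute deviations from the median."""
    med = median(scores)
    abs_devs = [abs(x - med) for x in scores]
    return median(abs_devs)

mad = median_absolute_deviation([5, 6, 7])
print(mad)  # 1
```

Because it relies on medians rather than means, the result changes little when a single extreme score is added to the data.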