
The term “central tendency” refers to a single value that best represents a distribution of data.
Such a value is useful because it summarizes many scores with one number when describing data.
The mode, median and mean are the most common measures of central tendency.
All three are valid, but depending on the kind of data and its distribution, some measures
are more appropriate to use than others.
The mode is defined as the value that occurs most frequently in the data.
Let’s consider the following data sets:
1. {5,5,5,5,5,5,5,5}
2. {5,5,5,5,5,5,5,6}
3. {4,5,5,5,5,5,5,10}
A value of 5 is the most frequent and appears to be the most representative for all the distributions above.
Let’s consider another group of data sets:
1. {5,5,5,5,5,5,5,100}
2. {4,4,4,4,5,5,5,5}
3. {1.1,1.2,1.3,1.4,1.5,1.6}
In the first data set, the mode is 5, and it is also the most representative value. Problems arise when applying the mode to the remaining data sets.
In the second data set, there are two modes: the values of 4 and 5. This is a bimodal (more generally, multimodal) data set, and it is not clear which value to use.
The third data set has continuous data in which no value occurs more than once.
The mode is difficult to use when dealing with continuous data because a single value is rarely repeated in the data set.
It follows that when a data set has multiple modes or consists of continuous data, the mode is not an appropriate
measure of central tendency.
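The behaviour of the mode on the three data sets above can be checked with a short Python sketch. The helper name `modes` is an illustrative choice, not something from the text, and it assumes a non-empty list; it returns every value that ties for the highest frequency, which makes the multimodal and continuous cases visible.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency (first-seen order)."""
    counts = Counter(values)            # value -> number of occurrences
    top = max(counts.values())          # highest frequency in the data
    return [v for v, c in counts.items() if c == top]


unimodal = [5, 5, 5, 5, 5, 5, 5, 100]
bimodal = [4, 4, 4, 4, 5, 5, 5, 5]
continuous = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]

print(modes(unimodal))    # [5]
print(modes(bimodal))     # [4, 5]
print(modes(continuous))  # all six values, since each occurs exactly once
```

When the function returns more than one value, the mode no longer singles out one representative score.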
The median is an alternative that avoids these shortcomings.
The median is the value that separates the upper and lower halves of an ordered distribution.
Let’s consider the following data sets:
1. {1,10,100}
2. {1,10,20,100}
3. {dog, dog, cat}
4. {1.1,1.2,1.3,1.4,1.5,1.6}
The median of the first distribution is 10. Replacing 100 with 1,000 would leave it at 10, so the median is unaffected by the actual distance between numbers.
The second distribution has a median of 15. No single score separates the distribution in half.
Instead, the median lies midway between the two middle scores, 10 and 20.
The scores in the third distribution do not have any inherent order or direction of difference, and thus
the median is an inappropriate measure of central tendency for nominal data.
The last distribution contains continuous data, but the scores can still be ordered according to their magnitude. The
two middle scores are 1.3 and 1.4, so the value of the median is 1.35.
In short, for data that can be ordered, the median always results in a single value, but that value is unaffected by the actual distance (magnitude) between numbers.
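The rule described above (take the middle score, or the midpoint of the two middle scores) can be sketched in Python as follows. The function name `median_of` is an illustrative choice, and the sketch assumes non-empty numeric input.

```python
def median_of(values):
    """Middle score of the sorted data, or the midpoint of the two middle scores."""
    ordered = sorted(values)            # the median is defined on ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # odd count: a single middle score exists
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two


print(median_of([1, 10, 100]))       # 10
print(median_of([1, 10, 1000]))      # 10, the larger distance changes nothing
print(median_of([1, 10, 20, 100]))   # 15.0
print(median_of([1.1, 1.2, 1.3, 1.4, 1.5, 1.6]))  # approximately 1.35
```

The second call shows the point made in the text: only the rank of a score matters, not its magnitude.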
The mode and the median reflect frequency and rank respectively, but neither takes magnitude into account.
Unlike the median or the mode, the sum of a distribution is sensitive to magnitude, but on its own it is not
representative because it grows with the number of scores. Therefore, the number of scores in the distribution must be taken into account as well.
The mean is the sum of all the scores in the distribution divided by the number
of scores in the distribution; the population mean is symbolized by the Greek letter mu (μ). Mean is just another name for the arithmetic average.
Let’s consider the following data sets:
1. {1,10,100}
2. {1,1,2,100}
3. {dog, dog, cat}
4. {1.1,1.2,1.3,1.4,1.5,1.6}
The mean of the first distribution is 37.
The second distribution has a mean of 26. This value isn’t really
representative: 100 is an outlier, and it pulls the mean toward itself, far above the other three scores. This
distorts the representativeness of the mean.
The mean cannot be used to represent the third distribution, because nominal data cannot be summed, but it can be used
to adequately represent the continuous data in the fourth distribution, where it equals 1.35.
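The four cases can be reproduced with a minimal Python sketch. The helper `mean` is written out here for illustration, and `statistics.median` from the standard library is used only to contrast the mean with the median on the data set containing the outlier.

```python
from statistics import median


def mean(values):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(values) / len(values)


first = [1, 10, 100]
with_outlier = [1, 1, 2, 100]
nominal = ["dog", "dog", "cat"]
continuous = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]

print(mean(first))            # 37.0
print(mean(with_outlier))     # 26.0, pulled toward the outlier 100
print(median(with_outlier))   # 1.5, unaffected by how large the outlier is
print(mean(continuous))       # approximately 1.35

try:
    mean(nominal)
except TypeError:
    # Category labels cannot be summed, so the mean is undefined for them.
    print("mean is undefined for nominal data")
```

The gap between 26.0 and 1.5 on the second data set illustrates how a single extreme score shifts the mean while leaving the median in place.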