Thursday 15 May 2014

2.3.2. Median

What is it?

The median is defined as: “an order statistic that gives the "middle" value of a sample. More specifically, it is the value such that an equal number of samples are less than and greater than the value (for an odd sample size), or the average of the two central values (for an even sample size).” (Weisstein, n.d.). In other words, it is simply the item or value in the middle once the items have been put in order.

If there is an even number of items, the average of the two items in the middle will be the median.

The median tells us that at least 50% of the values are greater than or equal to the median, and at least 50% are less than or equal to it.
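The definition and the two rules above (the middle value for an odd number of items, the average of the two middle values for an even number) can be sketched in Python as follows. This is a minimal illustration assuming numeric data; the function names `median` and `check_half_property` are hypothetical, and for real work Python's standard library already provides `statistics.median`.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # step 1: put the values in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the single middle value
        return ordered[mid]
    # even n: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


def check_half_property(values):
    """Check that at least half the values are >= and <= the median."""
    m = median(values)
    n = len(values)
    at_least = sum(1 for v in values if v >= m)
    at_most = sum(1 for v in values if v <= m)
    return 2 * at_least >= n and 2 * at_most >= n


print(median([3, 1, 2]))                  # 2
print(median([4, 1, 3, 2]))               # 2.5
print(check_half_property([4, 1, 3, 2]))  # True
```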


Figure 19 shows a visualisation of the median.


Figure 19. Visualisation of the median.

Different authors use different symbols for the median. For example, McClave, Benson & Sincich (2005) simply use m, Bowerman, O'Connell & Murphree (2009) use Md, Steinberg (2010) uses Mdn, and Johnson & Kuby (2011) use the symbol also used in the above definition.

Examples

Example 1
In a survey, ten people were asked to rate the design of a product on the scale Very nice, Nice, Ugly, Very ugly. The responses the researcher got were: Very nice, Nice, Nice, Nice, Nice, Ugly, Ugly, Ugly, Very ugly, Very ugly. The responses are already in order. Crossing them out from both sides, one pair at a time, leaves:

Very nice, Nice, Nice, Nice, Nice, Ugly, Ugly, Ugly, Very ugly, Very ugly
Nice, Nice, Nice, Nice, Ugly, Ugly, Ugly, Very ugly
Nice, Nice, Nice, Ugly, Ugly, Ugly
Nice, Nice, Ugly, Ugly
Nice, Ugly
The median lies between Nice and Ugly. Note that we cannot average these two answers, since they are categories on an ordinal scale.
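For ordinal data the same idea can be coded by ranking the categories before ordering them. A minimal sketch, assuming a hypothetical `SCALE` list that fixes the order of the answers; because categories cannot be averaged, the (hypothetical) function `ordinal_median` returns both central categories when they differ.

```python
# Hypothetical ordering of the answer scale, from best to worst.
SCALE = ["Very nice", "Nice", "Ugly", "Very ugly"]


def ordinal_median(responses, scale):
    """Return the middle category, or the pair of middle categories.

    Categories cannot be averaged, so for an even number of responses
    the two central categories are returned (a single one if they match).
    """
    rank = {category: i for i, category in enumerate(scale)}
    ordered = sorted(responses, key=lambda r: rank[r])
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    lower, upper = ordered[mid - 1], ordered[mid]
    return lower if lower == upper else (lower, upper)


responses = ["Very nice", "Nice", "Nice", "Nice", "Nice",
             "Ugly", "Ugly", "Ugly", "Very ugly", "Very ugly"]
print(ordinal_median(responses, SCALE))  # ('Nice', 'Ugly')
```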

Example 2
The following grades were obtained by various students: 2, 4, 6, 6, 6, 8, 9, 9, 9. Note that there are 9 grades. The grade in the middle will be the (9 + 1)/2-th grade, so in this case the 5th grade. The grades are already in order and the 5th grade is 6. At least 50% of the students scored a 6 or higher, and at least 50% scored a 6 or lower.
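The position rule and the 50% statements from Example 2 can be checked with a few lines of Python. This sketch assumes an odd number of already ordered grades, so that (n + 1)/2 is a whole number; the variable names are illustrative only.

```python
grades = [2, 4, 6, 6, 6, 8, 9, 9, 9]   # already in order
n = len(grades)
position = (n + 1) / 2                 # 5.0, i.e. the 5th grade
med = grades[int(position) - 1]        # 1-based position -> 0-based index

# Share of students at or above, and at or below, the median.
share_at_least = sum(g >= med for g in grades) / n
share_at_most = sum(g <= med for g in grades) / n

print(position, med)                                     # 5.0 6
print(round(share_at_least, 2), round(share_at_most, 2))  # 0.78 0.56
```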

Why (not) use the median?

The median takes the order of the values into account. It therefore uses more information about the values than the mode does. Since it requires an ordering, it cannot be determined for measurements at the nominal level.

The disadvantage of the median is that it ignores the 'size' of the numbers at the interval or ratio level. For example, the data 1, 1, 2, 3, 3 and the data 1, 1, 2, 8, 9 both have the same median, namely 2.
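A quick check of this point with Python's standard `statistics` module: both lists have the same median, while their arithmetic means differ because the mean does take the size of the values into account.

```python
from statistics import mean, median

a = [1, 1, 2, 3, 3]
b = [1, 1, 2, 8, 9]

print(median(a), median(b))  # 2 2   (only the middle position matters)
print(mean(a), mean(b))      # 2 4.2 (the large values pull the mean up)
```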

In heavily skewed data the median is also sometimes preferred over the arithmetic mean, since it is not easily influenced by extreme values (outliers).
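To illustrate this robustness, the sketch below adds one extreme value to a small data set and compares the median and the mean before and after. The numbers are made up purely for illustration (hypothetical incomes in thousands).

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]       # hypothetical incomes (thousands)
with_outlier = incomes + [1000]      # one extreme value added

# The median only moves from 35 to 36.5 ...
print(median(incomes), median(with_outlier))
# ... while the mean jumps from 35 to about 195.8.
print(mean(incomes), mean(with_outlier))
```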

How to find the median?

The first step is always to order the items. For lists of values, as in the examples, the two most common methods are the ones used above. The first is to simply cross out values from both sides until you reach the middle. The second is to find the (n + 1)/2-th value.
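Both methods can be written out as short functions that should always agree. This is a sketch assuming numeric data (for ordinal data the final averaging step would have to be dropped); the function names are hypothetical.

```python
def median_by_crossing_out(values):
    """Cross out the outermost pair repeatedly until one or two remain."""
    remaining = sorted(values)
    while len(remaining) > 2:
        remaining = remaining[1:-1]       # cross out the lowest and highest
    if len(remaining) == 1:
        return remaining[0]
    return (remaining[0] + remaining[1]) / 2


def median_by_position(values):
    """Take the (n + 1)/2-th value of the ordered list (1-based)."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2
    if position.is_integer():
        return ordered[int(position) - 1]
    # The position ends in .5: average the values on either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


grades = [2, 4, 6, 6, 6, 8, 9, 9, 9]
print(median_by_crossing_out(grades), median_by_position(grades))  # 6 6
```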

For frequency tables and grouped frequency tables, an 'easy' way to find the median is to determine the cumulative frequencies. Then locate the first value whose cumulative frequency is greater than or equal to (n + 1)/2. Alternatively, you can determine the cumulative relative frequencies and find the first group that reaches 50% or more, as illustrated in Table 10.

Table 10
Example of median calculation for a frequency table

Opinion        Freq.   Rel. Freq.   Cumul. Rel. Freq.
Very boring        5          10%                 10%
Boring            12          24%                 34%
Good              20          40%                 74%
Very good         13          26%                100%


Looking at the cumulative relative frequency column, the first percentage of 50% or more is 74%, which belongs to the opinion 'Good'. The median opinion is therefore Good (50% or more of the people think it is Good or even Very good).
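The cumulative-frequency method can be sketched as follows, using the frequencies from Table 10 and assuming the categories are listed in the order of the scale. The function names are hypothetical. This simple version returns the upper category when the (n + 1)/2 position falls exactly between two categories, and in that boundary case the count-based and percentage-based rules can disagree, so treat it as an illustration rather than a complete implementation.

```python
# Table 10 as (category, frequency) pairs, in the order of the scale.
table = [("Very boring", 5), ("Boring", 12), ("Good", 20), ("Very good", 13)]


def median_from_frequencies(table):
    """First category whose cumulative frequency reaches (n + 1)/2."""
    n = sum(freq for _, freq in table)
    target = (n + 1) / 2
    cumulative = 0
    for category, freq in table:
        cumulative += freq
        if cumulative >= target:
            return category
    raise ValueError("table is empty")


def median_from_relative(table):
    """First category whose cumulative relative frequency reaches 50%."""
    n = sum(freq for _, freq in table)
    cumulative = 0
    for category, freq in table:
        cumulative += freq
        if cumulative / n >= 0.5:
            return category
    raise ValueError("table is empty")


print(median_from_frequencies(table))  # Good
print(median_from_relative(table))     # Good
```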

You could also list all values of the frequency table separately and then find the value in the middle. Using Table 10 as an example, the first five people answered 'Very boring' (VB), the next twelve 'Boring' (B), and so on, as illustrated in Figure 20 below for the first 33 people.

Figure 20. Frequency table converted to raw data.
The median is then the (50 + 1)/2 = 25.5th person, i.e. halfway between persons 25 and 26. Since both answered 'Good', the median is indeed 'Good'.
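Expanding the frequency table into one entry per person, as in Figure 20, is easy to automate. A minimal sketch using the Table 10 frequencies; the names `table` and `people` are illustrative.

```python
table = [("Very boring", 5), ("Boring", 12), ("Good", 20), ("Very good", 13)]

# Expand the table: one entry per person, in the order of the scale.
people = [category for category, freq in table for _ in range(freq)]

n = len(people)               # 50 people in total
position = (n + 1) / 2        # 25.5: halfway between persons 25 and 26
p25 = people[24]              # person 25 (Python lists are 0-based)
p26 = people[25]              # person 26
print(n, position, p25, p26)  # 50 25.5 Good Good
```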

For grouped frequency tables it is also possible to use interpolation to estimate the median. This is discussed in the side note of this chapter.

History

The French term 'valeur médiane' can be found in Cournot's (1843) Exposition de la théorie des chances et des probabilités, while the earliest known use in English is in Galton's (1881) Report of the Anthropometric Committee.


References
Bowerman, B. L., O'Connell, R. T., Murphree, E. S., Huchendorf, S. C., Porter, D. C., & Schur, P. J. (2009). Business statistics in practice. Boston, MA: McGraw-Hill/Irwin.
Cournot, A. A. (1843). Exposition de la théorie des chances et des probabilités. Paris: L. Hachette.
Galton, F. (1881). Report of the Anthropometric Committee. Report of the British Association for the Advancement of Science, 51, 225–272.
Johnson, R., & Kuby, P. (2011). Elementary statistics. Cengage Learning.
McClave, J. T., Benson, P. G., & Sincich, T. (2005). Statistics for business and economics. Upper Saddle River, NJ: Pearson/Prentice Hall.
Steinberg, W. J. (2010). Statistics alive! SAGE.
Weisstein, E. W. (n.d.). Median. Retrieved May 10, 2014, from http://mathworld.wolfram.com/Median.html
