Table 16
Example of variance calculation

| Item | Value | Value – Mean = Difference | Squared deviation |
|------|-------|---------------------------|-------------------|
| 1    | 4     | 4 – 8 = -4                | (-4)² = 16        |
| 2    | 8     | 8 – 8 = 0                 | 0² = 0            |
| 3    | 12    | 12 – 8 = 4                | 4² = 16           |
| 4    | 10    | 10 – 8 = 2                | 2² = 4            |
| 5    | 6     | 6 – 8 = -2                | (-2)² = 4         |
The total squared deviation is 16 + 0 + 16 + 4 + 4 = 40, so the mean squared deviation is 40 / 5 = 8. On average each value had a squared difference of 8 from the mean of 8. This is known as the variance.
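The calculation above can be sketched in plain Python (no libraries needed), following Table 16 step by step:

```python
# Variance calculation for the data from Table 16.
values = [4, 8, 12, 10, 6]

mean = sum(values) / len(values)                   # (4+8+12+10+6) / 5 = 8.0
squared_devs = [(v - mean) ** 2 for v in values]   # [16.0, 0.0, 16.0, 4.0, 4.0]
variance = sum(squared_devs) / len(values)         # 40 / 5 = 8.0

print(variance)  # 8.0
```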
Note that it is (-4)² and not -4². If you type in -4² on a good calculator it will properly give -16 as a result (first square, then apply the minus sign). So be careful if you do this calculation by hand.
Because we squared the values, the interpretation of the variance is a bit difficult: the variance of 8 is on the scale of the squared values, not the original ones. To compensate for the squaring we can do the opposite, which is taking the square root: √8 ≈ 2.83. So on average each value was roughly 2.83 above or below the mean. This is known as the standard deviation.
Note that the standard deviation (2.83) is not the same as the mean absolute deviation. This is because the square of a sum of values is not the same as the sum of the squares of each value separately (e.g. 3² + 4² ≠ (3 + 4)²).
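The difference between the two measures can be checked directly on the example data (only the standard library's `math` module is needed):

```python
import math

values = [4, 8, 12, 10, 6]
mean = sum(values) / len(values)  # 8.0

# Standard deviation: square root of the mean squared deviation.
sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Mean absolute deviation: average of the absolute differences.
mad = sum(abs(v - mean) for v in values) / len(values)

print(round(sd, 2))  # 2.83
print(mad)           # 2.4
```

The two values differ (2.83 vs 2.4), exactly because squaring-then-rooting is not the same operation as taking absolute values.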
Three variances
Although most textbooks stick to one or two, there are actually three different variances: the population variance, the biased sample variance and the unbiased sample variance.
The population variance (σ²) is calculated when you have data for the entire population; the calculation is the one shown in Table 16. If the data come from a sample of the population and you use the same calculation, the result is known as the biased sample variance (s²ₙ). It is called biased because it will most likely be too low in comparison to the population variance. Intuitively this can be explained by imagining a very small sample of 10 students at a university: the variance in age will probably be small, while in a large sample of 1000 students the variance will probably be higher, because the chance of also catching a few part-time students in the sample is higher. The population would be the largest possible sample, with an even higher variance. Because we already know that the biased sample variance is slightly too low, we can adjust for this by dividing not by the number of items (n), but by the number of items minus one (n – 1). In our example this yields 40 / (5 – 1) = 10. This is known as the unbiased sample variance (s² or s²ₙ₋₁). An example showing that the unbiased sample variance is indeed unbiased can be found in Appendix F, and a formal proof is available online from Yamauchi & Aikouka (2011).
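A sketch of the two divisions for the data from Table 16, checked against Python's standard-library `statistics` module (which implements exactly these two variants):

```python
import statistics

values = [4, 8, 12, 10, 6]
n = len(values)
mean = sum(values) / n
total_sq = sum((v - mean) ** 2 for v in values)  # 40.0

pop_var = total_sq / n             # population / biased sample variance: 40/5 = 8.0
unbiased_var = total_sq / (n - 1)  # unbiased sample variance: 40/4 = 10.0

# The statistics module agrees:
assert statistics.pvariance(values) == pop_var   # divides by n
assert statistics.variance(values) == unbiased_var  # divides by n - 1
```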
Most textbooks don’t mention the biased sample variance, and simply mean the unbiased sample variance when talking about the (sample) variance.
To make things even worse, the square root of the unbiased sample variance will actually still be slightly biased, and hence is not an unbiased sample standard deviation. If the population is normally distributed, an approximation of an unbiased sample standard deviation can be obtained by dividing the total squared deviation by n – 1.5 (instead of n – 1) before taking the square root. For more information on the unbiased sample standard deviation, see this Wikipedia article.
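For the example data, the n – 1.5 correction looks like this (a sketch only; remember this approximation assumes a normally distributed population, which is not checked here):

```python
import math

values = [4, 8, 12, 10, 6]
n = len(values)
mean = sum(values) / n
total_sq = sum((v - mean) ** 2 for v in values)  # 40.0

# Square root of the unbiased sample variance (still slightly biased as an SD):
sd_from_unbiased_var = math.sqrt(total_sq / (n - 1))    # sqrt(10) ≈ 3.16

# Approximately unbiased SD under a normality assumption:
sd_approx_unbiased = math.sqrt(total_sq / (n - 1.5))    # sqrt(40/3.5) ≈ 3.38

print(round(sd_from_unbiased_var, 2))  # 3.16
print(round(sd_approx_unbiased, 2))    # 3.38
```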
MAD or SD?
The standard deviation can be seen as an indication of how far away each value was from the mean on average. The MAD is exactly the average distance from the mean, so why use the standard deviation?
The first reason is that squaring is mathematically easier to work with than taking the absolute value. A second reason is that it emphasizes larger values (so a large deviation weighs more heavily). There are some other reasons as well, and a compelling argument for using MAD instead of SD is given by Stephen Gorard (2004, 2005). Note that the standard deviation is NOT the same as the standard error.
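The effect of the squaring can be seen by adding a single large value to the example data and comparing how the SD and the MAD respond (a small sketch; the outlier value of 40 is chosen here purely for illustration):

```python
import math

def sd(xs):
    """Population standard deviation (divide by n)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def mad(xs):
    """Mean absolute deviation from the mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

data = [4, 8, 12, 10, 6]
with_outlier = data + [40]  # hypothetical outlier for illustration

# The SD grows faster than the MAD, because the large
# deviation is squared before being averaged.
print(round(sd(data), 2), mad(data))                    # 2.83 2.4
print(round(sd(with_outlier), 2), round(mad(with_outlier), 2))
```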
History
The term standard deviation can be found in a paper by Pearson: "Then σ will be termed its standard deviation (error of mean square)" (1894, p. 80), but the idea was already known. It was Fisher (1918) who used the term variance: "...to deal with the square of the standard deviation as the measure of variability. We shall term this quantity the Variance..." (p. 399). He did not use a special symbol for it and simply wrote σ². Kenney (1940) commented on the division by n – 1: "This factor is sometimes called "Bessel's correction." Perhaps it should be attributed more appropriately to Gauss who made use of it, in this connection, as early as 1823." (p. 125). The factor they refer to is the factor N / (N – 1).
References
Fisher, R. A. (1918). The Correlation Between Relatives on the Supposition of Mendelian Inheritance. Transactions of the Royal Society of Edinburgh, 52, 399–433.
Gorard, S. (2004). Revisiting a 90-year-old debate: the advantages of the mean deviation. Presented at the British Educational Research Association Annual Conference, Manchester. Retrieved from http://www.leeds.ac.uk/educol/documents/00003759.htm
Gorard, S. (2005). Revisiting a 90-year-old debate: the advantages of the mean deviation. British Journal of Educational Studies, 53(4), 417–430. doi:10.1111/j.1467-8527.2005.00304.x
Miller, J. (n.d.). Earliest Known Uses of Some of the Words of Mathematics. Retrieved from http://jeff560.tripod.com/mathword.html
Pearson, K. (1894). Contributions to the Mathematical Theory of Evolution. Philosophical Transactions of the Royal Society of London. A, 185, 71–110. doi:10.1098/rsta.1894.0003
Yamauchi, H., & Aikouka, N. (2011, September 17). Bessel’s correction. Retrieved from https://docs.google.com/file/d/0BwxORBGgApGaNjZjZDgyNmQtY2FkNC00Y2NjLTlhMDAtZjg5M2E4M2E5NTQw/edit?hl=en