Tuesday, 27 May 2014

2.1.5. Side notes on tables and frequencies

A few side notes related to tables and frequencies.

Tally marks
The absolute frequency is the number of items belonging to a class or group. A tally mark system can be used to do this counting. If we consider tally marks as part of a table, and tables as part of statistics, the Lebombo bone might be considered one of the oldest known statistical records in history. The Lebombo bone (Figure 1) dates to approximately 35,000 BC (Bogoshi, Naidoo, & Webb, 1987).


Figure 1. The Lebombo bone. Reprinted from Primitive numbers and a history of counting, by S. Chavda, 2011. Retrieved from http://www.swatichavda.com/2011/10/primitive-numbers-history-of-counting.html.
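
The counting that a tally system does by hand can be sketched in a few lines of Python. This is only an illustration: the sample observations are made up, and the ASCII rendering of a tally group (four strokes plus a slash for the fifth) is an assumption, not one of the systems from the text.

```python
from collections import Counter

# Hypothetical sample data: one observed class label per item.
observations = ["red", "black", "red", "red", "black", "red", "red", "red"]

# Absolute frequency: how many items fall into each class.
frequencies = Counter(observations)


def tally(count):
    """Render a count as ASCII tally marks, grouped in fives ('||||/')."""
    full_groups, remainder = divmod(count, 5)
    parts = ["||||/"] * full_groups
    if remainder:
        parts.append("|" * remainder)
    return " ".join(parts)


# Print one row per class: label, tally marks, absolute frequency.
for label, count in sorted(frequencies.items()):
    print(f"{label:<6} {tally(count):<12} {count}")
```

The tally column and the count column carry the same information; the tally is simply the running record from which the absolute frequency is read off at the end.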

As for tally systems, there are a few popular ones across the globe, illustrated in Table 5:

Table 5
Tally systems



Simpson's paradox
Simpson (1951) describes the following situation. A person is interested in the proportion of court cards (King, Queen, and Knave) and also distinguishes clean from dirty cards. He checks two decks of cards, a red deck and a black deck, and finds the proportions shown in Table 6 and Table 7.
 
Table 6
Card Distribution Among Dirty Cards

Dirty cards    Court    Plain
Red deck       4/12     8/12
Black deck     3/8      5/8

Assuming we prefer court cards, we can conclude that among the dirty cards the black deck is preferred: it has a higher proportion of court cards than the red deck (3/8 = 37.5% against 4/12 ≈ 33.3%). Now let's look at the results for the clean cards:

Table 7
Card Distribution Among Clean Cards

Clean cards    Court    Plain
Red deck       2/14     12/14
Black deck     3/18     15/18

Among the clean cards the black deck would also be preferred (3/18 ≈ 16.7% against 2/14 ≈ 14.3%). However, if we combine the dirty and clean cards we get the proportions shown in Table 8.

Table 8
Card Distribution Among All Cards

All cards      Court    Plain
Red deck       6/26     20/26
Black deck     6/26     20/26

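Now, all of a sudden, the red and black decks are equal: each has 6 court cards out of 26. The advantage the black deck had within both the dirty and the clean cards disappears once the two groups are combined. This phenomenon is known as Simpson's paradox.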
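
The arithmetic behind Tables 6 to 8 can be checked with a short Python sketch. The counts are taken from the tables; the variable and function names are illustrative choices only.

```python
from fractions import Fraction

# (court cards, total cards) per deck, copied from Table 6 and Table 7.
dirty = {"red": (4, 12), "black": (3, 8)}
clean = {"red": (2, 14), "black": (3, 18)}


def court_share(court, total):
    """Proportion of court cards as an exact fraction."""
    return Fraction(court, total)


def pooled_share(deck):
    """Proportion of court cards after combining dirty and clean cards."""
    court = dirty[deck][0] + clean[deck][0]
    total = dirty[deck][1] + clean[deck][1]
    return Fraction(court, total)


# Within each group the black deck has the larger share of court cards.
for name, table in (("dirty", dirty), ("clean", clean)):
    red = court_share(*table["red"])
    black = court_share(*table["black"])
    print(f"{name}: red = {float(red):.3f}, black = {float(black):.3f}")

# After pooling, both decks come out at 6/26 (Table 8).
print(f"all: red = {float(pooled_share('red')):.3f}, "
      f"black = {float(pooled_share('black')):.3f}")
```

The pooled proportion is not the average of the two group proportions. It weights each group by its size, and the two decks have different mixes of dirty and clean cards (12 and 14 for red, 8 and 18 for black), which is what allows the difference to vanish.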

Others had noticed the same idea before Simpson did (Cohen & Nagel, 1934; Pearson, Lee, & Bramley-Moore, 1899; Yule, 1903). Blyth (1972) was perhaps the first to call it ‘Simpson’s paradox’, although ‘Yule’s paradox’ would probably have been the fairer name. For more examples, have a look at the Wikipedia entry; for a more technical explanation and further details, see Pearl's (2013) article.

Benford's Law
If you take a long list of real-world numbers and look only at the first (leftmost) digit of each, that digit turns out to be a 1 in about 30.1% of the values, even though we might have expected each of the nine possible first digits to occur in only 1/9 ≈ 11.1% of the values. This is known as Benford's Law. Benford's article appeared in 1938, but Newcomb (1881) had already described the phenomenon in 1881.
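
The 30.1% figure follows from the usual statement of Benford's Law, which gives the probability of a first digit d as log10(1 + 1/d). That formula is not spelled out in the text above, so the following Python sketch should be read as a supplement; the function name is an illustrative choice.

```python
import math


def benford_probability(digit):
    """Probability that the first digit equals `digit` under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Expected share of each possible first digit.
probabilities = {d: benford_probability(d) for d in range(1, 10)}

for digit, p in probabilities.items():
    print(f"{digit}: {p:.1%}")
```

The probabilities fall steadily from the digit 1 to the digit 9 and add up to 1, so small first digits are far more common than the uniform 11.1% would suggest.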


References
Benford, F. (1938). The Law of Anomalous Numbers. Proceedings of the American Philosophical Society, 78(4), 551–572.
Blyth, C. R. (1972). On Simpson’s Paradox and the Sure-Thing Principle. Journal of the American Statistical Association, 67(338), 364–366. doi:10.1080/01621459.1972.10482387
Bogoshi, J., Naidoo, K., & Webb, J. (1987). The oldest mathematical artifact. The Mathematical Gazette, 71(458), 294.
Cohen, M. R., & Nagel, E. (1934). An introduction to logic and scientific method. New York: Harcourt, Brace and Company.
Newcomb, S. (1881). Note on the frequency of use of the different digits in natural numbers. American Journal of Mathematics, 4(1), 39–40.
Pearl, J. (2013). Understanding Simpson’s Paradox (SSRN Scholarly Paper No. ID 2343788). Rochester, NY: Social Science Research Network. Retrieved from http://papers.ssrn.com/abstract=2343788
Pearson, K., Lee, A., & Bramley-Moore, L. (1899). Mathematical Contributions to the Theory of Evolution. VI. Genetic (Reproductive) Selection: Inheritance of Fertility in Man, and of Fecundity in Thoroughbred Racehorses. Royal Society of London. Retrieved from http://archive.org/details/philtrans07768035
Simpson, E. H. (1951). The Interpretation of Interaction in Contingency Tables. Journal of the Royal Statistical Society. Series B (Methodological), 13(2), 238–241.
Yule, G. U. (1903). Notes on the Theory of Association of Attributes in Statistics. Biometrika, 2(2), 121–134. doi:10.1093/biomet/2.2.121
