Friday 30 May 2014

2.1.2. About constructing tables in general


Common sense rules when it comes to creating tables, but there are a few things every table should include:
  • A clear title for the table.
  • Clear column titles. It should be clear what each column shows. 'Income', for example, is not very clear: the table title, the column title or a note should state somewhere that it is, for instance, net income in euros per hour.
  • The source. Where did the table (or its data) come from?
Some formatting styles (such as APA and ASA) have fixed rules on where to place the elements mentioned above, as well as on which borders may and may not be drawn.

When creating classes you have to make sure that:
  1. The classes do not overlap (i.e. any value should fit in only one of the classes; technically this is called mutually exclusive).
  2. Every value fits in a class (i.e. do not forget one or more classes; technically this is called exhaustive or inclusive).
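Both requirements can be checked mechanically for a given data set. The following is a minimal Python sketch, assuming each class is stored as a hypothetical (lower, upper) pair meaning lower ≤ value < upper; the function names are made up for illustration. It can only test the values you actually have: a set of classes that passes for one data set can still miss values that simply did not occur.

```python
from typing import Sequence

# Assumed representation: (lower, upper) means lower <= value < upper.
Interval = tuple[float, float]


def classes_containing(value: float, classes: Sequence[Interval]) -> list[int]:
    """Return the indices of every class the value fits into."""
    return [i for i, (lower, upper) in enumerate(classes) if lower <= value < upper]


def check_classes(values: Sequence[float], classes: Sequence[Interval]) -> dict[str, list[float]]:
    """List the values that break mutual exclusivity or exhaustiveness."""
    overlapping = [v for v in values if len(classes_containing(v, classes)) > 1]
    unclassified = [v for v in values if len(classes_containing(v, classes)) == 0]
    return {"overlapping": overlapping, "unclassified": unclassified}


if __name__ == "__main__":
    data = [0, 4, 7, 15, 25, 30]
    print(check_classes(data, [(0, 10), (10, 20), (20, 30)]))
    print(check_classes(data, [(0, 10), (5, 20)]))
```

With the classes 0 < 10, 10 < 20, 20 < 30 the value 30 is left unclassified, so these classes are not exhaustive for this data; with 0 < 10 and 5 < 20 the value 7 fits in two classes (so they are not mutually exclusive) and 25 and 30 fit in none.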
Since tables in descriptive statistics are used to summarize data, and long tables are hard to read, it is often recommended not to create too many classes. There is no fixed rule for this, but between 5 and 20 classes is a common rule of thumb. Diagrams are often drawn based on tables, and these are also used to see how the data is distributed (a lot more on distributions will come later). Because the diagram often depends on how the classes were created, some far more technical methods for creating classes are also available. These go beyond the scope of this blog, but a summary of a few of them can be found in Appendix B.
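One widely cited example of a more technical method is Sturges' rule, which suggests roughly 1 + log2(n) classes for n observations. The Python sketch below only illustrates that rule next to the 5-to-20 rule of thumb; the methods summarized in Appendix B may differ, and the function names are hypothetical.

```python
import math


def sturges_number_of_classes(n: int) -> int:
    """Sturges' rule: k = 1 + log2(n), rounded up to a whole number of classes."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(1 + math.log2(n))


def within_rule_of_thumb(k: int, low: int = 5, high: int = 20) -> bool:
    """The '5 to 20 classes' rule of thumb from the text."""
    return low <= k <= high


if __name__ == "__main__":
    k = sturges_number_of_classes(100)
    print(k, within_rule_of_thumb(k))
```

For 100 observations this suggests 8 classes (1 + log2(100) ≈ 7.64, rounded up), which falls within the rule of thumb.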

There is also some terminology that comes into play when talking about classes (or bins).

Symbols for class interval
There are a few different ways to represent the intervals. To indicate 'smaller than' (but not equal to), the clearest symbol in my opinion is <, and to indicate 'smaller than or equal to', ≤. A more technical notation uses [ or ] to indicate that the end value is included and ( or ) to indicate that it is excluded. The interval 2 < 5 (from 2 up to but not including 5) is then the same as [2,5), and the interval 2 ≤ 5 is the same as [2,5].
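The correspondence between the two notations can be sketched in Python. This is a minimal illustration under the assumption that the lower end of a class is always included; the `ClassInterval` type and its `include_upper` flag are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassInterval:
    """A class whose lower end is always included.

    include_upper=False corresponds to "lower < upper", i.e. [lower,upper);
    include_upper=True corresponds to "lower ≤ upper", i.e. [lower,upper].
    """
    lower: float
    upper: float
    include_upper: bool = False

    def __contains__(self, x: float) -> bool:
        if self.include_upper:
            return self.lower <= x <= self.upper
        return self.lower <= x < self.upper

    def bracket_notation(self) -> str:
        closing = "]" if self.include_upper else ")"
        return f"[{self.lower:g},{self.upper:g}{closing}"


if __name__ == "__main__":
    half_open = ClassInterval(2, 5)
    closed = ClassInterval(2, 5, include_upper=True)
    print(half_open.bracket_notation(), 5 in half_open)
    print(closed.bracket_notation(), 5 in closed)
```

ClassInterval(2, 5) is written as [2,5) and does not contain 5, while ClassInterval(2, 5, include_upper=True) is written as [2,5] and does.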

Another commonly used symbol is the hyphen (-). However, some authors use it in the sense of < (Chaudhary, Kumar, & Alka, 2009; Sharma, 2007) and others in the sense of ≤ (Beri, 2010; Haighton, Haworth, & Wake, 2003a).

Class limit and Class boundary
Take classes such as 0-2, 3-5 and 6-10. The lower end of each class is called the lower limit and the upper end the upper limit. Note, however, that a value of 2.498 would go into the first class (0-2), since it rounds to 2, so the true limits of the classes are actually -0.5 < 2.5, 2.5 < 5.5 and 5.5 < 10.5. These are called the boundaries (the lower one the lower bound and the upper one the upper bound), also known as the true or closed limits. Kenney (1939, p. 14) defines the class boundary as "the value halfway between the upper limit of one class and the lower limit of the next class". In practice, though, many people are not aware of the difference between boundaries and limits and use the two words interchangeably.
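The step from limits to boundaries can be sketched in Python. The inner boundaries follow Kenney's halfway definition; for the two outer ends the sketch assumes the same gap as between the first two classes, which for whole numbers amounts to subtracting or adding 0.5. A value lying exactly on a boundary is assumed to go into the higher class. The function names are hypothetical.

```python
from typing import Optional, Sequence


def class_boundaries(limits: Sequence[tuple[float, float]]) -> list[tuple[float, float]]:
    """Turn consecutive class limits, e.g. (0, 2), (3, 5), (6, 10), into boundaries."""
    if len(limits) < 2:
        raise ValueError("need at least two consecutive classes")
    # Halfway between the upper limit of one class and the lower limit of the next.
    inner = [(upper + next_lower) / 2
             for (_, upper), (next_lower, _) in zip(limits, limits[1:])]
    # Assumption: the outer ends use the same gap as between the first two classes.
    half_gap = (limits[1][0] - limits[0][1]) / 2
    lowers = [limits[0][0] - half_gap] + inner
    uppers = inner + [limits[-1][1] + half_gap]
    return list(zip(lowers, uppers))


def find_class(value: float, boundaries: Sequence[tuple[float, float]]) -> Optional[int]:
    """Index of the class with lower boundary <= value < upper boundary, else None."""
    for i, (lower, upper) in enumerate(boundaries):
        if lower <= value < upper:
            return i
    return None


if __name__ == "__main__":
    bounds = class_boundaries([(0, 2), (3, 5), (6, 10)])
    print(bounds)
    print(find_class(2.498, bounds))
```

For the classes 0-2, 3-5 and 6-10 this gives the boundaries -0.5 < 2.5, 2.5 < 5.5 and 5.5 < 10.5, and the value 2.498 lands in the first class (index 0).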

Class mark
The class mark is the average of the two limits (or boundaries) of a class. It is also known as the midvalue or central value (Kenney, 1939).

This does, however, create a problem, which the following example illustrates. Let's assume we have created classes for how many books people read per year as 0 < 2, 2 < 4 and so on. Since the number of books is always a whole number, I could also write these as 0 ≤ 1, 2 ≤ 3 and so on.
Normally, to convert class limits into class boundaries we subtract 0.5 from the lower limit and add 0.5 to the upper limit. For the class 2 ≤ 3 this gives the boundaries 1.5 and 3.5, which differ from the 2 and 4 in the notation 2 < 4. If I use 2 and 4 as the ends of the class, the class mark is (2 + 4) / 2 = 3; if I use the class limits, it is (2 + 3) / 2 = 2.5. In my opinion 2.5 is more appropriate, because we are talking about a discrete variable (number of books): the class only contains the values 2 and 3, so the class boundaries are only theoretical.

My recommendation is therefore to use < when the variable that is being grouped is continuous and ≤ when the variable is discrete, and define the class mark as: “The average of the two boundaries in case of a continuous variable that has been grouped and the average of the two limits in case of a discrete variable that has been grouped.”
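The recommendation can be written as two small Python functions (hypothetical names). Both simply average two numbers; what differs is which two numbers go in. For a discrete whole-number variable, the average of the two limits equals the average of every value the class can actually contain, which the usage below checks for the books example.

```python
from statistics import mean


def continuous_class_mark(lower_boundary: float, upper_boundary: float) -> float:
    """Grouped continuous variable (class written as lower < upper):
    the average of the two boundaries."""
    return (lower_boundary + upper_boundary) / 2


def discrete_class_mark(lower_limit: int, upper_limit: int) -> float:
    """Grouped discrete variable (class written as lower ≤ upper):
    the average of the two limits."""
    return (lower_limit + upper_limit) / 2


if __name__ == "__main__":
    # Books read per year, class 2 ≤ 3: only 2 and 3 books are possible.
    print(discrete_class_mark(2, 3))    # 2.5, average of the limits
    print(mean(range(2, 3 + 1)))        # 2.5, average of the possible values
    # Treating the 2 and 4 of "2 < 4" as boundaries instead:
    print(continuous_class_mark(2, 4))  # 3.0
```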

Class interval and Class width
An interval is a range of values, so the class interval is the range of values that belongs to a class. It can be stated with either the class limits or the class boundaries. The class width is how wide the class is, i.e. the upper class boundary minus the lower class boundary; it is also known as the class length (Bowerman et al., 2009).
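Computing the width from the boundaries rather than from the limits matters for classes such as 0-2 and 6-10. A minimal Python sketch, assuming (as a hypothetical `unit` parameter) that the boundaries lie half a measurement unit outside the limits, which for whole numbers is the usual 0.5:

```python
def class_width(lower_limit: float, upper_limit: float, unit: float = 1.0) -> float:
    """Class width = upper boundary - lower boundary.

    Assumption: boundaries lie half a unit below the lower limit and
    half a unit above the upper limit.
    """
    lower_boundary = lower_limit - unit / 2
    upper_boundary = upper_limit + unit / 2
    return upper_boundary - lower_boundary


if __name__ == "__main__":
    print(class_width(0, 2))   # boundaries -0.5 and 2.5
    print(class_width(6, 10))  # boundaries 5.5 and 10.5
```

class_width(0, 2) returns 3.0 rather than 2: the class 0-2 holds three whole numbers (0, 1 and 2), and its boundaries are -0.5 and 2.5. Likewise class_width(6, 10) returns 5.0.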

Frequencies are commonly used in tables, but there are several different types of frequencies.

>>Next section: Frequencies

References
Beri, G. C. (2010). Business statistics (3rd ed.). New Delhi: Tata McGraw-Hill Education.
Bowerman, B. L., O’Connell, R. T., Murphree, E. S., Huchendorf, S. C., Porter, D. C., & Schur, P. J. (2009). Business statistics in practice. Boston, MA: McGraw-Hill/Irwin.
Chaudhary, K. K. S., Kumar, A., & Alka. (2009). Statistics in management studies (10th ed.). Meerut: Krishna Prakashan Media.
Haighton, J., Haworth, A., & Wake, G. (2003). Statistics. Cheltenham: Nelson Thornes.
Kenney, J. F. (1939). Mathematics of statistics: Part one. London: Chapman & Hall. Retrieved from http://archive.org/details/MathematicsOfStatisticsPartI
Sharma, J. K. (2007). Business statistics (2nd ed.). Delhi: Pearson Education.
