Thursday 29 May 2014

2.1.3. Frequencies

Once we have data we often like to know 'how many...', which can be answered by generating frequency tables. The basic frequency simply counts how many data points belong to a specific group. These counts are known as the absolute frequencies.
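As a minimal sketch of counting absolute frequencies, the Python snippet below tallies a small list of answers. The data and the variable names are made up for illustration and do not come from this post.

```python
from collections import Counter

# Hypothetical survey answers (a nominal variable); not data from this post.
answers = ["car", "bike", "car", "bus", "bike", "car"]

# Counter tallies how many data points belong to each group.
absolute_freq = Counter(answers)

# Print the groups from most to least frequent.
for group, count in absolute_freq.most_common():
    print(group, count)
```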

In some instances it is also useful to report other types of frequency. The three main ones are the relative frequency, the cumulative frequency and the frequency density.

Relative Frequency
The relative frequency is the frequency of a class/group in relation to the total frequency. It is defined as: "[absolute frequency] expressed as a fraction of the total frequency" (Kenney & Keeping, 1954, p. 17).

Often the relative frequency gets multiplied by 100 to yield the percentage.
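The calculation can be sketched in a few lines of Python. The counts below are hypothetical and only serve to show the division by the total and the optional multiplication by 100.

```python
# Hypothetical absolute frequencies per group; not data from this post.
absolute_freq = {"car": 3, "bike": 2, "bus": 1}

total = sum(absolute_freq.values())

# Relative frequency: the absolute frequency as a fraction of the total.
relative_freq = {g: f / total for g, f in absolute_freq.items()}

# Multiply by 100 to report percentages instead of fractions.
percentages = {g: 100 * rf for g, rf in relative_freq.items()}

for g in absolute_freq:
    print(f"{g}: {relative_freq[g]:.3f} ({percentages[g]:.1f}%)")
```

The relative frequencies always add up to 1, and the percentages to 100, which is a quick check on the calculation.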

Cumulative Frequency & Inverse Cumulative Frequency
The cumulative frequency keeps adding up (accumulates) and shows the number of cases that are equal to or lower than the class/group. It is defined as: "the total (absolute) frequency up to the upper boundary of that class" (Kenney, 1939, p. 16). The inverse cumulative frequency is the exact opposite: “the frequency of all values greater than the lower class boundary of a given class” (Kozak, Kozak, Staudhammer, & Watts, 2008, p. 13).
Since the cumulative frequency is interpreted with "equal to or less than", and the inverse cumulative frequency with "equal to or more than", the variable should have a logical order. Hence neither can be used if the variable is of nominal level.
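The two running totals can be sketched in Python as follows. The ordered classes and their counts are hypothetical, chosen only to show the direction in which each total accumulates.

```python
from itertools import accumulate

# Hypothetical ordered (ordinal) classes with absolute frequencies;
# not data from this post.
classes = ["low", "medium", "high", "very high"]
freq = [4, 7, 6, 3]

# Cumulative frequency: running total from the lowest class upward.
cum_freq = list(accumulate(freq))

# Inverse cumulative frequency: running total from the highest class
# downward, flipped back so it lines up with the original class order.
inv_cum_freq = list(accumulate(reversed(freq)))[::-1]

for c, f, cf, icf in zip(classes, freq, cum_freq, inv_cum_freq):
    print(c, f, cf, icf)
```

The last cumulative frequency and the first inverse cumulative frequency both equal the total number of cases.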

Frequency Density
As the name implies, the frequency density shows how dense a binned class (e.g. 0 < 10, 10 < 30, etc.) is, or in other words how crowded it is. It works much like population density (population divided by area): it is determined by dividing the (absolute) frequency by the class width.
Pearson (1895, p. 399) does not mention the term frequency density, but does mention that in histograms the area of the columns should equal the (absolute) frequency. Since the width of a column in a histogram is equal to the class width, and the area of a column is equal to its width times its height, it is simple to deduce that the height should be the absolute frequency divided by the class width.
The frequency density is especially useful in cases when class widths are not equal for all classes.
Since the frequency density needs a class width it can only be determined for interval and ratio variables.
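A minimal Python sketch of the division by class width is given below. The bins and counts are hypothetical and were picked so that the class with the most cases is not the most crowded one.

```python
# Hypothetical binned classes as (lower bound, upper bound, absolute frequency);
# not data from this post.
bins = [(0, 10, 5), (10, 30, 12), (30, 100, 14)]

# Frequency density: absolute frequency divided by the class width.
freq_density = [f / (upper - lower) for lower, upper, f in bins]

for (lower, upper, f), fd in zip(bins, freq_density):
    print(f"{lower} < {upper}: frequency {f}, density {fd:.2f}")
```

In this made-up example the 30 < 100 class has the highest absolute frequency (14) but the lowest density (0.20), because its cases are spread over a width of 70.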
In some cases most classes have the same width and only a few differ. Some authors (e.g. Barrow, 2009; Burton, Carrol, & Wall, 2002; Haighton, Haworth, & Wake, 2003) then suggest setting a ‘standard class width’ and determining the frequency density based on that standard class width.
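One way to read the standard class width approach is to express each class width as a multiple of the standard width and divide the frequency by that multiple. The Python sketch below assumes this reading; the bins, the counts and the choice of 10 as the standard width are hypothetical.

```python
# Hypothetical bins as (lower bound, upper bound, absolute frequency);
# not data from this post. Most classes are 10 wide, the last one is 30 wide.
bins = [(0, 10, 5), (10, 20, 8), (20, 30, 6), (30, 60, 9)]

# Assumption: the most common class width is chosen as the standard width.
standard_width = 10

# Frequency per standard class width: divide the frequency by the number of
# standard widths that fit in the class.
fd_standard = [f / ((upper - lower) / standard_width) for lower, upper, f in bins]

for (lower, upper, f), fd in zip(bins, fd_standard):
    print(f"{lower} < {upper}: frequency {f}, per standard width {fd:.1f}")
```

Under this reading, classes that already have the standard width keep their absolute frequency, and only the deviating classes are rescaled.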

Combinations
Combinations are also possible:
  • Relative Frequency Density
  • Cumulative Frequency Density
  • Cumulative Relative Frequency
  • Cumulative Relative Frequency Density
  • Inverse Cumulative Frequency Density
  • Inverse Cumulative Relative Frequency
  • Inverse Cumulative Relative Frequency Density.
The relative frequency density can be determined using the same reasoning as for relative frequencies, by dividing the frequency density by the total frequency (Haighton, Haworth, & Wake, 2003, p. 74), or using the same reasoning as for the frequency density itself, by dividing the relative frequency by the class width (Kozak et al., 2008, p. 80). Both approaches yield the same result.
Note that according to Petry & Friesen (2012) the cumulative relative frequency density is pointless to calculate. This is a bit strange, because these are frequently used, especially for histograms showing probabilities.
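That the two routes to the relative frequency density agree can be checked with a short Python sketch. The bins and counts are hypothetical.

```python
import math

# Hypothetical bins as (lower bound, upper bound, absolute frequency);
# not data from this post.
bins = [(0, 10, 5), (10, 30, 12), (30, 100, 14)]
total = sum(f for _, _, f in bins)

# Route 1: frequency density first, then divide by the total frequency.
rfd_via_density = [(f / (upper - lower)) / total for lower, upper, f in bins]

# Route 2: relative frequency first, then divide by the class width.
rfd_via_relative = [(f / total) / (upper - lower) for lower, upper, f in bins]

# Both routes compute f / (width * total), so they agree up to rounding.
same = all(math.isclose(a, b) for a, b in zip(rfd_via_density, rfd_via_relative))
print(same)

# The areas (density times width) of all classes add up to 1.
area = sum(rfd * (upper - lower) for rfd, (lower, upper, _) in zip(rfd_via_density, bins))
print(round(area, 10))
```

Both routes divide the absolute frequency by the class width and by the total, only in a different order, which is why the results match.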

Example with some interpretation

  • FD (Frequency Density)
    From the FD column we can see that although the 50 < 100 class has the most people in absolute frequencies, if we take the class width into consideration the 10 < 20 class is the most crowded (highest FD).
  • RF (Relative Frequency)
    If we wanted to compare our data with another data set that used the same class widths, but a different number of cases, we could compare the RFs with each other to see where, relatively speaking, most people fall.
  • RFD (Relative Frequency Density)
    If we wanted to compare our data with another data set that used different class widths, we could compare the RFDs to see which classes were most crowded.
  • CF (Cumulative Frequency)
    We can see immediately that 22 cases would fall in 0 < 50.
  • ICF (Inverse Cumulative Frequency)
    We can see immediately that 21 cases would fall in 10 < 100.
It is possible to construct formulas for each type of frequency, but they often look scarier than the calculation actually is. However, if you like, you can find the formulas with an example calculation in Appendix C.

>>Next entry: Tables with Excel, SPSS or a TI-83

References
Barrow, M. (2009). Statistics for Economics, Accounting and Business Studies (5th edition.). Essex: Pearson Education.
Burton, G., Carrol, G., & Wall, S. (2002). Quantitative Methods for Business and Economics (2nd edition.). Essex: Pearson Education.
Haighton, J., Haworth, A., & Wake, G. (2003). Statistics. Cheltenham: Nelson Thornes.
Kenney, J. F. (1939). Mathematics of Statistics; Part one. London: Chapman & Hall. Retrieved from http://archive.org/details/MathematicsOfStatisticsPartI
Kenney, J. F., & Keeping, E. S. (1954). Mathematics of Statistics; Part one (3rd edition.). New York: D. Van Nostrand Company. Retrieved from http://hdl.handle.net/2027/mdp.39015015725339
Kozak, A., Kozak, R. A., Staudhammer, C. L., & Watts, S. B. (2008). Introductory Probability and Statistics: Applications for Forestry and Natural Sciences. Wallingford: CABI.
Pearson, K. (1895). Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous Material. Philosophical Transactions of the Royal Society of London. (A.), 186, 343–414. doi:10.1098/rsta.1895.0010
Petry, R. G., & Friesen, B. (2012). STAT 100; Elementary Statistics for Applications. Campion College. Retrieved from http://amberlin.asuscomm.com/university_of_regina_copies/stat_100_lecture_notes_v2/intro_stats_v2.pdf
