Saturday, 31 January 2015

Is this percentage different from ...? The Binomial Test - Basics

A binomial test can be used to test if a percentage from a sample is significantly different from another percentage. For example: An advertisement claims that 80% of the respondents said that they would keep on using the product. You would like to test this claim and ask 5 people yourself. Only 3 of your respondents mention they would keep on using the product.

The question now is whether, based on your data, you can say the claim is likely to be correct or not.

The exact binomial test was first described by Bernoulli (1713). The number of calculations increases quickly as the sample size grows, so before computers took over, these calculations were usually avoided by using some form of approximation. The null hypothesis tested is that the population proportion equals the hypothesized value:

H0: π = πH0

In the example this is therefore: H0: π = 0.80.

Please note that I use the Greek letter pi for the population proportion, and p for the sample proportion, while some other textbooks use p for the population proportion and p-hat for the sample.

As for the alternative hypothesis (Ha) there are three variations:
  1. One-sided left-tailed
    In words: The probability of success in the population is less than …
    In symbols: Ha: π < πH0
  2. One-sided right-tailed
    In words: The probability of success in the population is more than …
    In symbols: Ha: π > πH0
  3. Two sided
    In words: The probability of success in the population is not …
    In symbols: Ha: π ≠ πH0
In the example we would probably be interested in the first one (if the actual percentage turns out to be more than 80%, we would not really mind):
Ha: π < 0.80.

Now we need to decide how we are going to test this. It might surprise you that there are actually various tests that could be used, and there is no generally agreed upon method to decide which test to use. More on this in the next section.

The basic binomial test looks at how many possible combinations there are to get the same number of successes or fewer, determines the probability of each, and adds them all up. It does this using the following formula:

P(X ≤ k) = Σ (from i = 0 to k) [n! / (i!(n − i)!)] × π^i × (1 − π)^(n − i)

An explanation of where this formula comes from can be found in Appendix I. To find the probability of getting a result the same as or more extreme than the one in our sample we can either (a small sketch of the by-hand calculation follows the list below):
  • Perform the calculation by hand using the formula
  • Use a table to find the corresponding probability
  • Use a software package
  • Use an online calculator (e.g. StatTrek, VassarStats, or GraphPad)
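
As an illustration of the first and third options, here is a minimal sketch of the by-hand calculation in Python (assuming Python 3.8+ for math.comb); the variable names and rounding are my own:

    from math import comb

    n = 5      # sample size (number of trials)
    pi = 0.80  # hypothesized probability of success on each trial
    k = 3      # number of successes observed in the sample

    # P(X <= k) = sum over i = 0..k of C(n, i) * pi^i * (1 - pi)^(n - i)
    p_value = sum(comb(n, i) * pi**i * (1 - pi)**(n - i) for i in range(k + 1))
    print(round(p_value, 4))  # 0.2627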

Each of the above methods needs three values as input: the sample size, sometimes called the number of trials (usually denoted by n; in the example n = 5), the probability of success on each trial (usually denoted by π; in the example π = 0.80), and the number of successes found in the sample (usually denoted by k; in the example k = 3).

We are interested in the probability of obtaining the same or a more extreme result. This is why we look at ‘cumulative’ probability tables, or in the output look for P(X ≤ …). In the example all methods should yield a p-value of 0.2627 (about 26%). This is the probability of obtaining 3 or fewer successes out of 5 in a sample, if the probability of success on each trial in the population is 0.80.
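
If you go the software route, a minimal sketch with Python and SciPy (assuming SciPy 1.7+ for scipy.stats.binomtest) gives the same value:

    from scipy.stats import binom, binomtest

    print(binom.cdf(3, 5, 0.80))                                 # 0.26272
    print(binomtest(3, n=5, p=0.80, alternative='less').pvalue)  # 0.26272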

When reporting the results of a binomial test, the sample proportion, the assumed population proportion, and the significance of the test should be given, together with an interpretation. Our final conclusion could read something like:
Advertiser X claimed that 80% of their respondents would like to continue using their product. A sample of five people was selected to test this claim. Only three people in the sample indicated they would like to continue using the product. A one-tailed binomial test was performed to test if the claim could be rejected. The test resulted in p = .263 indicating that there is insufficient evidence to reject the claim.

References
Bernoulli, J. (1713). Ars conjectandi. Impensis Thurnisiorum, fratrum.

Friday, 30 January 2015

Is this percentage different from ...? The Binomial Test - Variations

Why not always use an ‘exact’ test? Well, computationally the exact test takes longer, and it is sometimes criticized for being too ‘conservative’. This is because the exact (binomial) test is based on a discrete distribution (only whole numbers of successes are possible), while the proportions themselves are continuous. Agresti & Coull (1998) wrote an article on confidence intervals and show that ‘approximate’ can be better than ‘exact’, but McDonald (2014) argues that an exact test is almost always preferred as long as the sample size is less than 1000.

The exact binomial test can be approximated by a Chi-square, Normal, or Poisson distribution. If the probability of success in the population is very small, the Poisson distribution is most appropriate. The difference between the Chi-square and Normal approximations is very small (and in some cases even zero). The Normal approximation is the one most often covered in textbooks (with or without a continuity correction), so more people are familiar with it.
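
As a rough sketch of what the Normal approximation looks like in practice, continuing the example from the previous section (note that with n = 5 the approximation is poor, since nπ(1 − π) is far below the thresholds discussed further down):

    from math import sqrt
    from scipy.stats import norm

    n, pi, k = 5, 0.80, 3
    mean = n * pi                 # expected number of successes: 4
    sd = sqrt(n * pi * (1 - pi))  # standard deviation: ~0.894

    z_plain = (k - mean) / sd     # without continuity correction
    z_cc = (k + 0.5 - mean) / sd  # with continuity correction

    print(norm.cdf(z_plain))  # ~0.132
    print(norm.cdf(z_cc))     # ~0.288 (the exact one-sided value was 0.263)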

If you opt for a Chi-square based approximation, you can choose either the Pearson Chi-square test or the G-test (also known as the likelihood ratio test). According to Özdemir & Eyduran (2005) the choice between these two should be based on the power of the test. A correction can also be applied to either of them: Yates, Williams, or E. Pearson. The Yates correction has often been shown to over-correct and is highly criticized (Thompson, 1988); Campbell (2007) recommends the E. Pearson correction.
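
For the same example data, both Chi-square based approximations are available in SciPy; a minimal sketch (note that these report two-sided p-values and apply none of the corrections mentioned above):

    from scipy.stats import chisquare, power_divergence

    observed = [3, 2]  # successes, failures in the sample
    expected = [4, 1]  # expected under H0: 5 * 0.80 and 5 * 0.20

    # Pearson Chi-square
    print(chisquare(f_obs=observed, f_exp=expected))
    # statistic ~1.25, p ~0.26

    # G-test (likelihood ratio)
    print(power_divergence(f_obs=observed, f_exp=expected, lambda_='log-likelihood'))
    # statistic ~1.05, p ~0.31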

If you have a two-sided test, there is another choice to be made. A two-sided test means you look for results the same as or more extreme than the observed one in either direction. There are various methods that can be used to determine what counts as ‘more extreme’ in the other direction.

The easiest method is the Fisher-Irwin method, which simply doubles the one-sided significance value.

Another approach is the ‘equal distance’ method. In the example from the previous section we would have expected 0.8 × 5 = 4 successes. We only had 3, so a difference of 4 − 3 = 1. Taking an equal distance in the other direction gives 4 + 1 = 5 or more successes.

A third approach is the Freeman-Halton approach (G. H. Freeman & Halton, 1951). It checks the probability of each possible number of successes above the one from the sample and only adds the probabilities of those that are lower than or equal to the probability of the result in the sample (known as the method of small p-values).
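
To make the difference between the three approaches concrete, here is a minimal sketch applying each of them to the running example (n = 5, π = 0.80, k = 3); the variable names are my own:

    from scipy.stats import binom

    n, pi, k = 5, 0.80, 3
    pmf = [binom.pmf(i, n, pi) for i in range(n + 1)]
    one_sided = sum(pmf[:k + 1])          # P(X <= 3) = 0.26272

    # 1. Fisher-Irwin: double the one-sided value
    p_doubling = min(1.0, 2 * one_sided)  # 0.525

    # 2. Equal distance: expected 4, observed 3, so also count X >= 4 + 1 = 5
    p_equal = one_sided + sum(pmf[5:])    # 0.26272 + 0.32768 = 0.590

    # 3. Method of small p-values: add every outcome whose probability is
    #    at most that of the observed outcome (here no upper-tail outcome qualifies)
    p_small = sum(p for p in pmf if p <= pmf[k])  # 0.26272

    print(p_doubling, p_equal, p_small)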

Choosing among all of these options can be tricky. There are several opinions and an equal number of rules of thumb, and the choice could depend on a variety of things. The easiest case is if your instructor has a personal preference and has explicitly told you which test to use. Another thing that might make the choice easy is if the software you use can only perform one or a few of the tests. The first choice to make is whether you want to use a so-called ‘exact’ test or an approximation. If your sample size is small, none of the approximations will be accurate, so an exact test is then your only option.

My personal suggestion at the moment would be the following:
  Level 1: If you want 95% confidence, check if nπ(1 − π) ≥ 10; if you use 99% confidence, check if nπ(1 − π) ≥ 35. If the condition holds, use the normal approximation. This is based on the recommendation of Ramsey & Ramsey (1988). If not, go to Level 2.
  Level 2: If π < 0.05 (or π > 0.95) and n > 20, use a Poisson approximation. If not, use an exact binomial test, with the Freeman-Halton approach in case you want it two-sided.
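
As a hedged sketch only, the two levels above could be wrapped in a small helper; the function name and return labels are hypothetical, not standard terminology:

    def suggest_test(n, pi, confidence=0.95):
        # Level 1: normal approximation if n*pi*(1-pi) is large enough
        threshold = 10 if confidence == 0.95 else 35  # 95% -> 10, 99% -> 35
        if n * pi * (1 - pi) >= threshold:
            return "normal approximation"
        # Level 2: Poisson approximation for very small (or very large) pi and n > 20
        if (pi < 0.05 or pi > 0.95) and n > 20:
            return "Poisson approximation"
        # Otherwise: exact binomial test (Freeman-Halton approach if two-sided)
        return "exact binomial test"

    print(suggest_test(5, 0.80))  # exact binomial test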

References
Agresti, A., & Coull, B. A. (1998). Approximate Is Better than “Exact” for Interval Estimation of Binomial Proportions. The American Statistician, 52(2), 119–126. http://doi.org/10.2307/2685469 
Campbell, I. (2007). Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample recommendations. Statistics in Medicine, 26(19), 3661–3675. http://doi.org/10.1002/sim.2832 
McDonald, J. H. (2014, December). Small numbers in chi-square and G–tests. Retrieved December 25, 2014, from http://www.biostathandbook.com/small.html 
Özdemir, T., & Eyduran, E. (2005). Comparison of Chi-Square and Likelihood Ratio Chi-Square Tests: Power of Test. Journal of Applied Sciences Research, 1(2), 242–244.