Wednesday 30 April 2014

Introduction to blog

At the moment I'm slowly building a website about statistics. The idea is that the site describes the various terms clearly and efficiently, including references and examples.

On this blog I'll upload some parts of it in advance, so others can already make use of the content, comment on it and give suggestions for further improvements.

Tuesday 29 April 2014

1.1. Decimal marker and rounding

Dot or comma
Believe it or not, a whole range of symbols is used as decimal marker and thousands separator. The low dot (.) and the comma (,) are probably the most widely known, but the middle dot (∙), the raised dot (˙) and the raised comma (') are also used in various countries.

The International Bureau of Weights and Measures has stated that only the dot or the comma should be used as a decimal marker (Bureau International des Poids et Mesures, 2003, p. 169). To separate thousands, only a space should be used, not a dot or a comma (Bureau International des Poids et Mesures, 1948, p. 70, 2003, p. 169).

So the only two ways to write down seven million plus one half are 7 000 000,5 or 7 000 000.5 (or the same without the spaces).
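As a small illustration of the notation above, the following Python sketch formats a number with a space as thousands separator and either a dot or a comma as decimal marker. The function name `format_number` and its parameters are made up for this example and are not part of any standard.

```python
def format_number(value: float, decimals: int = 1, decimal_marker: str = ".") -> str:
    """Format a number with spaces between thousands (hypothetical helper)."""
    # Python's own format uses ',' for thousands and '.' for decimals,
    # e.g. 7000000.5 becomes '7,000,000.5'.
    text = f"{value:,.{decimals}f}"
    integer_part, _, fraction = text.partition(".")
    # Replace the thousands commas by spaces.
    integer_part = integer_part.replace(",", " ")
    if fraction:
        return integer_part + decimal_marker + fraction
    return integer_part


with_dot = format_number(7000000.5)
with_comma = format_number(7000000.5, decimal_marker=",")
print(with_dot)    # 7 000 000.5
print(with_comma)  # 7 000 000,5
```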

Rounding
The rounding method used here is ‘round half away from zero’. If we want to round to two decimals, this method means we look at the third decimal. If the third decimal is 5 or higher (i.e. 5, 6, 7, 8 or 9), the second decimal is increased by one. If the third decimal is lower than 5 (i.e. 0, 1, 2, 3 or 4), the second decimal remains unchanged. Negative numbers are rounded in the same way, ignoring the minus sign.

1.375999 ≈ 1.38
1.374999 ≈ 1.37
-1.375999 ≈ -1.38
-1.374999 ≈ -1.37

This is the method also described by the National Center for Education Statistics (NCES, 2012).
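The rounding rule described above can be sketched in Python with the standard `decimal` module, whose `ROUND_HALF_UP` mode rounds ties away from zero. The helper name `round_half_away` is an assumption made for this example. Note that Python's built-in `round` resolves ties to the nearest even digit instead, so it can give different results for exact halves.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_away(value: str, decimals: int = 2) -> Decimal:
    """Round a number (given as text) half away from zero (hypothetical helper)."""
    # The quantum fixes the number of decimals, e.g. Decimal('0.01') for two.
    quantum = Decimal(10) ** -decimals
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


for text in ["1.375999", "1.374999", "-1.375999", "-1.374999"]:
    print(text, "is rounded to", round_half_away(text))

# An exact half shows the difference with the built-in round():
print(round_half_away("2.5", 0))  # 3
print(round(2.5))                 # 2 (ties go to the even digit)
```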


References
Bureau International des Poids et Mesures. (1948). Résolution 7 (pp. 70–71). Presented at the 9e Conférence Générale des Poids et Mesures, Paris: BIPM.
Bureau International des Poids et Mesures. (2003). Résolution 10. Presented at the 22e Conférence Générale des Poids et Mesures, Paris: BIPM. Retrieved from http://www.bipm.org/fr/convention/cgpm/comptes_rendus.html
NCES. (2012). Analysis of Data / Production of Estimates or Projections. In 2012 NCES Statistical Standards. Retrieved from http://nces.ed.gov/statprog/2012/pdf/Chapter5.pdf

Monday 28 April 2014

1.2. Cases, variables, values and scores


When collecting data, you collect it from something. This something could be people, things, days, etc.; these are called the cases. From each case we would like to know different things that can vary per case. The things we want to know from each case are called the variables. Each variable can vary, and the possible variations are its values. Finally, the specific value of a variable for a particular case is called a score.

A survey, for example, has people (or respondents) as cases, and the questions on the survey can be considered the variables. A small difference between a survey question and a variable is that the name of the variable is often shorter. The question ‘What is your gender?’ will simply have ‘Gender’ as its variable, and the possible values are Male and Female. My score for the variable Gender would be Male.
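To make the four terms concrete, here is a minimal Python sketch of a tiny, made-up survey data set. The respondents and their answers are invented for illustration only. Each dictionary is one case, each key is a variable, and each stored entry is a score.

```python
# Hypothetical survey data: every dictionary is one case (a respondent).
cases = [
    {"Gender": "Male", "Age": 34},
    {"Gender": "Female", "Age": 27},
    {"Gender": "Female", "Age": 41},
]

# The variables are the things recorded for every case.
variables = list(cases[0].keys())

# The values that actually occur per variable in this small data set
# (the possible values of a variable may include more than these).
observed_values = {
    name: sorted({case[name] for case in cases}) for name in variables
}

# A score is the value of one variable for one specific case.
score = cases[0]["Gender"]

print(variables)        # ['Gender', 'Age']
print(observed_values)  # {'Gender': ['Female', 'Male'], 'Age': [27, 34, 41]}
print(score)            # Male
```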

In Table 1 a classification of types of questions is shown.

Table 1
Types of questions

Type | Description | Example
1 | Open questions |
1a | Open questions asking for a number | What is your age? ___
1b | Open questions asking for text | What is your name? ____
2 | Closed questions |
2a | Closed question, single answer | What is your gender? O Male  O Female
2b | Closed question, multiple answer | Which TV series do you watch? □ Got  □ Walk. Dead  □ Southpark  □ Other
3 | Semi-closed |
3a | Semi-closed, single answer | What is your favourite brand? O Nike  O Adidas  O Puma  O Other, please specify:____
3b | Semi-closed, multiple answer | Which brands do you like? □ Nike  □ Adidas  □ Puma  □ Other, please specify:____
4 | List of questions on same scale | Please indicate the level you agree or disagree with the following statements.

Example for type 4 (answer grid):
Statement | Strongly disagree | Disagree | Agree | Strongly agree
I enjoy this book | O | O | O | O
The book is clear | O | O | O | O
The layout is nice | O | O | O | O
The style is good | O | O | O | O

If type 4 uses some form of scale for each of its questions (as in the example), then this type is also known as a Likert [1] scale (Likert, 1932). Note that each question from the Likert scale is called a Likert item; the combined set of questions is then the Likert scale.
Questions with multiple answers (types 2b and 3b) are often not considered a single variable, but a collection of variables. Each option is then treated as a yes/no question. In the example of the TV shows, the multiple answer question could also have been rephrased as the following four single answer questions:
  • Do you watch Got? O Yes O No
  • Do you watch Walk. Dead? O Yes O No
  • Do you watch Southpark? O Yes O No
  • Do you watch Other? O Yes O No
How to enter questions in SPSS can be found in Appendix A.
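The idea of treating a multiple answer question as a collection of yes/no variables can be sketched in Python as follows. The function name `to_yes_no` and the example respondent are assumptions made for this illustration, not how SPSS itself stores the data.

```python
# The answer options of the TV series question from Table 1.
OPTIONS = ["Got", "Walk. Dead", "Southpark", "Other"]


def to_yes_no(selected: list[str]) -> dict[str, str]:
    """Turn the ticked options of one respondent into one yes/no score per option."""
    return {option: ("Yes" if option in selected else "No") for option in OPTIONS}


# A hypothetical respondent who ticked two of the four boxes.
respondent = to_yes_no(["Got", "Southpark"])
print(respondent)
# {'Got': 'Yes', 'Walk. Dead': 'No', 'Southpark': 'Yes', 'Other': 'No'}
```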

References
Likert, R. (1932). A technique for the measurement of attitudes (Vol. 22). New York: The Science Press.

[1] Pronounced with a short i as in sick, and not as in lie.

Sunday 27 April 2014

1.3. Measurement level

Each variable also has a measurement level. Measurement levels are used to classify variables, and the measurement level of a variable determines which statistical techniques can be used on it. The classification described by Stevens (1946) is the one most frequently used and is also the one described here.

Nominal has ‘no’ in it. It is used for variables whose values have no logical order (besides perhaps alphabetical). An example of a variable at a nominal level is TV shows. The order we put the TV shows in does not really matter.
For variables at a nominal level we can only compare groups by their size, i.e. by how many cases fall into each group.

As the name implies, for ordinal variables there is a logical order in the values, but note that the values are not yet numbers. An example is Educational Level, which has a clear order.
Besides comparing the different groups, we can now also make statements about greater or less (e.g. 45 respondents had secondary school or less).

Interval is for variables that use numbers, but where the zero of the scale was chosen somewhat arbitrarily. For example, temperature in degrees Celsius (or Fahrenheit) is at an interval level. The zero of the Celsius scale was somewhat arbitrarily chosen as the boiling point of water (and later flipped to be the freezing point).
This level is called ‘interval’ because there are equal intervals between values. The difference between 5 and 10 degrees Celsius is the same as the difference between 20 and 25. Note that for ordinal variables this is not the case: the difference between Strongly disagree and Disagree might not be the same as the difference between Disagree and Agree.
At an interval level the zero does not really mean ‘nothing’. For example, at 0 degrees Celsius there is still a temperature, and someone with an IQ of 0 still has intelligence (although not much). Because the intervals are equal, we can add and subtract values at an interval level.

The last level is ratio, for variables that use numbers and have a true zero. This means that zero really means nothing: if your income is zero, you are really not earning anything. At a ratio level the ratios between values are also meaningful (hence the name). If you earn 200 one day and 400 the next, then this is really twice as much, no matter which currency you express it in. In contrast, 10 degrees Celsius is not twice as warm as 5 degrees: converted to Fahrenheit they become 50 and 41 degrees. One temperature scale is considered ratio, namely Kelvin, which uses absolute zero as its starting point.
At ratio level we can now also divide and multiply.
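The difference between interval and ratio can be checked with a little arithmetic. The Python sketch below uses the standard conversion formulas for temperature; the exchange rate of 1.1 is a made-up number, used only to show that ratios of incomes survive a change of currency.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius: float) -> float:
    return celsius + 273.15


# Interval level: equal differences stay equal after conversion.
diff_low = celsius_to_fahrenheit(10) - celsius_to_fahrenheit(5)
diff_high = celsius_to_fahrenheit(25) - celsius_to_fahrenheit(20)
print(diff_low, diff_high)  # 9.0 9.0

# But ratios are not preserved: 10 / 5 is 2 in Celsius,
# while the same temperatures give 50 / 41 in Fahrenheit.
ratio_celsius = 10 / 5
ratio_fahrenheit = celsius_to_fahrenheit(10) / celsius_to_fahrenheit(5)
print(ratio_celsius, round(ratio_fahrenheit, 2))  # 2.0 1.22

# On the Kelvin scale (true zero) the meaningful ratio is only about 1.02.
ratio_kelvin = celsius_to_kelvin(10) / celsius_to_kelvin(5)
print(round(ratio_kelvin, 2))  # 1.02

# Ratio level: income has a true zero, so a ratio survives a change of unit.
exchange_rate = 1.1  # hypothetical rate, for illustration only
ratio_income = (400 * exchange_rate) / (200 * exchange_rate)
print(ratio_income)  # 2.0
```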

Sometimes no distinction is made between nominal and ordinal, and the term categorical is used for both. Similarly, SPSS does not make a distinction between interval and ratio and simply calls both of these scale.
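The four levels, the comparisons and calculations each one adds, and the two broader groupings just mentioned can be summarised in a small Python sketch. The names `LEVELS`, `NEW_OPERATIONS`, `BROAD_GROUP` and `allowed_operations` are made up for this example; the content simply restates the descriptions above.

```python
# The levels in order; each level allows everything the previous ones allow.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# What each level adds, as described in the text.
NEW_OPERATIONS = {
    "nominal": "compare group sizes",
    "ordinal": "compare order (greater or less)",
    "interval": "add and subtract",
    "ratio": "multiply and divide",
}

# The broader groupings: 'categorical' and the SPSS term 'scale'.
BROAD_GROUP = {
    "nominal": "categorical",
    "ordinal": "categorical",
    "interval": "scale",
    "ratio": "scale",
}


def allowed_operations(level: str) -> list[str]:
    """List everything that can be done at the given measurement level."""
    position = LEVELS.index(level)
    return [NEW_OPERATIONS[name] for name in LEVELS[: position + 1]]


print(allowed_operations("ordinal"))
# ['compare group sizes', 'compare order (greater or less)']
print(BROAD_GROUP["interval"])  # scale
```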

Another way to classify variables is into discrete and continuous. A discrete variable is a variable whose values are distinct from each other, with nothing in between. This means that nominal and ordinal variables are always discrete. Interval and ratio variables, however, can be either discrete or continuous. A continuous variable has no such interruptions: between any two values another value is possible. The number of people is a discrete variable, even though it is measured with numbers. Money, weight and length are examples of continuous variables.

Note that the measurement level classification from Stevens is not without criticism. Velleman & Wilkinson (1993) give a nice overview of various problems with Stevens' classification. For those interested in reading more on measurement levels, the article by Sarle (1997) might be a nice starting point.


References
Sarle, W. S. (1997, September 14). Measurement theory: Frequently asked questions. Retrieved May 3, 2015, from ftp://ftp.sas.com/pub/neural/measurement.html

Stevens, S. S. (1946). On the Theory of Scales of Measurement. Science, 103(2684), 677–680. doi:10.1126/science.103.2684.677
Velleman, P. F., & Wilkinson, L. (1993). Nominal, Ordinal, Interval, and Ratio Typologies are Misleading. The American Statistician, 47(1), 65–72. http://doi.org/10.1080/00031305.1993.10475938 
