What is it?
The mode is the most common value obtained in a set of observations (Weisstein, n.d.).
If two or more values share the highest frequency, they are all modes. If each value occurs equally often, there is no mode.
Visually, the mode (or modes) can be seen as the highest peak(s), as illustrated in the animated gif below (Figure 18).
Figure 18. Visual representation of the mode.
If there is only one mode, the data is sometimes called unimodal; with two modes it is called bimodal, with three trimodal, and so on. The term multimodal is also sometimes used when there are two or more modes.
Strictly speaking, only the items with the highest frequency are the mode (unless all frequencies are equal), but in the example below (Figure 19) there are clearly two peaks, so one could argue that there are two modes.
Figure 19. Example of a data set with two local maxima.
In the above example the mode is 2, but 8 is also somewhat of a peak. Such a peak is known as a local maximum, and local maxima are often considered to be modes as well. The Oxford Dictionary of Statistics even considers any class whose neighboring classes have lower frequencies to be a modal class (Upton & Cook, 2014, pp. 272–273). There are also statistical tests to check whether data can be considered bimodal; one of these is discussed as a side note.
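The idea of a local maximum can be illustrated with a minimal Python sketch. It is not one of the statistical tests for bimodality. The function name local_peaks and the frequencies are made up for illustration (they are not the data behind Figure 19), and counting an edge class as a peak when its single neighbor is lower is an assumption.

```python
def local_peaks(frequencies):
    """Return the values whose frequency is higher than that of each neighbor.

    frequencies: list of (value, count) pairs, ordered by value.
    Assumption: a class at either end only needs to beat its one neighbor.
    """
    peaks = []
    last = len(frequencies) - 1
    for i, (value, count) in enumerate(frequencies):
        left = frequencies[i - 1][1] if i > 0 else None
        right = frequencies[i + 1][1] if i < last else None
        higher_than_left = left is None or count > left
        higher_than_right = right is None or count > right
        if higher_than_left and higher_than_right:
            peaks.append(value)
    return peaks


# Hypothetical frequencies for the values 1 to 10 (illustration only).
example = list(zip(range(1, 11), [3, 9, 6, 4, 2, 3, 5, 7, 4, 1]))
peaks = local_peaks(example)
print(peaks)
```

With these made-up counts the value 2 has the highest frequency overall, and 8 is reported as a second local peak.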
Examples
On a survey ten people were asked to rate the design of a product on the scale Very nice, Nice, Ugly, Very ugly. The responses the researcher got were: Very nice, Nice, Nice, Nice, Nice, Ugly, Ugly, Ugly, Very ugly, Very ugly. Since Nice has a frequency of four and all other responses occur less often, the mode is Nice. In this case the data is unimodal.
The following grades were obtained by various students: 2, 4, 5, 5, 5, 8, 9, 9, 9. Since five and nine each appear three times, the other grades appear only once, and no grade appears four or more times, the modes are 5 and 9. In this case the data is bimodal. Note that you do NOT average the two.
In a group of six students the genders are: Male, Male, Female, Male, Female, Female. Since both Male and Female occur three times, there is no mode.
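The three examples above can be checked with a short Python sketch. The function name modes is made up for illustration. It follows the convention of this text that there is no mode when every value occurs equally often. As an assumption, data with only one distinct value is treated as having that value as its mode.

```python
from collections import Counter


def modes(values):
    """Return a list with all modes, or an empty list if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Convention of this text: all frequencies equal means no mode.
    # Assumption: this only applies when there is more than one distinct value.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]


survey = ["Very nice", "Nice", "Nice", "Nice", "Nice",
          "Ugly", "Ugly", "Ugly", "Very ugly", "Very ugly"]
grades = [2, 4, 5, 5, 5, 8, 9, 9, 9]
genders = ["Male", "Male", "Female", "Male", "Female", "Female"]

print(modes(survey))   # one mode: unimodal
print(modes(grades))   # two modes: bimodal
print(modes(genders))  # empty list: no mode
```

Note that the function returns both 5 and 9 for the grades and does not average them.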
A special case is when data is grouped in classes. The class (or classes) with the highest frequency density is (or are) then considered the modal class (Kenney & Keeping, 1954). 
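A minimal Python sketch of finding the modal class is shown below. It assumes that frequency density means the class frequency divided by the class width. The function name modal_classes and the classes themselves are made up for illustration.

```python
import math


def modal_classes(classes):
    """Return the class or classes with the highest frequency density.

    classes: list of (lower_bound, upper_bound, frequency) tuples.
    Assumption: frequency density = frequency / (upper_bound - lower_bound).
    """
    densities = [(lower, upper, freq / (upper - lower))
                 for lower, upper, freq in classes]
    highest = max(density for _, _, density in densities)
    return [(lower, upper) for lower, upper, density in densities
            if math.isclose(density, highest)]


# Hypothetical grouped data (illustration only).
grouped = [(0, 10, 20), (10, 20, 30), (20, 50, 45)]
result = modal_classes(grouped)
print(result)
```

In this made-up example the class from 20 to 50 has the highest frequency (45), but because it is three times as wide as the others its frequency density is only 1.5. The class from 10 to 20 has a density of 3 and is therefore the modal class.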
Why (not) use the mode?
The main advantage of the mode is that it is the only measure of central tendency available for variables on a nominal measurement level (e.g. gender).
The disadvantage of the mode is that it ignores all other values. If we have the values 2, 2, 70, 80, 90, 100, the mode would be 2, but all other values are a lot higher. The 2 then does not really represent the center very well, making the mode a poor measure of central tendency in this case.
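To see how far the mode can be from the rest of the data, the values above can be compared with two other measures of central tendency, the mean and the median. This is a small sketch using Python's standard statistics module (multimode requires Python 3.8 or later). The comparison itself is an illustration added here, not part of the original example.

```python
import statistics

values = [2, 2, 70, 80, 90, 100]

mode_values = statistics.multimode(values)  # list of all most frequent values
median_value = statistics.median(values)    # middle of the sorted values
mean_value = statistics.mean(values)        # arithmetic average

print("mode:", mode_values)
print("median:", median_value)
print("mean:", round(mean_value, 1))
```

The mode is 2, while the median is 75 and the mean is about 57.3, so the mode lies well below both.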
Note that the often-heard term modal income might differ slightly from its strict statistical meaning. When reporting a modal income, check how the source defines it. For example, the Dutch ‘Centraal planbureau’ calculates the modal income differently than by simply looking at the modal class.
Debate
There are two points of debate about determining the mode. The first concerns the case where all values have the same frequency (a so-called uniform distribution). Most textbooks say that in this case there is no mode, but a few state that all items are then the mode.
The second concerns the case of multiple modes. A few textbooks argue that, since no single item has the highest frequency, there is no mode (e.g. Johnson & Kuby, 2011, p. 66).
History
The earliest reference to the mode that was found is in the work of Pearson: "I have found it convenient to use the term mode for the abscissa corresponding to the ordinate of maximum frequency. Thus the "mean," the "mode," and the "median" have all distinct characters." (1895, p. 345).
The word mode is derived from the French word mode, which means ‘fashion’ (and which itself probably came from the Latin modus). So it started with asking ‘what is fashionable’, that is, what most people are wearing.
Strong (1901) uses the term bimodal: “…is seen to be distinctly bimodal” (p. 286).
References
Johnson, R., & Kuby, P. (2011). Elementary Statistics. Cengage Learning.
Kenney, J. F., & Keeping, E. S. (1954). Mathematics of Statistics; Part one (3rd ed.). New York: D. Van Nostrand Company. Retrieved from http://hdl.handle.net/2027/mdp.39015015725339
Pearson, K. (1895). Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous Material. Philosophical Transactions of the Royal Society of London. (A.), 186, 343–414. doi:10.1098/rsta.1895.0010
Strong, R. M. (1901). A Quantitative Study of Variation in the Smaller North-American Shrikes. The American Naturalist, 35(412), 271–298.
Upton, G. J. G., & Cook, I. (2014). A dictionary of statistics (3rd ed.). Oxford: Oxford University Press.
Weisstein, E. W. (n.d.). Mode from MathWorld. Retrieved April 5, 2014, from http://mathworld.wolfram.com/Mode.html