Hockey players' birth months from Gladwell (2008).
In his 2008 book Outliers, Malcolm Gladwell presents a table showing the birth dates of the players on the 2007 Medicine Hat (Alberta) hockey team. The table illustrates a clustering of birth dates in the early months of the year.
Insights
- The datagraphic clearly shows an unusually large number of first-quarter birth dates, and the counts toward the end of the year also appear to be quite low.
- A chi-squared test suggests that a result this extreme (or more so) would be expected only about 0.27% of the time if birth months were random; that is, only about 1 in 378 random samples of 25 people would give such a 'skewed' distribution.
- The datagraphic obviously spoils the investigation that Gladwell invites the reader to undertake, but if it were presented, say, in an appendix, the unusualness of the distribution would be striking, and the reader would be given an example of applied statistics in the service of the broader "Outliers" argument.
The table from the book
The original table is difficult to interpret. The numbers in the first column may be the players' jersey numbers. Their names don't appear to be relevant to the discussion. Nevertheless, providing the data does give an opportunity - as here - to analyze the data for oneself.
A more picayune point is whether the horizontal lines are necessary. The table also spans two pages (I've removed the break) but could fit on one; this is true of other tables in the book as well. Placing the table inline allows the narrative to refer to the data without "Table 1"-style numbering, which may be off-putting to some readers (?), but the result is less attractive.
Revisions and discussion
- The months were copied as a simple column of numbers into Excel and then input into the "Histogram" data analysis tool.
- The histogram counts were used as the basis for a "column" (bar) chart that was formatted using my conventions (light-colored plot area, large labels, etc).
- The expected counts were computed and added as a graphic line.
- The p-value of the chi-squared test was computed using Excel's CHISQ.TEST() function, which returns the probability directly rather than the chi-squared statistic itself.
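For readers without Excel, the steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the workbook behind the chart: it assumes the expected count is the same for every month (sample size divided by 12) and that the test has 11 degrees of freedom, which is how CHISQ.TEST() treats a single column of 12 counts. The names birth_month_test and chi2_sf are hypothetical helpers introduced here, and the months in the demo are made-up placeholders, not the values from Gladwell's table.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom."""
    y = x / 2.0
    if df % 2:
        # Odd df: start from Q(1/2, y) = erfc(sqrt(y))
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        # Even df: start from Q(1, y) = exp(-y)
        a, q = 1.0, math.exp(-y)
    # Step up with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


def birth_month_test(months):
    """Return (observed counts, expected count per month, chi-squared statistic, p-value).

    Assumption: every month is equally likely under the null hypothesis.
    """
    counts = Counter(months)
    observed = [counts.get(m, 0) for m in range(1, 13)]  # histogram step
    expected = len(months) / 12.0                        # flat expected line
    stat = sum((o - expected) ** 2 / expected for o in observed)
    p_value = chi2_sf(stat, df=11)                       # 12 months - 1
    return observed, expected, stat, p_value


if __name__ == "__main__":
    # Placeholder months for illustration only; substitute the column copied from the table.
    demo_months = [1, 1, 2, 3, 3, 4, 5, 8, 10, 12]
    observed, expected, stat, p_value = birth_month_test(demo_months)
    print("observed counts:", observed)
    print("expected per month:", expected)
    print("chi-squared:", stat, "p-value:", p_value)
```

Replacing demo_months with the 25 birth months from the table gives the observed counts for the bar chart, the flat expected line, and a p-value that can be compared with the figure quoted under Insights. A refinement would be to weight the expected counts by days per month rather than treating all months as equal.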
Source: Gladwell, Malcolm (2008). Outliers. New York: Little, Brown, pp. 20-21.