Formal statistics is important to modern computer science (CS) in two ways. First, it forms the basis for the rapidly growing fields of pattern recognition and machine learning. Its contribution here is clearly recognized. Many important results can be equally considered contributions to statistical theory and machine learning, and the leading researchers are recognized in both communities. A second contribution of statistics to CS is much broader, but, paradoxically, much less respected. Many results in CS depend on the accurate analysis of empirical data. The performance of a new retrieval algorithm, the throughput of a novel network protocol, the interpretation of a complex simulation, and the usefulness of a new human interface mechanism are only a few examples of the kinds of claims that rest on data analysis. All too often, researchers are cavalier in their approach to analyzing such data. Common errors--such as overlooking the impact of outliers on estimates such as the mean, or applying the *t*-test to nonnormal data, for which it is inappropriate--are widespread and compromise the ability to assess the value of a wide range of research.
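
The outlier problem is easy to see in a minimal sketch. The latency values and variable names below are hypothetical, invented purely for illustration; only Python's standard `statistics` module is assumed.

```python
from statistics import mean, median

# Hypothetical response times in milliseconds (invented for illustration).
clean_ms = [12, 13, 11, 14, 12, 13]

# The same measurements plus a single stray reading of 250 ms.
with_outlier_ms = clean_ms + [250]

# The mean is pulled far from the bulk of the data by one value,
# while the median barely moves.
print("mean without outlier:  ", mean(clean_ms))
print("mean with outlier:     ", round(mean(with_outlier_ms), 2))
print("median without outlier:", median(clean_ms))
print("median with outlier:   ", median(with_outlier_ms))
```

A single stray reading more than triples the mean here, while the median moves by only half a millisecond. A performance claim based on the mean alone could therefore be misleading.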

Ideally, a good grounding in statistics should be part of the training of every computer scientist. Realistically, in a field that has grown so large, many students will move into the workplace with only a smattering of understanding and will need help solving one or another particular problem of statistical analysis. Even those with good grounding in the discipline may sometimes need a practical hand with a specific analytical problem.

Since its first edition in 1997 [1], Sheskin’s encyclopedic compendium has been a helpful guide to the perplexed. It is unashamedly a cookbook, offering a catalog of different statistical tests--mostly inferential, but also including descriptive measures of association and correlation. Purists will fault this orientation as taking tests out of their theoretical context, but the book’s 124-page introduction provides a helpful summary of the underlying principles and vocabulary, giving users some perspective on the tests they use.

The great value of the book is in its organization. The basic theme is: “If you have this kind of data and want to draw this kind of conclusion, use this test.” The decision process for selecting a test begins with the nature of the data as interval-ratio, ordinal, or nominal, a distinction that has invited some recent criticism but is still useful as a quick-and-dirty guide. Next, the user is asked to consider whether there is a single sample, two samples, or two or more samples and, where multiple samples are concerned, whether they are independent or not. Not until then is the user confronted with the question of what kind of hypothesis is being considered.
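
The first stages of this decision process can be sketched as a simple lookup. This is only an illustration under stated assumptions: the function name `suggest_test`, the key layout, and the test names are common textbook choices and are not a reproduction of the book's own decision tables. The final step, the kind of hypothesis being considered, is omitted.

```python
from typing import Optional

# Illustrative mapping only (an assumption, not Sheskin's actual tables):
# (measurement scale, number of samples, independent?) -> a commonly used test.
TESTS = {
    ("interval-ratio", "one", None): "single-sample t test",
    ("interval-ratio", "two", True): "t test for two independent samples",
    ("interval-ratio", "two", False): "t test for two dependent samples",
    ("interval-ratio", "two-or-more", True): "one-way analysis of variance",
    ("interval-ratio", "two-or-more", False): "repeated-measures analysis of variance",
    ("ordinal", "one", None): "Wilcoxon signed-ranks test",
    ("ordinal", "two", True): "Mann-Whitney U test",
    ("ordinal", "two", False): "Wilcoxon matched-pairs signed-ranks test",
    ("ordinal", "two-or-more", True): "Kruskal-Wallis test",
    ("ordinal", "two-or-more", False): "Friedman test",
    ("nominal", "one", None): "chi-square goodness-of-fit test",
    ("nominal", "two", True): "chi-square test for r x c tables",
    ("nominal", "two", False): "McNemar test",
    ("nominal", "two-or-more", True): "chi-square test for r x c tables",
    ("nominal", "two-or-more", False): "Cochran Q test",
}


def suggest_test(scale: str, samples: str, independent: Optional[bool] = None) -> str:
    """Return a candidate test name for the given data description."""
    if samples == "one":
        # Independence is only meaningful when there are multiple samples.
        independent = None
    elif independent is None:
        raise ValueError("state whether the samples are independent")
    key = (scale, samples, independent)
    if key not in TESTS:
        raise ValueError(f"no entry for {key!r}")
    return TESTS[key]


print(suggest_test("ordinal", "two", independent=True))
print(suggest_test("interval-ratio", "one"))
```

The order of the tuple mirrors the order of the questions described above: the scale of measurement first, then the number of samples, then their independence.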

The discussion of each test follows a standard outline. An introductory section describes the hypothesis being evaluated and discusses the assumptions underlying the test and other related procedures. Reading this section alone would help avoid much confusion in the experimental literature. Then, the author offers an example; a description of the null and alternative hypotheses; discussions on how to conduct the computations that the test requires and how to interpret the results; and supplementary material, such as additional analytical procedures, further discussion of the test (both theoretically and in comparison with related tests), and additional examples. Each discussion closes with references for further reading.

The book has grown steadily from 719 pages in the first edition to over 1,700 in this fourth edition. While one can find a test for almost any statistical question, not every test is listed--of nine tests for heteroskedasticity listed in Wikipedia [2], Sheskin discusses only two, plus a third one that is not listed in Wikipedia. But the text does point to review articles that cover 56 such tests.

This volume is an invaluable desk reference that, if consulted, should greatly improve the soundness of the experimental results on which much of CS relies. Its detailed discussions of both statistics in general and individual tests will hopefully encourage computer scientists to learn more of the underlying theory that makes these tests meaningful.