Book Review: "Data Analysis for Scientists and Engineers by Edward L. Robinson (Princeton University Press, 2016)"
In any discipline that relies on experiment and measurement, the proper analysis and interpretation of data is of paramount importance. Education in science and engineering disciplines has not always reflected that fact, and it is not too long ago that courses in probability and statistics were missing from the curriculum in some areas of physical science. It is good to see that this is changing.
This book is a graduate-level introduction to data analysis aimed at scientists and engineers. It has many strengths, but also some weaknesses which I shall touch on below after first briefly summarizing the contents.
As is the case with most texts on this topic, the book begins with an introduction to the basics of probability theory, after which some familiar probability distributions (e.g. Gaussian, Poisson and Binomial) are introduced. There is then a chapter about random number generators, which also serves to introduce the idea of Markov Chains and related matters such as Gibbs Sampling.
The next part of the book is devoted to introductory statistics approached from the frequentist perspective, in which probabilities are interpreted as proportions defined over some sort of ensemble, e.g. repeated tosses of a coin. There then follow discussions of least-squares estimation and goodness of fit (both linear and non-linear) before the alternative, Bayesian, approach to statistics is described. In the Bayesian approach, a probability is not interpreted as a proportion but as a generalization of the Boolean states of “0” (false) and “1” (true) to an intermediate state p in which there is insufficient information to be sure.
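To make the distinction between the two interpretations concrete, here is a minimal sketch in Python using the coin-toss example. It is not taken from the book: the function names are hypothetical, the coin bias of 0.6 is an arbitrary choice, and the uniform Beta(1, 1) prior is an assumption made purely for illustration.

```python
import random


def simulate_tosses(n, p_heads, seed=0):
    """Simulate n coin tosses; 1 means heads, 0 means tails."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_heads else 0 for _ in range(n)]


def frequentist_estimate(tosses):
    """Frequentist view: probability as the proportion of heads in the ensemble."""
    return sum(tosses) / len(tosses)


def beta_posterior(tosses, a=1.0, b=1.0):
    """Bayesian view: update an assumed Beta(a, b) prior on the heads probability.

    The Beta prior is conjugate to the binomial likelihood, so the
    posterior is Beta(a + heads, b + tails).
    """
    heads = sum(tosses)
    tails = len(tosses) - heads
    return a + heads, b + tails


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Illustrative run with an arbitrary bias of 0.6 (an assumption, not data).
tosses = simulate_tosses(1000, 0.6)
a_post, b_post = beta_posterior(tosses)
print("proportion of heads:", frequentist_estimate(tosses))
print("posterior mean of p:", beta_mean(a_post, b_post))
```

With many tosses the two numbers nearly coincide; the difference lies in what they are taken to mean, a long-run proportion in one case and a state of knowledge about p in the other.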
The remaining three chapters of the book are devoted to the analysis of sequences of variables, e.g. Time Series (including spectral analysis, covariance and deconvolution).
The strengths of this book are that it is well produced and well written, with good illustrations and numerous worked examples. I can find little fault with the way the material included is presented and described. The weaknesses of the book are more to do with what is left out than what is included.
Surprisingly, there is little to no discussion of hypothesis testing (of either frequentist or Bayesian variety) anywhere in the book, so there is no mention of significance levels, p-values, Type I and Type II errors, and so on. I would have thought one of the aims of a book of this kind would be to introduce the student to concepts much used in the literature, so it is to me a rather glaring omission. Likewise there is nothing in this book about Bayesian inference, model comparison, evidence or any of these very important topics.
Also surprising for a modern textbook, there is little discussion of the computational implementation of the methods discussed, and there are no example scripts illustrating how these ideas can be coded up. Data analysis in the age of “Big Data” is very much driven by computers, which makes the omission all the more striking. All the modern courses in data analysis of which I am aware, across various universities, base their teaching around computational laboratory sessions, so this book would not work very well as a companion to such a course. It is true that bits of code written in either of the current industry standards (R or Python) may well have dated in a few years, just as the Matlab scripts of the recent past have now been superseded, but I still think it is useful to include some sample code, as it can easily be translated into other languages if necessary.
The emphasis of the later chapters of the book on the analysis of random sequences in the context of time-series data perhaps reflects the author’s research interests (in astronomy), which is quite reasonable for a graduate-level text. However, the jacket states that there is an “extensive look at analysis techniques for time-series data and images” and there is actually very little on anything other than the one-dimensional case.
The final criticism arises from my personal view that the Bayesian approach to inference offers a far more compelling and coherent way of treating all data analysis challenges than the collection of ad hoc methods generated by frequentist considerations. In accord with that view — which of course you are free to reject — I would like to have seen a far more extensive discussion of Bayesian methods.
The author includes a little discussion at the end of Chapter 7 on Bayesian methods in which he reveals himself as a skeptic. For example, “It appears to your author that the concerns of frequentists about the meaning of probability in Bayesian statistics are legitimate.” It would be inappropriate to debate this issue at length in a book review, but I will say that Bayesian methods have long been the mainstay of data analysis techniques in astronomy and astrophysics, for what I believe to be very good reasons.
In summary, then, I think that what this book does, it does very well. I do however have serious concerns about the selection of topics, and the absence of material that I would consider essential were I to be teaching a course on this subject.
I will also mention without further comment that the book is priced at £62.95.