book reviews
R books
Reviewed by T. Nelson
Statistics packages, even those that use a graphical user interface, are notoriously clumsy and unintuitive to use, and are often poorly documented. This book uses examples to teach R, a free but powerful command-line based statistics package. A familiarity with basic statistics is assumed.
This book doesn't waste time telling you how to install or configure R. It goes right to its topic, teaching R by example. The teach-by-example method can be very effective. An excellent example is Statistical Analysis: A Decision-Making Approach by Robert Parsons. To make this sort of book useful requires discipline and organization. Data Analysis and Graphics Using R is fairly well organized. It has an extensive index of R functions and statistical topics.
However, there is one big problem: Where are all the examples? It turns out that by 'examples', the authors don't mean 'examples of R code,' but sample statistical problems, as distinct from a theory-based approach. There is surprisingly little R code in this book. The commands for linear modeling (which is R's term for linear regression), for instance, are scattered across several chapters, making it hard for the reader to piece together the correct syntax. This could have been avoided by including parts (or all) of the authors' R scripts (such as lm-tests.R). This well-written file makes it immediately obvious how to run a linear model. These are the "examples" that should have been included in the text. I eventually discovered that it was much easier to learn R by reading the help pages within R than by guessing the correct syntax from the text.
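For readers who just want to see the basic syntax in one place, here is a minimal sketch of a linear model in R. It is not taken from the book or from the authors' lm-tests.R; it uses the built-in cars data set, so nothing needs to be installed, and the variable names are my own.

```r
# Fit a simple linear model on the built-in 'cars' data set:
# stopping distance (dist) as a function of speed.
fit <- lm(dist ~ speed, data = cars)

# Coefficient table, standard errors, and R-squared.
print(summary(fit))

# Predict stopping distance for two new speeds.
new_speeds <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_speeds)
print(pred)
```

Typing ?lm at the R prompt brings up the help page with the full list of arguments.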
A related problem, at least in the early sections, is that some of the examples don't make sense unless you install the authors' DAAG data package. The book also provides relatively little insight as to how R processes the data internally.
On the positive side, topics such as time series analysis and tree-based classification, which are missing from many other books, are thoroughly covered. The authors try to teach some statistics along with R and give many warnings about whether a particular model is appropriate. As might be expected, little mathematical background is provided for the statistical methods.
May 12, 2007
Reviewed by T. Nelson
Why would anyone ever want to program in an interpreted statistical package like R? Well, that's easy: for the same reason people write shell scripts. R already has a vast library of statistical and numerical routines that have already been debugged, and R users find themselves using the same sequence of operations over and over. You might say every significant piece of mathematical software eventually turns into a language.
R has its own plotting functions (see review at left). It has its own debugger and profiler. It even has object-oriented programming: Matloff describes S3, the old style, and S4, which is vastly better. R even has functions for clustered and multiprocessor computation, and sockets and TCP/IP for networking.
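To give a feel for the difference between the two object systems, here is a small sketch. The class and function names (account, Account, describe) are made up for illustration and are not from Matloff's book.

```r
library(methods)  # usually loaded already; harmless if so

# S3: a class is just an attribute, and methods are found by naming convention.
acct <- structure(list(owner = "Ann", balance = 100), class = "account")
print.account <- function(x, ...) {
  cat("Account of", x$owner, "with balance", x$balance, "\n")
  invisible(x)
}
print(acct)  # dispatches to print.account

# S4: classes are declared formally and slot types are checked.
setClass("Account", representation(owner = "character", balance = "numeric"))
setGeneric("describe", function(obj) standardGeneric("describe"))
setMethod("describe", "Account", function(obj) {
  paste("Account of", obj@owner, "with balance", obj@balance)
})
a4 <- new("Account", owner = "Ann", balance = 100)
print(describe(a4))

# S4 refuses a slot of the wrong type; S3 would accept it silently.
bad <- tryCatch(new("Account", owner = "Ann", balance = "lots"),
                error = function(e) "rejected")
print(bad)
```

The type check at the end is one concrete reason to prefer S4 when objects get complicated.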
As an interpreter, it can also do things like this:
> a = "5+4+sqrt(34)"
> eval(parse(text=a))
[1] 14.83095
You can also create R code within R and use the source function
to execute it, just as we used to do with interpreted BASIC many years ago.
This might seem like a strange thing to do, but it's a powerful way of adding
symbolic processing to R that's under-utilized. (In BASIC you could store an
arbitrary user-entered formula in a big string variable, then jump to it and
execute it!) This is a great feature that quickly becomes essential if you
write spreadsheet software. You can define new operators in R as well.
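Here is a minimal sketch of both ideas: generating R code as text and running it with source, and defining a new operator. The file is a temporary one and the operator name %+% is my own choice, not something from the book.

```r
# Build R code as text, write it to a temporary file, and run it with source().
code <- c("x <- 5 + 4", "y <- sqrt(x)")
script <- tempfile(fileext = ".R")
writeLines(code, script)
source(script)  # by default evaluates in the global environment
print(y)

# A user-defined infix operator: any function named %something% works.
`%+%` <- function(a, b) paste(a, b)
greeting <- "interpreted" %+% "BASIC"
print(greeting)
```

As with any code built from strings, text that comes from a user should be checked before it is evaluated.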
Matloff also shows how to interface R with C/C++ and Python to increase speed. Now, you might ask, “just how slow is R if Python speeds it up?” But that's not the point: lots of scientific software is already written in Python and C++, so it's often necessary to integrate them. Matloff spends a lot of time describing how to enhance performance of R code, and some of the performance increases are dramatic.
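One standard way to speed up R code is vectorization, replacing an explicit loop with a single call on a whole vector. The sketch below is my own illustration of that idiom, not an example from the book, and the function names are hypothetical. The timings depend on the machine and the R version, so none are quoted here.

```r
# Sum of squares with an explicit loop.
sum_sq_loop <- function(v) {
  total <- 0
  for (x in v) total <- total + x^2
  total
}

# The same computation, vectorized.
sum_sq_vec <- function(v) sum(v^2)

v <- as.numeric(1:1e6)
t_loop <- system.time(r1 <- sum_sq_loop(v))
t_vec  <- system.time(r2 <- sum_sq_vec(v))

# Both give the same answer, up to floating-point rounding.
print(all.equal(r1, r2))

# Elapsed seconds for each version.
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```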
There are also some gotchas in R, like its weird scope hierarchy. I would have been interested to see how to handle namespace clashes, which is a big problem in Sage. I'd also have liked to see something about inline C++ code and Rcpp, which Hadley Wickham talks about in Advanced R. Wickham also has exercises, if you're looking for that, and says more about S3 vs S4 in a chapter called “OO field guide.” But Wickham's book is harder on the eyes, with all its code in light gray text, and it's not as well organized as this one.
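A short sketch of the kind of scoping surprise I mean, followed by the usual way to sidestep a name clash with the :: operator. The variable and function names are my own; this is not an example from either book.

```r
counter <- 0

bump_local <- function() {
  counter <- counter + 1   # reads the outer value, then creates a local copy
  counter
}

bump_outer <- function() {
  counter <<- counter + 1  # superassignment: changes the enclosing variable
  counter
}

a <- bump_local()
after_local <- counter     # the outer counter is unchanged
b <- bump_outer()
after_outer <- counter     # now the outer counter has changed
print(c(a, after_local, b, after_outer))

# A name clash: defining our own sd masks the one in the stats package.
sd <- function(x) "my own sd"
masked <- sd(c(1, 2, 3))
qualified <- stats::sd(c(1, 2, 3))  # package::name picks the intended version
print(masked)
print(qualified)
```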
But even if you don't plan to program, this book is valuable because of its clear descriptions of data frames, factors, and lists, which you'll need to get R to work. Matloff's style is clear and understandable.
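For anyone who hasn't met these three structures, here is a quick sketch with made-up data. It is my own illustration, not an excerpt from the book.

```r
# A list can hold elements of different types and lengths.
patient <- list(name = "Ann", visits = c(3, 5, 2))
mean_visits <- mean(patient$visits)
print(mean_visits)

# A factor stores categorical data as integer codes plus labels.
group <- factor(c("ctrl", "drug", "drug", "ctrl"))
print(levels(group))
print(as.integer(group))

# A data frame is a list of equal-length columns, one per variable.
df <- data.frame(group = group, score = c(4.1, 5.3, 6.0, 3.9))
group_means <- tapply(df$score, df$group, mean)
print(group_means)
```

Most of R's modeling functions expect a data frame, with factors for the categorical variables.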
Feb 11, 2018