Statistics


Read: Numbers Rule Your World

Numbers certainly rule my world. Statistics is one of my most important research tools. So why not have a look at another pop-statistics book?

Kaiser Fung’s Numbers Rule Your World manages to introduce some fundamental concepts of statistics without actually doing any statistics. Each concept is introduced in the context of two case studies that highlight different aspects of it, showing that different objectives lead to different approaches to the data, utilizing the same statistical concept in different ways. Further, instead of giving yet another introduction to the standard basics like central tendency, random processes, and so on, Fung focuses on a few central, more general concepts. Thus, he intends to instill some statistical thinking rather than to instruct the reader in specific methods.

Let me single out two statistical concepts from his list of five that I believe to be the most important.

First, heterogeneity matters (Fung writes ‘variability matters’): the mean hides all the interesting stuff. Much of my research interest can be summarized with this exact statement. There is no real representative agent in economics. Differences in preferences and behavior, in the intensity of reactions to a stimulus (or treatment in an experiment), are the really interesting part of the story. Studying human behavior without this heterogeneity would be rather boring.
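
To make this concrete, here is a tiny toy simulation of my own (nothing from Fung’s book, all numbers invented): two subgroups react in opposite directions to the same treatment, so the overall mean effect is close to zero and hides everything that is interesting.

```python
# Toy example: the mean hides heterogeneous treatment effects.
# Groups, effect sizes, and noise level are made up for illustration.
import numpy as np

rng = np.random.default_rng(42)
n = 1000

group = np.repeat([0, 1], n // 2)               # two equally sized subgroups
effect = np.where(group == 0, 2.0, -2.0)        # they react in opposite directions
outcome = effect + rng.normal(0.0, 1.0, n)      # plus some individual noise

print(f"overall mean effect:  {outcome.mean():+.2f}")              # close to 0
print(f"mean effect, group 0: {outcome[group == 0].mean():+.2f}")  # close to +2
print(f"mean effect, group 1: {outcome[group == 1].mean():+.2f}")  # close to -2
```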

Second, there are two types of errors in statistical tests – false negatives and false positives – and we attach different costs to these errors. There is always a trade-off: you cannot decrease one error rate without increasing the other (keeping the data constant and just moving the decision threshold around). Asymmetric costs can also be attached to errors in a continuous model. And these asymmetric costs often exist in real life; a negative deviation from a target may have a considerably different impact on a decision than a positive deviation.
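
Again, a small sketch of my own (the distributions are mere assumptions for illustration, not anything from the book): the scores of the true negatives and true positives stay fixed, and moving the decision threshold only shifts errors from one type to the other.

```python
# Toy example: with the data fixed, a stricter threshold trades
# false positives for false negatives (and vice versa).
import numpy as np

rng = np.random.default_rng(0)
negatives = rng.normal(0.0, 1.0, 100_000)   # scores of truly negative cases
positives = rng.normal(1.5, 1.0, 100_000)   # scores of truly positive cases

for threshold in (0.0, 0.75, 1.5, 2.25):
    fpr = (negatives > threshold).mean()    # negatives wrongly flagged
    fnr = (positives <= threshold).mean()   # positives missed
    print(f"threshold {threshold:4.2f}:  "
          f"false positive rate {fpr:.3f}  false negative rate {fnr:.3f}")
```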

Both ideas seem often neglected in applied work. It is not the differences between individuals that are studied; it is the general tendency of the whole group that is reported. Between-subject heterogeneity is hidden. And often only one type of error is explicitly mentioned: Mr. P tells you only about the false positive. (Too often, I have to plead guilty on the second charge, too.)

In the end, however, Fung’s book is not for the (applied) statistician or the seasoned researcher applying statistics in his work. It is for laypeople, and it may well be worth their time to have a look at it.


Read: The Cult of Statistical Significance

I think my first “contact” with Deirdre McCloskey was when I got seriously interested in scientific writing and in particular in how to improve my writing. I read her Economical Writing at about the same time as Strunk & White’s The Elements of Style. That must have been around the middle of my PhD, or shortly before finishing it. Yes, that late. The Rhetoric of Economics followed very soon. There I got a first glimpse of her battle against the evil p-value and the misuse of statistics. I have to admit that even though I agree with her main critique, I do not follow all her advice; I think that is one of the big problems she sees in empirical economists: they agree but still do otherwise. I also had the good luck to meet Gerd Gigerenzer, a psychologist and fellow warrior against the misuse of statistics, and to discuss this particular topic with him during a sociable evening after a long day full of presentations at a remote conference venue of the Max Planck Society. Yes, there is something wrong with our (that is, the economists’) way of relying on, reporting, and interpreting statistics and specifically statistical significance.

How the Standard Error Costs Us Jobs, Justice, and Lives is not only the subtitle of Ziliak and McCloskey’s manifesto The Cult of Statistical Significance, it is also quite indicative of their (strong) rhetoric.

The book can be roughly divided into two parts that are devoted to different “themes”. The first is an updated and extended rehash of their earlier articles on the current practice of relying on statistical significance in various fields. If you have not read their articles so far, read this and be shocked. You will see the authors’ outrage in every paragraph. The second part, and theme, is a historical account that tries to shed light on how we ended up where we are. This part is filled with bitterness and repugnance for R. A. Fisher and compassion for the neglected Mr. Student, William Sealy Gosset.

Ziliak and McCloskey’s rhetoric is unique, yet it is not always to their benefit. Still, they certainly make their point, and at least in private you have to agree with them. All in all, the book is entertaining and instructive. Even so, I would not assign this book to a class for reading; I would rather recommend the 2004 special issue of the Journal of Socio-Economics on this topic. On the other hand, every empirical scientist and every policy maker relying on scientific research (shouldn’t they all?) should be aware that, first, size matters and, second, that precision of measurement should not be the only decision criterion.
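
To see why size matters, here is a deliberately crude toy example of my own (all numbers invented, nothing from Ziliak and McCloskey): with a large enough sample, a practically negligible difference becomes highly ‘statistically significant’.

```python
# Toy example: statistical significance without practical significance.
# Group labels, sample size, and the tiny true difference are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 2_000_000
control   = rng.normal(100.0, 15.0, n)
treatment = rng.normal(100.1, 15.0, n)   # true difference: 0.1, i.e. ~0.007 standard deviations

t_stat, p_value = stats.ttest_ind(treatment, control)
cohens_d = (treatment.mean() - control.mean()) / 15.0

print(f"p-value:   {p_value:.2e}")     # far below any conventional threshold
print(f"Cohen's d: {cohens_d:.4f}")    # yet the effect is practically irrelevant
```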

Read: Guide to Information Graphics

Now, that was a waste of money. Don’t get me wrong. Dona Wong’s Guide to Information Graphics is a nicely designed little book with some valuable advice on how to present quantitative data. Why is it a waste of money? It does not go beyond very small data sets and a few closely related time series. The data we talk about is so sparse that even the dreaded pie chart cannot distort the perception of the depicted quantities by much, and consequently it is discussed in this little book.

Though, book may be an overstatement; booklet seems more appropriate. And despite being only about 150 pages ‘thick’, there are some repetitions in its content. That is often a good didactic move; for a reference book, not so much.

Since Dona Wong is a student of Edward Tufte, it makes more sense to refer directly to his work. So instead of looking into Guide to Information Graphics, have a look at his books.

Another “Old Master” is William S. Cleveland and his work on statistical graphics.

If you rather need an overview of different types of plots and ways to present data, Information Graphics - A Comprehensive Illustrated Reference by Robert L. Harris is the reference you are looking for.

Not as nicely designed as Dona Wong’s Guide, yet with considerably more content, is Naomi Robbins’ Creating More Effective Graphs.

And finally, I rather enjoyed reading Howard Wainer’s Picturing the Uncertain World, though it is more a historical account of the development of good and effective graphical displays.

Read: The Drunkard's Walk - How Randomness Rules our Lives

While Ayres’ Super Crunchers invited the reader to find patterns in seemingly random data (and to run controlled experiments to assess differences in treatments, e.g. maximizing sales revenue by the “right” choice of book title), Mlodinow’s The Drunkard’s Walk is more a warning against seeing patterns in seemingly non-random data.

Life is full of randomness, and Mlodinow’s little book raises some awareness of the random factor in our lives. He gives a nice historical account of the concept of randomness in mathematics and other sciences, as the disciplinary borders were once not as distinct as they are now. This reminded me a bit of Peter Bernstein’s Against the Gods, though Mlodinow’s work is considerably shorter and more focused, owing to the more directed topic of his book. My recollection may be wrong, yet I believe his work is also more sanguine.

In short, he did a very good job. The Drunkard’s Walk is entertaining, balanced and instructive and covers considerably more than just the economic side of randomness: the chance element in our lives, luck and misfortune, the misperception of probabilities and causality, and psychological biases. Finally, he also cautions all those who rely a little too much on their statistics…

Read: Super Crunchers - Why Thinking-by-Numbers Is the New Way to Be Smart

After having read Ian Ayres’ Super Crunchers I feel more like listing what the book is not than what it is.

It is not about Super Crunchers. The title was chosen just for its appeal. Ayres explains that he experimented a little (using Google ads) with different titles to find out what would generate the most interest and consequently sales. Super Crunching is about data mining in huge data sets; Super Crunchers is more about raising general awareness of the impact statistics can and should have on everyday decisions.

The book is not about the people who do the number crunching; it is rather a collection of anecdotal stories that point to the increasing possibilities data nowadays offer the decision maker.

Nor is the book a homage to statistical methods and theoretical research in statistics and econometrics. In his stories, Ayres sticks to the simplest statistics or jumps to something very far removed, neural networks. He presents some applications; some are first-hand (and never use huge data sets), others are only third-hand re-iterations. He adds a lot of personal details, politics, and his own business ventures. This makes the book somewhat diverting to read, yet I do not feel these diversions add to the supposed topic of Super Crunchers.

And finally, even though Ayres adds a little cautionary note after a lot of praise for what can be done with data and how we all surely will benefit from losing our informational self-determination, our privacy, to the data mining industry and government, he falls short of any standard a lawyer should adhere to when it comes to privacy issues.

Having said all that, do not get me wrong: I enjoyed reading the book. Only afterwards did I notice all its different shortcomings.

Read: Mostly Harmless Econometrics

Reading statistics or econometrics textbooks cover to cover is certainly not something any “normal” person would do. So, I am not normal. And so ain’t Mostly Harmless Econometrics by Angrist and Pischke.

You cannot learn econometrics just by reading this book; you would need another textbook for the basic econometric theory. Yet MHE offers something often not found in your standard textbook: an applied perspective. It addresses issues that may arise in empirical work in labor and microeconomics, focusing on the identification of causal effects and illustrating the methods and pitfalls with empirical field studies that rely either on natural experiments (happenstance data) or on field experiments.

Their brief chapter on nonstandard (i.e. nonstandard according to the theoretical ideal; the real world looks different) standard errors is, for instance, astonishingly accessible and almost makes me revise my standpoint on modelling the error structure (using multilevel designs) vs adjusting standard errors.
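
For what it is worth, here is a rough sketch of the two routes that chapter weighs against each other, on simulated data (the variable names, cluster sizes, and the use of statsmodels are my own assumptions, not anything prescribed by MHE): plain OLS with cluster-adjusted standard errors versus modelling the grouped error structure with a random intercept.

```python
# Sketch: cluster-robust standard errors vs. a multilevel (random intercept) model.
# All data are simulated; variable names are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_clusters, n_per = 40, 25
cluster = np.repeat(np.arange(n_clusters), n_per)
cluster_shock = rng.normal(0.0, 1.0, n_clusters)[cluster]   # shared within-cluster error
x = rng.normal(0.0, 1.0, n_clusters * n_per)
y = 0.5 * x + cluster_shock + rng.normal(0.0, 1.0, n_clusters * n_per)
df = pd.DataFrame({"y": y, "x": x, "cluster": cluster})

# Route 1: OLS point estimates, standard errors adjusted for clustering.
ols_clustered = smf.ols("y ~ x", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["cluster"]})

# Route 2: model the error structure directly with a random intercept per cluster.
mixed = smf.mixedlm("y ~ x", data=df, groups=df["cluster"]).fit()

print(f"clustered OLS:    beta={ols_clustered.params['x']:.3f}  se={ols_clustered.bse['x']:.3f}")
print(f"random intercept: beta={mixed.params['x']:.3f}  se={mixed.bse['x']:.3f}")
```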

I do not know whether science geeks are still attracted by Adams’ Hitchhiker’s Guide to the Galaxy. Angrist and Pischke surely are. Not only is the title of their textbook an obvious reference to Adams’ work, they also start every chapter with a little Adams quote. Something I did, too, when I was still in graduate school. This gives their book a slightly brighter, less earnest tone. All in all, it is certainly not as dry as many other econometrics textbooks.

As an added value, Angrist and Pischke have set up a companion website to their book where they post corrections (there are already quite a number of errata) and comments on MHE.
