
Read: Super Crunchers - Why Thinking-by-Numbers Is the New Way to Be Smart

After having read Ian Ayres' Super Crunchers I feel more like listing what the book is not than what it is.

It is not about super crunchers. The title was chosen purely for its appeal: Ayres explains that he experimented a little with different titles (using Google ads) to find out which would generate the most interest and, consequently, sales. Super crunching is about data mining in huge data sets; Super Crunchers is more about raising general awareness of the impact statistics can and should have on everyday decisions.

The book is not about the people who do the number crunching; rather, it is a collection of anecdotal stories that point to the growing possibilities that data nowadays offer the decision maker.

Nor is the book a homage to statistical methods and theoretical research in statistics and econometrics. In his stories, Ayres sticks to the simplest statistics or jumps to something far removed: neural networks. He presents some applications; some are first-hand (and never use huge data sets), others are only third-hand reiterations. He adds a lot of personal details, politics, and his own business ventures. This makes the book rather diverting to read, yet I do not feel these diversions add to the supposed topic of Super Crunchers.

And finally, even though Ayres adds a cautious little note after much praise of what can be done with data and how we all will surely benefit from surrendering our informational self-determination, our privacy, to the data mining industry and government, he falls short of any standard a lawyer should adhere to when it comes to privacy issues.

Having said all that, do not get me wrong: I enjoyed reading the book. Only afterwards did I notice all its shortcomings.

Read: Mostly Harmless Econometrics

Reading statistics or econometrics textbooks cover to cover is certainly not something any “normal” person would do. So, I am not normal. And so ain’t Mostly Harmless Econometrics by Angrist and Pischke.

You cannot learn econometrics just by reading this book; you would need another textbook for basic econometric theory. Yet MHE offers something often not found in a standard textbook: an applied perspective. It addresses issues that arise in empirical work in labor and microeconomics, focusing on the identification of causal effects and illustrating methods and pitfalls with empirical field studies that rely either on natural experiments (happenstance data) or on field experiments.

Their brief chapter on nonstandard standard errors (nonstandard, that is, relative to the theoretical ideal; the real world looks different) is astonishingly accessible and almost makes me revise my standpoint on modelling the error structure (using multilevel designs) versus adjusting standard errors.

I do not know whether science geeks are still attracted by Adams' Hitchhiker's Guide to the Galaxy. Angrist and Pischke surely are. Not only is the title of their textbook an obvious reference to Adams' work, they also start every chapter with a little Adams quote, something I did, too, when I was still in graduate school. This gives their book a slightly brighter, less earnest tone. All in all, it is certainly not as dry as many other econometrics textbooks.

As an added bonus, Angrist and Pischke have set up a companion website for their book, where they post corrections (there are already quite a few errata) and comments on MHE.

Read: Picturing the Uncertain World

Not only out of professional necessity but also to satisfy my personal intellectual curiosity, I follow the ongoing discussion on the visual display of quantitative information. Cleveland and Tufte are certainly the authors who have influenced me the most when it comes to designing a data display. So, of course, I ordered and read Howard Wainer's Picturing the Uncertain World.

It is not quite what I expected. Though the consequences of uncertainty and the dangers of neglecting it are discussed, the book is not really focused on how to provide visual displays that capture and communicate the uncertainty in the data. Just one of its 21 chapters explicitly addresses this topic. The other 20 chapters provide a wonderful narrative on the development of effective data displays and their possible pitfalls, and this narrative is what makes the book worthwhile. Wainer provides an almost complete genesis of several (historical) examples of effective data displays. These little stories are both informative and entertaining. Consequently, the book is not just about data displays; it is about the history of good data displays. This is not conveyed by the book's title, so I was at first led to expect something slightly different.

Yet I can wholeheartedly recommend this little practical guide. Wainer's style is witty, entertaining, and instructive. The book is nicely typeset, a feature it shares with the works of Tufte. And finally, by providing a genesis of effective data displays, the book can certainly teach more than a mere catalogue of good and bad graphical illustrations. It shapes the way one might think about the data, and it reminds the reader that the same data can, and have to, be presented in different ways to address different specific problems.

Read: The Black Swan: The Impact of the Highly Improbable

The Black Swan has to be discussed on two different levels. The first is its topic, the impact of the highly improbable, our failure to recognize the importance of rare events, our belief in exact scientific predictions. The second is Nassim Nicholas Taleb’s rhetoric.

Taleb's style is very entertaining, unless you are a (financial) economist, statistician, or social scientist. Taleb shows very little sympathy for researchers in these fields, to the point where his rhetoric becomes almost insulting. His criticism is mostly justified; his language is not. Thus it is no surprise that it is Taleb, and not his work per se, that is attacked by those affected.

Unfortunately, his rhetoric impedes the necessary impact on the profession. If you feel under attack, you are not likely to embrace the critical message.

Unfortunately for the profession, Taleb is right. His point was proven most dramatically just shortly after his book hit the shelves.

So the remaining question is: how do we identify real-world phenomena where we cannot rely on past experience? Where we do not have something like a random walk but rather have to expect an occasional random jump? That we live in a world of many extremes is nicely illustrated in the book. Yet not everything is extreme, and as Taleb himself explains, it is often rather hard to identify where to expect extreme events. So it is no wonder that we are prone to what Taleb calls the ludic fallacy.
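The difference between the two regimes can be made concrete with a small simulation. The sketch below is my own illustration, not an example from the book: it contrasts "mild" Gaussian steps, where no single observation matters much, with heavy-tailed (Pareto) steps, where one occasional jump can dominate everything seen before. The distributions and parameters are arbitrary choices for the demonstration.

```python
# Contrast mild (Gaussian) randomness with wild (heavy-tailed) randomness:
# how large a share of the total does the single largest step account for?
import random

random.seed(42)
n = 100_000

# Mild: absolute Gaussian steps, as in a textbook random walk.
gauss = [abs(random.gauss(0, 1)) for _ in range(n)]

# Wild: Pareto steps with shape alpha = 1.1, so extreme jumps do occur.
pareto = [random.paretovariate(1.1) for _ in range(n)]

for name, steps in [("gaussian", gauss), ("pareto", pareto)]:
    share = max(steps) / sum(steps)  # fraction of the total due to one step
    print(f"{name}: largest single step = {share:.2%} of the total")
```

In the Gaussian case the largest step is a negligible sliver of the total; in the Pareto case a single jump accounts for a visible share of everything, which is exactly why past averages tell you so little about the next observation.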

I appreciate that he does not offer a simple (and wrong) solution, that he does not try to give a universal answer, and that he just points us to a problem we should be aware of, so that every once in a while we are not too painfully surprised.

Read: Statistical Rules of Thumb

With Statistical Rules of Thumb, Gerald van Belle, professor of biostatistics, offers a practical guide for statistical practice, already in its second edition. The book is organized by topic, and each rule is not only motivated but also derived theoretically. Van Belle has a very pleasant style, and despite the rather dry nature of the material, one is tempted to read the little book from cover to cover.

The book is not meant for the beginner but really for the practitioner whose craft revolves mainly around applying statistical methods. Rules of Thumb is not a textbook and definitely not the final authority on theoretical questions. It does, however, offer an interesting overview of the typical questions that arise in the daily application of statistical methods.

Unfortunately, the copy editing failed in several places, which is regrettable considering that this is already the second edition. Not infrequently, sentences are incomplete (words are missing) or over-complete (words are superfluous or duplicated). The formulas, however, are correct.

For me it was telling that I was once again dealing with a biostatistician. As an economist with a (training) focus on econometrics, I find myself turning more and more to the methods of biostatistics in my work. Then again, as a behavioral economist I have little use for the time-series models otherwise typical of econometrics, but microeconometrics, too, lags a little behind biostatistics.

Skimmed: Guesstimation

One thing right up front: Weinstein and Adam's Guesstimation neither solves the problems of the world nor attempts to. The book's subtitle most likely sprang from the inspiration of someone in the publisher's marketing department.

Guesstimation is a small and, in its own special way, entertaining book that uses a great many examples to explain, and thereby simplify, the estimation of orders of magnitude in everyday life. Instead of guessing an unknown quantity out of thin air, it recommends arriving at a well-founded guess via a chain of logical quantitative relationships. The values estimated this way lie within the same power of ten as the true value, an accuracy that is entirely sufficient for a well-founded guess. Whoever needs more than a rough figure simply has to procure more precise data.
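The chain-of-estimates method can be sketched in a few lines of code. The example below uses the classic Fermi problem "How many piano tuners are in Chicago?", which is not necessarily taken from the book; every number in it is an assumed round figure, chosen only to show how a chain of plausible quantities yields an order-of-magnitude answer.

```python
# Order-of-magnitude estimate via a chain of rough quantitative relations.
# All inputs are assumptions, deliberately rounded to convenient figures.

population = 3_000_000          # people in Chicago (rough)
people_per_household = 2        # average household size
piano_ownership = 1 / 20        # ~5% of households own a piano
tunings_per_piano = 1           # each piano tuned about once a year

pianos = population / people_per_household * piano_ownership
tunings_needed = pianos * tunings_per_piano        # demand per year

tunings_per_day = 4             # pianos one tuner services per day
working_days = 250              # working days per year
tunings_supplied = tunings_per_day * working_days  # supply per tuner per year

tuners = tunings_needed / tunings_supplied
print(round(tuners))            # prints 75, i.e. on the order of 10**2
```

The point is not the exact figure of 75 but that the chain lands in the right power of ten; guessing the number cold would easily be off by a factor of 100.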

The book is entertaining thanks to its many informative examples: entertaining trivialities. It is, however, not suited to being read linearly in one sitting; in bulk, the examples lose their charm. But to provide a little diversion on a train ride, for example, the little book is excellently suited.

Skimmed: Creating More Effective Graphs

With her book Creating More Effective Graphs, Naomi Robbins delivers a well-crafted overview of good graphical presentations of numerical data. Along the way, she also briefly discusses the usual cardinal errors and explains why a different presentation is better. Didactically sensible, the complexity of the displays increases only gradually.

The closeness to Cleveland and Tufte is evident and is not concealed; the preface already points to these two giants of visual data presentation. Creating More Effective Graphs is thus to be understood not as a replacement for, but as an introductory complement to, the standard works of Cleveland and Tufte.