Think Stats: Probability and Statistics for Programmers - 2e
Allen B. Downey, Franklin W. Olin College of Engineering
Copyright Year: 2014
ISBN 13: 9781491907337
Publisher: Green Tea Press
Language: English
Conditions of Use
Attribution-NonCommercial
CC BY-NC
Reviews
Professor Downey is an expert writer with more than 12 books under his belt. This particular book is very comprehensive. The author guides an engineer with minimal statistical knowledge into the intricacies of statistics. He starts with the basic concepts of exploratory data analysis, distributions, plotting, and effect size, then moves to probability mass functions and cumulative distribution functions. Then he untangles the complicated subjects of modeling distributions and probability density functions. From Chapter 7 he starts a journey toward hypothesis testing and regression analysis. These concepts are not simple, so he begins by explaining relationships between variables and demonstrating them with scatter plots. He then explains concepts like correlation, covariance, and linear dependence (the Pearson correlation coefficient). From this chapter, he moves on to sampling distributions and sampling bias. By now the student has a strong understanding of sampling distributions and is ready to learn about hypothesis testing. In the chapter on hypothesis testing, he describes the most common methods for comparing two groups. In Chapter 10, the author explains the basic concepts necessary to understand regression, such as least squares, residuals, goodness of fit, and weighted resampling. Then, in Chapter 11, he describes multiple regression, nonlinear relationships, and logistic regression. Finally, he explains more advanced subjects such as time series and survival analysis.
Professor Downey is a senior engineer and a data scientist. Consequently, accuracy is part of his training, background, and career. This book is highly accurate.
In the age of big data, this book is relevant and essential for any engineer who wants to move into the area of big data. The longevity of the book is hard to predict because the field is moving very fast, but it teaches basic concepts, so I expect it to remain relevant for at least a decade.
IMHO, the book is very clear for anybody with some background in computer science and programming. On the other hand, it could be hard for somebody without any knowledge of Python or programming. The author explains in the preface that some programming experience is necessary to understand the book. He has other open textbooks, such as "Think Python", that should be read before this one.
The book is extremely consistent. Every chapter starts with an introduction, followed by explanations of methods, examples, and a description of the code used to demonstrate the concepts or to generate the graphics. The author also provides code, exercises, and a glossary for every chapter.
The book is modular in the sense that we can read the sections we are not familiar with and skip the parts we already know. Every chapter has multiple sections with subheadings; for example, Chapter 10 has seven sub-sections plus the exercises and glossary. However, skipping sections or dividing parts among students could be confusing, because the flow of the book requires understanding essential concepts before moving to more complex chapters. I don't penalize the author for going from simple concepts to more complex ones, so I consider the book modular since each chapter has sub-sections.
As described in previous questions 1 and 6, Professor Downey developed a logical structure where one concept is described, learned and consolidated with the exercises before moving to more complex sections. In other words, a structure that goes from basic to complex concepts. I understand that every student is different, and each learns in a different way, but I predict that for many students the logical flow of the book will be an enjoyable experience.
I was unable to find any interface errors.
I was unable to find any grammatical errors.
The book is culturally neutral. However, Professor Downey teaches statistics with Python, while the majority of biostatisticians use R, and many of them will frown upon the use of Python to teach statistics.
I will definitely recommend this book, but I also recommend reading his "Think Python" book first, or at least taking a Python refresher course before reading this book.
Table of Contents
- Preface
- 1 Exploratory data analysis
- 2 Distributions
- 3 Probability mass functions
- 4 Cumulative distribution functions
- 5 Modeling distributions
- 6 Probability density functions
- 7 Relationships between variables
- 8 Estimation
- 9 Hypothesis testing
- 10 Linear least squares
- 11 Regression
- 12 Time series analysis
- 13 Survival analysis
- 14 Analytic methods
About the Book
Think Stats is an introduction to Probability and Statistics for Python programmers.
- Think Stats emphasizes simple techniques you can use to explore real data sets and answer interesting questions. The book presents a case study using data from the National Institutes of Health. Readers are encouraged to work on a project with real datasets.
- If you have basic skills in Python, you can use them to learn concepts in probability and statistics. Think Stats is based on a Python library for probability distributions (PMFs and CDFs). Many of the exercises use short programs to run experiments and help readers develop understanding.
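As a rough illustration of what working with PMFs and CDFs in Python can look like, here is a minimal sketch using only the standard library. It does not use the book's own library; the helper names `make_pmf` and `make_cdf` and the sample data are hypothetical and chosen only for this example.

```python
from collections import Counter


def make_pmf(values):
    """Map each distinct value to its relative frequency (hypothetical helper)."""
    counts = Counter(values)
    total = len(values)
    return {x: counts[x] / total for x in sorted(counts)}


def make_cdf(pmf):
    """Map each value x to the cumulative probability P(X <= x) (hypothetical helper)."""
    cdf = {}
    running = 0.0
    for x in sorted(pmf):
        running += pmf[x]
        cdf[x] = running
    return cdf


# Made-up sample, for illustration only.
sample = [1, 2, 2, 3, 5]
pmf = make_pmf(sample)
cdf = make_cdf(pmf)

for x in pmf:
    print(x, round(pmf[x], 3), round(cdf[x], 3))
```

The PMF gives the fraction of the sample equal to each value, and the CDF accumulates those fractions in sorted order, so the last CDF value is 1 up to floating-point rounding.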
About the Contributors
Author
Allen B. Downey is an American computer scientist, Professor of Computer Science at the Franklin W. Olin College of Engineering and writer of free textbooks.
Downey received his BS in 1989 and his MA in 1990, both in Civil Engineering from the Massachusetts Institute of Technology, and his PhD in Computer Science from the University of California at Berkeley in 1997.
He started his career as a Research Fellow at the San Diego Supercomputer Center in 1995. In 1997 he became Assistant Professor of Computer Science at Colby College, and in 2000 at Wellesley College. He was a Research Fellow at Boston University in 2002 and has been Professor of Computer Science at the Franklin W. Olin College of Engineering since 2003. In 2009-2010 he was also a Visiting Scientist at Google Inc.