A Primer for Computational Biology
Shawn T. O'Neil, Oregon State University
Copyright Year: 2017
Publisher: Oregon State University
Conditions of Use
This text does an excellent job covering the basics of Unix, Python, and R. It goes through each and explains all of the foundational approaches in an easily understandable manner. The online index is also quite effective and takes you to the appropriate page.
For the most part, the text does not contain errors. The primary errors that I noticed were due to out-of-date material. However, there was some information in the plotting chapter regarding making histograms in base R that I believe was incorrect.
The primary demerit here goes to the section on Python. This text uses Python 2.x, which is now deprecated; Python 3 is the standard. For the most part, I believe that this could be updated, but given that most Python libraries no longer update their Python 2 versions, this should already have been addressed. There may also be parts of the R and Unix sections that are out of date, but to my knowledge, no major additions or changes would need to be made.
This text does a good job of explaining things clearly to novice programmers, perhaps as well as any coding text I've read. However, especially in the Python segment, there was a lack of conceptual figures or labeled examples that might have explained concepts better than any text could have. The jargon is minimal and tends to be explained better than in many other coding texts. However, there is often an overabundance of text. There were several times
There were no issues of consistency in this text.
In general, this book is quite modular. The only parts that depend on previous parts are necessarily so in order to properly learn all related techniques.
The Unix section is laid out in a very reasonable fashion, as well as the R section, for the most part. I found that the Python section was more haphazardly designed. For instance, it may have been useful to explain dictionaries earlier on, as they are a fairly integral part of the language.
At times, the inline code is hard to distinguish from the main text. It would be nice if some alternative coloration could be used.
No grammar issues noted.
This book contains no cultural references that I noticed.
I would say this book is useful as a supplement to online guides to programming. There were many instances where concepts are explained with more clarity and depth than one would normally find simply by Googling, but at the same time, it is easy to get lost in the details. One thing I would have liked is small exercises sprinkled throughout each chapter, as a few chapters have, rather than all of them placed at the end. Even a few of these would help keep the reader engaged. Additionally, the Python section fails to suggest any powerful text editor, such as VS Code or Sublime Text, which would be vastly superior to the methods suggested. Perhaps that is simply because the text is outdated, but the text editors recommended in that section are very cumbersome.
This book covers the basics of operating within a Unix environment, whether local or remote, and programming in Python and R, focusing on basic programming techniques that form the basis of bioinformatics. It is an excellent reference for a geneticist who is interested in learning the core programming foundations, but for students who might not have specific questions in mind, it is hard to associate programming concepts with specific applications, as biological examples are sparse and poorly connected throughout the book. As such, this book would be excellent for situations where there is a single learning objective, namely learning programming languages, but would be a poor fit if there was a need to teach computational biology simultaneously.
The inclusion of Unix in the beginning was nice, as it is usually neglected until later chapters.
The overall content focuses on the application of Python and R using Unix environments in bioinformatics settings. The textbook is accurate; however, its programming conventions date to the early 2000s.
Part II of the book is written for Python 2, which was sunset in January 2020, with the retirement known since 2006. Conventions and discussion should be updated to Python 3. This will help students who have exposure to other languages, such as Java, adapt to Python, and enable those who spend time with this book to more easily read other languages.
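To make the Python 2 versus Python 3 point concrete, here is a minimal sketch of three well-known differences a reader would meet when updating older examples. The values and variable names are illustrative assumptions, not taken from the book.

```python
# Runs under Python 3. Comments note how Python 2 behaved differently.

# 1. print is a function in Python 3; in Python 2 it was a statement,
#    typically written without parentheses: print "GC content:", 0.5
print("GC content:", 0.5)

# 2. Dividing two integers with / gives a float in Python 3.
#    In Python 2 (without a __future__ import), 7 / 2 evaluated to 3.
true_quotient = 7 / 2     # 3.5
floor_quotient = 7 // 2   # 3, floor division in both versions

# 3. dict.keys() returns a view object in Python 3; in Python 2 it
#    returned a list. Wrap it in list() when a real list is needed.
counts = {"A": 2, "C": 1}  # hypothetical base counts
key_list = sorted(counts.keys())

print(true_quotient, floor_quotient, key_list)
```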
The examples used in Part II and Part III are disjoint. This makes it difficult to understand how Python and R integrate together and why they are placed in the same textbook.
Some common resources used by people beginning their journey in computational biology were excluded: the text does not describe what NCBI is and the role it plays, or how Galaxy has been adopted. The standardization of workflows on these services is an important piece of understanding how programming is done, replicated, and repeated across groups.
Determining the relationship between the programming languages and computational biology is left as an exercise for the reader.
The programming language is consistent in structure and form, and the language is fairly colloquial. However, programming concepts are not covered equally in Python and in R. For example, mutability is only discussed in connection with Python, while it is essential to the operation of R. Similarly, IDEs are only discussed in the context of R, with RStudio, while for Python there is discussion of Jupyter, which lacks the essential feature of an IDE: debugging.
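For readers unfamiliar with the mutability issue raised here, the following is a minimal Python sketch, with hypothetical variable names, of why it matters: two names can refer to the same list, so a change made through one is visible through the other. R generally behaves differently, with modification acting on a copy, which is part of why the concept deserves coverage in both parts.

```python
# Lists are mutable: assignment binds a second name to the SAME object.
original = ["A", "C", "G"]
alias = original            # no copy is made here
alias.append("T")           # the change is visible through both names

# An explicit (shallow) copy is independent of the original list.
independent = list(original)
independent.append("N")

print(original)      # includes "T", because alias and original are one list
print(independent)   # includes "T" and "N"; original is unaffected by "N"
```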
This could be broken down easily into sections for an intro programming class. The individual sections are for the most part independent, especially when it comes to programming concepts.
For the most part, Unix, Python, and R were presented in the standard order. The large windows and general lack of computational biology concepts and theory would make it difficult to use effectively for students who are not well versed in computational biology or for those who are completely new to programming and have a base understanding of genetics and molecular biology.
The book is not accessible to those who use screen readers. None of the images have captions, and all of the code is embedded in images. Similarly, the index is not accessible to those who want to print out the book from the available PDFs.
The code is not in modern grammar/syntax, especially for Python, which has adopted more Java syntax. There are small, common convention "errors" in the programming discussion; for example, "=" and "<-" are not equivalent in R, and in Python it is conventional to use a[-1] instead of a[len(a)-1].
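As a short illustration of the indexing convention mentioned above, this sketch shows that both forms select the same element of a non-empty list. The list contents are a hypothetical example.

```python
# Hypothetical list of nucleotide bases, used only for illustration.
bases = ["A", "C", "G", "T"]

last_explicit = bases[len(bases) - 1]  # index computed from the length
last_idiomatic = bases[-1]             # negative index counts from the end

print(last_explicit, last_idiomatic)
```

Both forms raise an IndexError on an empty list, so the negative index is purely a readability convention, not a change in behavior.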
The audience for this book is not clear. It does not seem like a book for those who have experience with programming, as it is too detailed on that side, while it also is not a book for a budding molecular biologist, as it doesn't explore the key concepts of how computational biologists use programming besides going into sequence alignment, which is just one area and does not fulfill the title "A Primer for Computational Biology".
Table of Contents
- Part I: Introduction to Unix/Linux
- Part II: Programming in Python
- Part III: Programming in R
About the Book
A Primer for Computational Biology aims to provide life scientists and students with the skills necessary for research in a data-rich world. The text covers accessing and using remote servers via the command-line and writing programs and pipelines for data analysis, and it provides useful vocabulary for interdisciplinary work. The book is broken into three parts:
- Introduction to Unix/Linux: The command-line is the “natural environment” of scientific computing, and this part covers a wide range of topics, including logging in, working with files and directories, installing programs and writing scripts, and the powerful “pipe” operator for file and data manipulation.
- Programming in Python: Python is both a premier language for learning and a common choice in scientific software development. This part covers the basic concepts in programming (data types, if-statements and loops, functions) via examples of DNA-sequence analysis. This part also covers more complex subjects in software development such as objects and classes, modules, and APIs.
- Programming in R: The R language specializes in statistical data analysis, and is also quite useful for visualizing large datasets. This third part covers the basics of R as a programming language (data types, if-statements, functions, loops and when to use them) as well as techniques for large-scale, multi-test analyses. Other topics include S3 classes and data visualization with ggplot2.
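As a rough idea of what "basic concepts via DNA-sequence analysis" can look like, here is a minimal Python 3 sketch combining a function, a loop, and an if-statement. The function name and the sample sequence are assumptions made for illustration and are not taken from the book.

```python
def gc_content(seq):
    """Return the fraction of bases in seq that are G or C (0.0 if empty)."""
    if len(seq) == 0:
        return 0.0
    gc_count = 0
    for base in seq.upper():           # loop over each character
        if base == "G" or base == "C":  # count only G and C bases
            gc_count += 1
    return gc_count / len(seq)


# Hypothetical example sequence.
print(gc_content("ATGCGC"))
```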
About the Contributors
Shawn T. O'Neil, Oregon State University