Syllabus
Course Meeting Times
Lectures: 2 sessions / week, 1.5 hours / session
Recitations: 1 session / week, 1 hour / session
Course Description
This course focuses on the algorithmic and machine learning foundations of computational biology, combining theory with practice. We study the principles of algorithm design for biological datasets and analyze influential problems and techniques. We then apply these techniques to real datasets from large-scale studies in genomics and proteomics. The topics covered include:
- Genomes: biological sequence analysis, hidden Markov models, gene finding, RNA folding, sequence alignment, genome assembly
- Networks: gene expression analysis, regulatory motifs, graph algorithms, scale-free networks, network motifs, network evolution
- Evolution: comparative genomics, phylogenetics, genome duplication, genome rearrangements, evolutionary theory, rapid evolution
Prerequisites
6.006 Introduction to Algorithms, 7.01 Introductory Biology, 6.041 Probabilistic Systems Analysis
Textbooks
This course will use the following three textbooks:
Durbin, Richard, Sean Eddy, Anders Krogh, and Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge, UK: Cambridge University Press, 1999. ISBN: 9780521629713.
Jones, Neil, and Pavel Pevzner. An Introduction to Bioinformatics Algorithms. Cambridge, MA: MIT Press, 2004. ISBN: 9780262101066.
Duda, Richard, Peter Hart, and David Stork. Pattern Classification. New York, NY: Wiley-Interscience, 2000. ISBN: 9780471056690.
Grading
Your grade in this course will be based on the following:
Activity | Percentage of grade |
---|---|
Problem sets | 40% |
Midterm exam | 20% |
Final project | 25% |
Scribing | 10% |
Participation | 5% |
Problem Sets
There will be four problem sets during the first half of the semester. Each problem set will include 3-5 problems for all students and one problem for graduate students only. The problem sets will include both theoretical and programming problems. For programming problems, we will provide skeleton code in Python, but you may use a different programming language if you so choose.
Midterm Exam
There will be a midterm exam approximately halfway through the course, covering all material up to that point. There will be no final exam.
Final Project
You will complete a final project during the second half of the semester. You may either work alone or with one partner. Teams and graduate students will be expected to undertake more ambitious projects. In previous years, approaches to the final project have included:
- Compare several computational biology algorithms for solving the same problem, by implementing them, applying them to some dataset, and evaluating the results.
- Design and apply a novel computational biology algorithm and evaluate its performance and effectiveness.
- Carefully analyze a relevant conference or journal article, offering criticism, corrections, and/or improvements.
We will distribute more detailed project expectations and suggested project topics as the term progresses.
Scribing
Each student will be required to scribe for one lecture. Several students may be assigned to work together on each lecture, depending on course enrollment. You are encouraged but not required to use LaTeX for scribe notes.
As a scribe, you should strive to produce a self-contained narrative of the lecture. The slides for each lecture will be available, so you should pay particular attention to issues that the slides don't convey well on their own. For example: what is the background and motivation for the problem we are studying? What were some particularly insightful questions and answers that we discussed? Were there any common misunderstandings or points of confusion? How about alternative ways of explaining a concept or algorithm? Did we stumble upon any good ideas for a final project?
You may, of course, build on and improve the scribe notes from previous years. The TAs will make the LaTeX source available to the students scribing each lecture.
Recitations
A weekly recitation will be held on Fridays, during which we will discuss additional aspects of the lecture material and hold Q&A.