# Computational Methods in Evolutionary Biology, WS 2014/2015

Lecture with Exercises:

**Instructor:** Prof. Dr. Dirk Metzler

**Time:**

Lecture: Each Wednesday and Friday from 8:45 to 10:15
in room C00.013

Exercises: Each Wednesday from 10:15 to 11:00 and each Friday from 10:15 to
12:00 in the computer room C00.005

Additional exercises and lecture for block courses Phylogenetics I/II and
Comp. Pop. Gen. I/II: Tuesday 9:00 to 12:00 and Wednesday from 11:00 to 12:00.

**Exam:** The exam takes place on February 6 at 10:00 a.m. in
room D00.013. This applies both to bioinformatics students and to
students who take this course block-wise (EES students, for example) and sit the
Computational Population Genetics II exam. Please note that you may bring an A4 formula sheet and a pocket
calculator. The A4 formula sheet must contain only your own original
handwriting. It can be two-sided.

**Target group:**
Master's and PhD students in EES/MEME, Bioinformatics, Biostatistics, Biology, Mathematics, Statistics, ...

Students in block-structured programs like **EES**, **MEME** and the
Master's program in **Biology** can participate block-wise:

Phylogenetics I (Block I, Oct 6 - Oct 24)

Phylogenetics II (Block II, Oct 27 - Nov 14)

Computational Methods in Population Genetics I (Block III, Nov 17 - Dec 12)

Note: The Population Genetics I course starts on Wednesday, 19th November.

Computational Methods in Population Genetics II (Block IV, Dec 12 - Jan 30)

### Contents

Data sets of DNA, RNA or protein sequences contain a lot of hidden information
about the history of evolution, about evolutionary processes and about the roles of
particular genes in evolutionary adaptation. It is a challenge to develop
methods to uncover this information. Methods that are based on explicit
models for evolutionary processes and on the application of statistical principles
(like likelihood maximization or Bayesian inference) are most promising. Some of these methods,
however, can be very demanding, both computationally and intellectually. A
thorough understanding of the models and methods is crucial, not only for those
who aim to contribute to the further development of such methods but also for those who
want to apply these methods to their datasets and have to decide which method
to choose, how to set its optional parameters and how to interpret the outcome.

In the first half of the semester we will focus on computational methods in **phylogenetics**. In the second
half of the semester we will turn to **population genetics**.
##### Bayesian and likelihood-based Phylogenetics

We discuss methods from computational statistics and their applications in
phylogenetic tree reconstruction. First we compare maximum-likelihood (ML)
methods to parsimony and distance-based methods. Then we turn to Bayesian
methods that are based on Markov chain Monte Carlo (MCMC) approaches like the
Metropolis-Hastings algorithm and Gibbs sampling. Such methods make it possible to sample
phylogenies (approximately) according to their posterior probability,
i.e. conditioned on the given sequence data. Thus, it is also possible to
assess the uncertainty of the estimation. Among the special applications that
we discuss are phylogeny estimation with time calibration (e.g. according to
the fossil record) and methods for the reconciliation of gene trees and species trees.
Statistical methods are always based on probabilistic models for the origin of
the data. Therefore, we discuss evolution models for biological sequences
(Jukes-Cantor, PAM, F81, HKY, F84, GTR, Gamma-distributed rates, ...) and the
fundamentals of Markov processes that are necessary to understand these
models. Furthermore, we will discuss relaxed molecular-clock models and
Brownian-motion models for the evolution of quantitative traits along
phylogenetic trees. Another topic is statistical sequence-alignment methods that
are based on explicit sequence evolution models with insertions and deletions
(TKF91, TKF92, ...).

Software: PHYLIP, Seq-Gen, R with the ape package, RAxML, MrBayes, BEAST, BAli-Phy, ...

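
To make the simplest of the substitution models named above concrete, here is a minimal Python sketch of the Jukes-Cantor model. It is an illustration only, not course material: it assumes that branch lengths are measured in expected substitutions per site, and the function names are hypothetical rather than taken from any of the listed software packages.

```python
import math

NUCLEOTIDES = "ACGT"


def jc_transition_matrix(d):
    """Jukes-Cantor transition probabilities for a branch of length d.

    d is assumed to be the expected number of substitutions per site.
    Entry [i][j] is the probability that nucleotide i is replaced by
    nucleotide j along the branch (order given by NUCLEOTIDES).
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability of ending in the same state
    p_diff = 0.25 - 0.25 * decay   # probability of each specific other state
    return [[p_same if i == j else p_diff for j in range(4)] for i in range(4)]


def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites.

    Inverts p = 3/4 * (1 - exp(-4d/3)); only defined for 0 <= p < 3/4.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For a branch length of zero the matrix is the identity, and for very long branches every entry approaches 1/4, the stationary distribution of the model. The distance function is the kind of correction that distance-based tree methods apply before building a tree.
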
##### Computational methods in population genetics

Given population genetic data, how can we infer evolutionary and
ecological features like population substructure, change of population
size, recent speciation, natural selection and adaptation? Many
computational methods for this purpose have been proposed and most of
them are freely available in software packages. In this course we will
discuss the theoretical and practical aspects of these methods. The
theoretical aspects are the underlying models, statistical principles
and computational strategies. In the practical part we will analyze these
methods. We will also try out
various software packages and explore under which circumstances they
are appropriate. Among the models that we discuss are the coalescent
process and its variants with structure and demography, the ancestral
selection graph, and the ancestral recombination graph. Among the
parameter estimation strategies are full-likelihood and full-Bayesian
methods, methods based on summary statistics, and Approximate Bayesian
Computation. These methods use computational strategies like
importance sampling and variants of MCMC.

Software: LAMARC, GENETREE, Hudson's MS, IM/IMa, MIMAR, STRUCTURE, etc.

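
As a small illustration of the coalescent process mentioned above, the following Python sketch simulates the time to the most recent common ancestor (TMRCA) of a sample under the standard Kingman coalescent with constant population size. It is not part of the course material. Time is assumed to be measured in coalescent units (2N generations for a diploid population of size N), and all function names are hypothetical.

```python
import random


def expected_tmrca(n):
    """Expected TMRCA of n lineages: the sum of 2 / (k (k - 1)) for k = 2..n."""
    return 2.0 * (1.0 - 1.0 / n)


def simulate_tmrca(n, rng):
    """Draw one TMRCA for a sample of n lineages.

    While k lineages remain, the waiting time to the next coalescence
    is exponentially distributed with rate k (k - 1) / 2.
    """
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0
        t += rng.expovariate(rate)
    return t


def mean_simulated_tmrca(n, replicates, seed=1):
    """Average TMRCA over independent replicates, with a fixed seed."""
    rng = random.Random(seed)
    total = sum(simulate_tmrca(n, rng) for _ in range(replicates))
    return total / replicates
```

Comparing `mean_simulated_tmrca(10, 20000)` with `expected_tmrca(10)` is a quick sanity check of the simulation. Variants with population structure or demography change the coalescence rates but keep the same basic scheme.
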
**Language:** English

##### Handouts

The following handouts contain only a summary of the contents of the slides shown in
the lecture. More detailed explanations are given on the whiteboard during the
lectures. The handouts will be updated during the
semester.

Handout on Phylogenetics: PhyloHandout.pdf

Handout on Computational Population Genetics: CMPG_handout.pdf

##### Exercises on Phylogenetics

phylo01.pdf

phylo02.pdf

phylo03.pdf

phylo04.pdf, PAM_rate_matrix.txt, pfold_rate_matrix.txt

phylo05.pdf, QuantTraitsA.csv, QuantTraitsB.csv, QuantTraitsC.csv, QuantTraits_Tree.txt

##### Phylogenetics example files

primates.nex, primates.phylip, primates.R, NJvsMPvsML.zip
##### Exercises on Computational Population Genetics

sheet01.pdf

sheet02.pdf

sheet03.pdf

sheet04.pdf

sheet05.pdf

##### Comp PopGen software example files

Tajimas_D.R, abc.R

SortSequences.R (example R script that converts ms/Seq-Gen output into a Migrate input file, which can be read by the LAMARC
input file converter)

Announcement for bioinformaticians in official LMU course overview: Lecture, Exercises

web page last updated: Dirk Metzler, Feb 4, 2015