Computational Methods in Evolutionary Biology, WS 2024/2025

Direct links to teaching material: Lecture with Exercises (Vorlesung mit Übung):

Language: English

Instructors:
Dirk Metzler (lectures and exercises),
Andrea Pozzi (practical exercises phylogenetics),
Sara Castillo Vicente (practical exercises on population genetics)

Moodle Forum:

Please make sure that you are enrolled in the Moodle site of the course, as important information, e.g. on exam dates, will be communicated via Moodle.

Time:

Lectures: Wednesdays and Fridays, 9:00 a.m. to 10:30 a.m.
Exercise sessions: Wednesdays and Fridays, 10:45 a.m. to 12:00 noon
For participants of the block courses Phylogenetics and Comp. Pop. Gen.: Practical exercise sessions Tuesdays, 9:00 to 10:30 (these times may change slightly)
Online option: If there is demand for hybrid teaching, there may be the option for online participation.

Target group (Zielgruppe) and ECTS points:

Master's and PhD students in EES/MEME, Bioinformatics, Biostatistics, Biology, Mathematics, Statistics,...
Please find the module description for this course in the official module catalog of the bioinformatics master's program.
Students can obtain 9 ECTS for passing the exam of the entire course Computational Methods in Evolutionary Biology.

Students in block-structured programs such as EES, MEME and other Biology Master's programs can participate block-wise, that is, take only the Phylogenetics block, only the Computational Methods in Population Genetics block, or both.
For each block, combined with the additional practical part, students can obtain 6 ECTS.

Phylogenetics (15. Oct. 2024 - 4. Dec. 2024)
Computational Methods in Population Genetics (4. Dec. 2024 - 7. Feb. 2025)

Contents

Data sets of DNA, RNA or protein sequences contain a lot of hidden information about the history of evolution, about evolutionary processes and about the roles of particular genes in evolutionary adaptation. It is a challenge to develop methods to uncover this information. Methods that are based on explicit models for evolutionary processes and on the application of statistical principles (like likelihood maximization or Bayesian inference) are most promising. Some of these methods, however, can be very demanding, both computationally and intellectually. A thorough understanding of the models and methods is crucial, not only for those who aim to contribute to the further development of such methods but also for those who want to apply these methods to their datasets and have to decide which method to choose, how to set its optional parameters and how to interpret the outcome.
In the first half of the semester we will focus on computational methods in phylogenetics. In the second half of the semester we will turn to population genetics.

Bayesian and likelihood-based Phylogenetics

We discuss methods from computational statistics and their applications in phylogenetic tree reconstruction. First we compare maximum-likelihood (ML) methods to parsimony and distance-based methods. Then we turn to Bayesian methods that are based on Markov-Chain Monte-Carlo (MCMC) approaches like the Metropolis-Hastings algorithm and Gibbs sampling. Such methods allow us to sample phylogenies (approximately) according to their posterior probability, i.e. conditioned on the given sequence data. Thus, it is also possible to assess the uncertainty of the estimation. Among the special applications that we discuss are phylogeny estimation with time calibration (e.g. according to the fossil record) and methods for the reconciliation of gene trees and species trees. Statistical methods are always based on probabilistic models for the origin of the data. Therefore, we discuss evolution models for biological sequences (Jukes-Cantor, PAM, F81, HKY, F84, GTR, Gamma-distributed rates, ...) and the fundamentals of Markov processes that are necessary to understand these models. Furthermore, we will discuss relaxed molecular-clock models and Brownian-motion models for the evolution of quantitative traits along phylogenetic trees. Another topic is statistical sequence-alignment methods that are based on explicit sequence evolution models with insertions and deletions (TKF91, TKF92, ...). Software: PHYLIP, Seq-Gen, R with the ape package, RAxML, MrBayes, BEAST, BAli-Phy, ...
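
As a small illustration of how a substitution model, a likelihood and MCMC fit together, the following Python sketch estimates the Jukes-Cantor distance between two aligned sequences. It is not taken from the course material. It assumes the only data are the number of aligned sites and the number of differing sites, and it places an exponential prior on the distance. The function names, the prior rate, the proposal width and the starting value are illustrative choices.

```python
import math
import random


def jc69_p_diff(d):
    """Probability that a site differs between two sequences at
    Jukes-Cantor distance d (expected substitutions per site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def log_likelihood(d, n_sites, n_diff):
    """Binomial log-likelihood of n_diff differing sites out of n_sites,
    up to an additive constant that does not depend on d."""
    p = jc69_p_diff(d)
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


def ml_distance(n_sites, n_diff):
    """Closed-form maximum-likelihood estimate of the Jukes-Cantor distance."""
    p_hat = n_diff / n_sites
    if p_hat >= 0.75:
        raise ValueError("sequences are saturated; ML distance is infinite")
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)


def mh_sample_distance(n_sites, n_diff, n_steps=20000, step=0.02,
                       prior_rate=1.0, seed=1):
    """Metropolis-Hastings sampler for the posterior of d.

    Assumptions (illustrative only): exponential prior with rate
    prior_rate, symmetric uniform random-walk proposal of half-width step.
    """
    rng = random.Random(seed)

    def log_posterior(d):
        return log_likelihood(d, n_sites, n_diff) - prior_rate * d

    d = 0.1  # arbitrary positive starting value
    samples = []
    for _ in range(n_steps):
        proposal = d + rng.uniform(-step, step)
        # Proposals outside the support (d <= 0) have posterior 0: reject.
        if proposal > 0.0:
            log_ratio = log_posterior(proposal) - log_posterior(d)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                d = proposal
        samples.append(d)
    return samples


if __name__ == "__main__":
    draws = mh_sample_distance(n_sites=1000, n_diff=100)[2000:]  # drop burn-in
    draws.sort()
    print("ML estimate:     ", ml_distance(1000, 100))
    print("posterior mean:  ", sum(draws) / len(draws))
    print("95% interval:    ", draws[int(0.025 * len(draws))],
          draws[int(0.975 * len(draws))])
```

The spread of the sampled values is what allows the uncertainty of the estimate to be assessed. Samplers for whole phylogenies, as in MrBayes or BEAST, follow the same accept/reject principle but propose changes to tree topology and branch lengths.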

Computational methods in population genetics

Given population genetic data, how can we infer evolutionary and ecological features like population substructure, change of population size, recent speciation, natural selection and adaptation? Many computational methods for this purpose have been proposed and most of them are freely available in software packages. In this course we will discuss the theoretical and practical aspects of these methods. The theoretical aspects are the underlying models, statistical principles and computational strategies. In the practical part we will analyze these methods. We will also try out various software packages and explore under which circumstances they are appropriate. Among the models that we discuss are the coalescent process and its variants with structure and demography, the ancestral selection graph, and the ancestral recombination graph. Among the parameter estimation strategies are full-likelihood and full-Bayesian methods, methods based on summary statistics, and Approximate Bayesian Computation (ABC). These methods use computational strategies like importance sampling and variants of MCMC. Software: LAMARC, Hudson's ms, IM/IMa, BEAST 2, STRUCTURE, etc.
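
To make the combination of a coalescent model, a summary statistic and Approximate Bayesian Computation more concrete, here is a minimal Python sketch. It is not taken from the course material. It assumes the standard neutral coalescent for a single panmictic population of constant size with infinite-sites mutations, uses the number of segregating sites S as the only summary statistic, and places a uniform prior on the scaled mutation rate theta. All function names, the prior range and the tolerance are illustrative choices.

```python
import random


def simulate_segregating_sites(n, theta, rng):
    """Simulate the number of segregating sites S in a sample of size n.

    Time is measured in units of 2N generations and theta = 4*N*mu,
    so mutations fall on the genealogy at rate theta/2 per unit length.
    """
    total_length = 0.0
    for k in range(n, 1, -1):
        # With k lineages, the waiting time to the next coalescence is
        # exponential with rate k*(k-1)/2; all k branches grow meanwhile.
        total_length += k * rng.expovariate(k * (k - 1) / 2.0)
    # Draw a Poisson(theta/2 * total_length) count by counting the
    # arrivals of a unit-rate Poisson process before time `mean`.
    mean = theta / 2.0 * total_length
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def watterson_theta(n, s):
    """Watterson's moment estimator of theta from S."""
    a_n = sum(1.0 / i for i in range(1, n))
    return s / a_n


def abc_rejection(n, s_obs, n_draws=20000, theta_max=20.0,
                  tolerance=0, seed=1):
    """ABC rejection sampling: draw theta from a Uniform(0, theta_max)
    prior, simulate S, and keep theta if |S - s_obs| <= tolerance."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, theta_max)
        s_sim = simulate_segregating_sites(n, theta, rng)
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(n=10, s_obs=14)
    print("Watterson estimate: ", watterson_theta(10, 14))
    print("accepted draws:     ", len(posterior))
    print("ABC posterior mean: ", sum(posterior) / len(posterior))
```

With a single discrete summary statistic and tolerance 0, the accepted values are draws from the exact posterior given S. With richer data one typically needs several summary statistics and a positive tolerance, which is where the approximation comes in.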

Handouts

The following handouts contain only a summary of the contents of the slides shown in the lecture. More detailed explanations are given on the whiteboard during the lectures. The handouts will be updated during the semester. We may not have time for all the topics that appear in the handouts, and we may also add newer topics.
Handout on Phylogenetics: PhyloHandout.pdf, handout on Computational Population Genetics: CMPG_handout.pdf

Exercises from last year (might be updated during the semester)

Exercises on Phylogenetics

phylo01.pdf
phylo02.pdf
phylo03.pdf
phylo04.pdf
phylo05.pdf
phylo06.pdf
phylo07.pdf, QuantTraitsA.csv, QuantTraitsB.csv, QuantTraitsC.csv, QuantTraits_Tree.txt
(exercise sheets might be updated during the semester)

Phylogenetics example files

primates.nex, primates.phylip, primates.R, NJvsMPvsML.zip

Exercises on Computational Population Genetics

sheet01.pdf
sheet02.pdf
sheet03.pdf
sheet04.pdf
sheet05.pdf, cheater.txt, cpg_islands.txt.zip
sheet06.pdf
sheet07.pdf
(to be added/updated during the semester)

Comp PopGen software example files

drift.R (Wright-Fisher and finite-population coalescent simulations), Tajimas_D.R, abc_example.R, coala_jsfs.R
SortSequences.R (example R file that converts ms/Seq-Gen output to a Migrate input file, which can be read by the LAMARC input file converter)

Linux

In the practical part of the course(s) we will use Linux. If you are new to Linux/Unix, you may be interested in some online tutorials such as http://www.ee.surrey.ac.uk/Teaching/Unix/ or https://www.codecademy.com/learn/learn-the-command-line. It may be a good idea to go through one of these tutorials even before the course starts.

Exams

Exam dates will be announced in the Moodle forum of the course.

Announcement for bioinformaticians in official LMU course overview


web page last updated: Dirk Metzler, 30. Aug. 2024