Data analysis: Statistical learning and visualization (FMSF90F)

16 January 2023 - 17 March 2023, 7.5 ECTS

– Published 16 January 2023

This is a PhD course in applied statistical learning, i.e. using statistical techniques such as modelling and prediction to analyse real datasets and to draw correct interpretations and conclusions.

The course begins with an overview of basic data wrangling and visualisation, with a focus on the student's ability to identify and illustrate important features of the data. Important methods in statistical learning are then introduced, with emphasis on dimension reduction and on supervised and unsupervised learning. Issues arising from fitting multiple models (i.e. multiple testing), as well as the methods’ relationship to regression, are discussed. Computer-based labs and projects form an important part of the learning activities.

The course concludes with a project in which the students select and apply suitable methods to a real data set and present an analysis of the data. We encourage PhD students to submit and analyse relevant data from their own research; students without data of their own will be matched with other students or provided with data for the final project.

More information and access to the course plan is available on the LTH website.

Course content

  • Basic methods for data handling and common visualisation methods for data.
  • Methods for data reduction such as Principal Component Analysis (PCA) and their use for imputation of missing data.
  • Methods for unsupervised and supervised learning/classification, such as Support Vector Machines (SVM), K-means and hierarchical clustering, simpler regression methods, and decision-tree methods
    (bagging, boosting, and random forests).
  • Multiple testing and common solutions such as Benjamini-Hochberg and Bonferroni.

Remark: The course does not cover neural networks or deep learning.

Prerequisites

  • Basic statistics course
  • Some programming experience
  • Access to a laptop computer on which R and RStudio can be installed

Schedule

The course starts on 17 January 2023 and is planned to run until 16 March 2023.

Teaching times:

  • Tuesdays 13-15 (lectures)
  • Thursdays 13-15 (lectures) + 15-17 (lab sessions)

Examination

Examination is through four projects (including peer review): three on specific methods presented in the course and one final project that uses components from the entire course. The final project includes an oral presentation to the class.

Textbook

Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani; An Introduction to Statistical Learning with Applications in R, 2nd ed. Springer, 2021, ISBN: 978-1-0716-1417-4.

Freely available as e-book (ISBN: 978-1-0716-1418-1).

Teacher

Linda Hartman (Mathematical Statistics)

Registration

Places are limited, so early registration is recommended. The registration deadline is 16 December 2022.

Please use the registration page to register for the course.