DM856: Unsupervised biomedical data analysis (5 ECTS)

STADS: 15019101

Level
Master's level course

Teaching period
The course is offered in the spring semester.

Teacher responsible
Email: roettger@imada.sdu.dk

Timetable
Group   Type  Day        Time   Classroom   Weeks  Comment
Common  I     Wednesday  10-12  IMADA semi  5-11
Common  I     Thursday   08-10  IMADA semi  5-11
H1      TE    Tuesday    08-10  IMADA semi  6-11

Comment:
Unlimited number of participants. Taught jointly with DM843.

Prerequisites:
None

Academic preconditions:
The course is co-taught with DM843. It cannot be chosen by students who have either followed or passed DM843.

Course introduction
One trend can be observed across almost all fields of informatics: we have to cope with an ever-increasing amount of available data of all kinds. This volume of data makes it impossible to inspect a dataset "by hand", let alone deduce knowledge from it, without sophisticated computer-aided help.

The purpose of the course is to enable the student to understand common unsupervised learning tasks, e.g., clustering, and to apply them to unknown datasets. Students will be able to interpret the results and uncover hidden structures in the datasets. The application focus is on biomedical datasets, but the methods are relevant to many other fields that deal with large datasets.

The course builds on the knowledge acquired in the courses "Introduction to Programming" and "Introduction to bioinformatics", and gives competences for master's thesis work in the area.

In relation to the learning outcomes of the degree the course has explicit focus on:
  • General experimental design in the context of statistical and computational data analysis.
  • Detailed planning of experiments for subsequent interpretation by computational approaches.
  • Interpretation of experimental data using computational methods.
  • Choosing among scientific theories, methods, tools and general properties within computational biomedicine and bioinformatics, and applying these to the investigation of scientific questions.
  • Challenging the student with real-life datasets that call for problem-solving skills.


Expected learning outcome
The learning objectives of the course are that the student demonstrates the ability to:
  • Describe clustering algorithms and models covered in this course.
  • Describe proximity measures covered in this course and judge conditions when they should or should not be applied.
  • Describe cluster validity indices covered in this course and judge conditions when they should or should not be used.
  • Formulate the above in a precise language and notation.
  • Implement clustering algorithms, pre-processing steps, proximity measures and cluster validity indices covered in this course.
  • Perform an entire cluster analysis based on these implementations.
  • Judge the quality of an entire clustering pipeline, from data preprocessing, through the selection of an appropriate proximity measure, to the evaluation of the results.
  • Describe the implementation and experimental work in a scientific and precise fashion.
Subject overview
The following main topics are contained in the course:
  • Internal and external validity measures
  • Similarity functions for different data types
  • PCA and PCoA
  • Mixture models and expectation maximization
  • Modern clustering algorithms
Literature
    To be announced at the start of the course.


Website
This course uses e-learn (blackboard).

Prerequisites for participating in the exam
  1. Mandatory assignments and an in-class presentation of one or more scientific articles. Pass/fail, internal marking by teacher.
Assessment and marking:
  1. Oral exam. External marking, Danish 7-mark scale. (5 ECTS). Allowed aids: blackboard/whiteboard.
Expected working hours
The teaching method is based on the three-phase model.
Intro phase: 24 hours
Skills training phase: 12 hours, of which:
 - Tutorials: 12 hours

Educational activities
  • Study of modern clustering algorithms based on scientific papers.
  • Discussions and small projects conducting cluster analyses.
Educational form
The course consists of lectures supported by discussion sessions. Students receive accompanying exercises that apply the acquired knowledge to practical real-world problems. Student activation is completed by a mandatory presentation of a related scientific paper.

Language
This course is taught in English.

Course enrollment
See deadline of enrolment.

Tuition fees for single courses
See fees for single courses.