DM843: Unsupervised Learning (5 ECTS)

STADS: 15016001

Level
Master's level course

Teaching period
The course is offered when needed.

Teacher responsible
Email: roettger@imada.sdu.dk

Timetable
Group   Type  Day        Time   Classroom   Weeks  Comment
Common  I     Monday     16-18  IMADA semi  14-19
Common  I     Tuesday    10-12  IMADA semi  14-19
Common  I     Tuesday    11-13  IMADA semi  20
Common  I     Wednesday  16-18  U12         15     DM843 RR
Common  I     Wednesday  16-18  IMADA semi  16-20
Common  I     Thursday   11-13  IMADA semi  20

Comment:
Unlimited number of participants.

Prerequisites:
None

Academic preconditions:
Elementary probability theory and proficiency in programming.

Course introduction
One trend can be observed across almost all fields of informatics: we have to cope with an ever-increasing amount of available data of all kinds. This volume of data makes it impossible to inspect a dataset by hand, or even to extract knowledge from it, without sophisticated computer-aided tools. In this course we discuss one of the most common mechanisms of unsupervised machine learning for investigating datasets: clustering. Clustering separates a given dataset into groups of similar objects, the clusters, and thus allows for a better understanding of the data and its structure. We discuss a number of clustering methods and their application to fields as diverse as biology, economics and sociology.

Qualifications
  • Independent identification of problems and challenges related to the analysis of a given dataset.
  • Data-driven selection of appropriate tools, measures and procedures for performing a high-quality cluster analysis, even in unfamiliar domains.
  • Ability to judge the quality and applicability of a cluster analysis.
Expected learning outcome
  • Describe clustering algorithms and models covered in this course.
  • Describe proximity measures covered in this course and conditions when they should or should not be applied.
  • Describe cluster validity indices covered in this course and conditions when they should or should not be used.
  • Formulate the above in a precise language and notation.
  • Implement clustering algorithms, preprocessing steps, proximity measures and cluster validity indices covered in this course.
  • Perform an entire cluster analysis based on these implementations.
  • Describe and explain an entire clustering pipeline, from data preprocessing, through the selection of an appropriate proximity measure, to the evaluation of the results.
  • Describe the implementation and experimental work in a scientific and precise fashion.
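As a rough illustration of the pipeline named in the learning outcomes (preprocessing, proximity measure, clustering, evaluation), here is a minimal stdlib-only Python sketch. It is not taken from the course material: the choice of z-score standardization, Euclidean distance, k-means with fixed initial centroids, and the silhouette coefficient as the internal validity index are illustrative assumptions, and all function names are hypothetical.

```python
import math
import statistics

# Hypothetical helper names; one possible instance of each pipeline stage.


def standardize(points):
    """Preprocessing: z-score each feature (column) independently."""
    cols = list(zip(*points))
    means = [statistics.mean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(p, means, stdevs)] for p in points]


def euclidean(a, b):
    """Proximity measure: Euclidean distance (Python 3.8+)."""
    return math.dist(a, b)


def kmeans(points, init_centroids, iterations=20):
    """Clustering: Lloyd's k-means starting from the given centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        # (an emptied cluster keeps its previous centroid).
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [statistics.mean(col) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Evaluation: mean silhouette coefficient (assumes >= 2 non-empty clusters)."""
    scores = []
    for i, p in enumerate(points):
        own = [euclidean(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = statistics.mean(own)  # mean distance within own cluster
        b = min(                  # mean distance to the nearest other cluster
            statistics.mean(euclidean(p, q) for j, q in enumerate(points) if labels[j] == k)
            for k in set(labels) if k != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return statistics.mean(scores)


if __name__ == "__main__":
    data = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [8.0, 8.0], [8.5, 7.5], [7.8, 8.3]]
    X = standardize(data)
    labels = kmeans(X, init_centroids=[X[0], X[3]])
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(round(silhouette(X, labels), 3))
```

The fixed initial centroids keep the sketch deterministic; in practice k-means is usually restarted from several random or k-means++ initializations, and the proximity measure and validity index should be chosen to suit the data type at hand.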
Subject overview
Internal and external validity measures, similarity functions for various data types, graphical cluster detection, hierarchical clustering, optimization-based clustering, finite mixture models, popular clustering tools.

Literature
    Will be announced at the start of the course.


Website
This course uses e-learn (Blackboard).

Prerequisites for participating in the exam
Mandatory assignments and the presentation of one or more scientific articles in class. Pass/fail, internal evaluation by the teacher.

Assessment and marking:
  1. Oral exam, graded on the Danish 7-point grading scale with external examination (5 ECTS).
Expected working hours
The teaching method is based on the three-phase model.
Intro phase: 28 hours
Skills training phase: 12 hours, of which:
 - Tutorials: 12 hours


Language
This course is taught in English.

Course enrollment
See enrollment deadline.

Tuition fees for single courses
See fees for single courses.