DM843: Unsupervised Learning (5 ECTS)
STADS: 15016001
Level
Master's level course
Teaching period
The course is offered when needed.
Teacher responsible
Email: roettger@imada.sdu.dk
Timetable
Group | Type | Day | Time | Classroom | Weeks | Comment
Common | I | Monday | 16-18 | IMADA semi | 14-19 |
Common | I | Tuesday | 10-12 | IMADA semi | 14-19 |
Common | I | Tuesday | 11-13 | IMADA semi | 20 |
Common | I | Wednesday | 16-18 | U12 | 15 | DM843 RR
Common | I | Wednesday | 16-18 | IMADA semi | 16-20 |
Common | I | Thursday | 11-13 | IMADA semi | 20 |
Comment:
Unlimited number of participants.
Prerequisites:
None
Academic preconditions:
Elementary probability theory and proficiency in programming.
Course introduction
One trend can be observed across almost all fields of informatics: we have to cope with an ever-increasing amount of available data of all kinds. This volume of data makes it impossible to inspect a dataset "by hand", or even to deduce knowledge from it, without sophisticated computer-aided tools. In this course we discuss one of the most common mechanisms of unsupervised machine learning for investigating datasets: clustering. Clustering separates a given dataset into groups of similar objects, the clusters, and thus allows for a better understanding of the data and their structure. We discuss a number of clustering methods and their application to fields such as biology, economics and sociology.
Qualifications
- Independent identification of problems and challenges related to the analysis of a given dataset.
- Data-driven selection of appropriate tools, measures and procedures for performing a high-quality cluster analysis, even in unfamiliar domains.
- Ability to judge the quality and applicability of a cluster analysis.
Expected learning outcome
- Describe clustering algorithms and models covered in this course.
- Describe proximity measures covered in this course and conditions when they should or should not be applied.
- Describe cluster validity indices covered in this course and conditions when they should or should not be used.
- Formulate the above in precise language and notation.
- Implement the clustering algorithms, preprocessing steps, proximity measures and cluster validity indices covered in this course.
- Perform an entire cluster analysis based on these implementations.
- Describe and explain an entire clustering pipeline, from data preprocessing, through the selection of an appropriate proximity measure, to the evaluation of the results.
- Describe the implementation and experimental work in a scientific and precise fashion.
Subject overview
Internal and external validity measures, similarity functions for various data types, graphical cluster detection, hierarchical clustering, optimization-based clustering, finite mixture models, popular clustering tools.
Literature
Announced at the start of the course.
Website
This course uses e-learn (Blackboard).
Prerequisites for participating in the exam
Mandatory assignments and the in-class presentation of one or more scientific articles. Pass/fail, internal evaluation by the teacher.
Assessment and marking:
- Oral exam, graded according to the Danish 7-point grading scale with external examination (5 ECTS).
Expected working hours
The teaching method is based on the three-phase model.
Intro phase: 28 hours
Skills training phase: 12 hours, of which:
- Tutorials: 12 hours
Educational activities
Language
This course is taught in English.
Course enrollment
See the enrollment deadline.
Tuition fees for single courses
See fees for single courses.