
DM843: Unsupervised Learning (5 ECTS)

STADS: 15016001

Level
Master's level course approved as PhD course

Teaching period
The course is offered in the spring semester.

Teacher responsible

Email: roettger@imada.sdu.dk

Timetable

Group	Type	Day	Time	Classroom	Weeks
Common	I	Wednesday	10-12	IMADA semi	5-11
Common	I	Thursday	08-10	IMADA semi	5-11
H1	TE	Tuesday	08-10	IMADA semi	6-11


Comment:
Unlimited number of participants. Joint teaching with DM856.

Prerequisites:
None

Academic preconditions:
The course is co-taught with DM856. It cannot be chosen by students who have either followed or passed DM856.

Course introduction
One trend can be observed across almost all fields of informatics: we have to cope with an ever-increasing amount of available data of all kinds. This volume of data makes it impossible to inspect a dataset "by hand", let alone deduce knowledge from it, without sophisticated computer-aided help.

The purpose of the course is to enable the student to understand common unsupervised learning techniques, e.g., clustering, and to apply them to unknown datasets. The students will be able to interpret the results and unravel hidden structures in the datasets. The application focus will be on biomedical datasets, but the methods are relevant to many other fields that deal with large datasets.

The course builds on the knowledge acquired in the courses "Introduction to Programming" and "Algorithms and Probability" or similar, and provides competences for master's thesis work in the area.

In relation to the learning outcomes of the degree the course has explicit focus on:

Provide knowledge on a range of specialized models and methods developed in computer science based on the highest international research standards, including topics from the subject's research front.
Give knowledge of computer science models and methods for use in other professional areas.
Describe, analyse, and solve advanced computer science problems using the models learned.
Shed light on stated hypotheses with a qualified theoretical basis and be critical of both one's own and others' research results and scientific models.
Develop new variants of the learned methods, where the concrete problem requires it.
Disseminate research-based knowledge and discuss professional and scientific problems with both colleagues and non-specialists.
Plan and execute scientific projects of high standard, including managing work situations that are complex, unpredictable, and require novel solutions.

Expected learning outcome
The learning objectives of the course are that the student demonstrates the ability to:

Describe clustering algorithms and models covered in this course.
Describe proximity measures covered in this course and judge conditions when they should or should not be applied.
Describe cluster validity indices covered in this course and judge conditions when they should or should not be used.
Formulate the above in precise language and notation.
Implement clustering algorithms, pre-processing steps, proximity measures and cluster validity indices covered in this course.
Perform an entire cluster analysis based on these implementations.
Judge the quality of an entire clustering pipeline, from data preprocessing, through the selection of an appropriate proximity measure, to the evaluation of the results.
Describe the implementation and experimental work in a scientific and precise fashion.
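
As an illustration of what such an end-to-end cluster analysis might look like, the following is a minimal sketch using only the Python standard library. It is not taken from the course material: the choice of z-score standardization, Euclidean distance, k-means with a farthest-first initialization, and the silhouette coefficient are all assumptions made for the example, as are the function names and the toy dataset.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Pre-processing step: z-score every feature (column)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns are left unscaled
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def euclidean(a, b):
    """Proximity measure: Euclidean distance between two points."""
    return math.dist(a, b)


def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        )
    return [list(c) for c in centroids]


def k_means(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels


def silhouette(points, labels):
    """Internal validity index: mean silhouette coefficient."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(euclidean(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # convention for singletons / a single cluster
            continue
        a = mean(own)                                  # cohesion
        b = min(mean(d) for d in by_cluster.values())  # separation
        scores.append((b - a) / (max(a, b) or 1.0))
    return mean(scores)


# Hypothetical toy data: two visually separated groups of three points each.
data = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.1], [8.0, 9.0], [8.2, 9.1], [7.9, 8.8]]
scaled = standardize(data)
labels = k_means(scaled, k=2)
score = silhouette(scaled, labels)
print(labels, round(score, 3))
```

Each function corresponds to one stage of the pipeline (pre-processing, proximity measure, clustering algorithm, validity index), so any stage can be swapped out independently, for example to compare proximity measures on the same data.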

Subject overview

The following main topics are contained in the course:

Internal and external validity measures
Similarity functions for various data types
PCA and PCoA
Mixture models and expectation maximization
Modern clustering algorithms

Literature

To be announced at the start of the course.

Website
This course uses e-learn (Blackboard).

Prerequisites for participating in the exam

Mandatory assignments and an in-class presentation of one or more scientific articles. Pass/fail, internal evaluation by teacher. (15016012).

Assessment and marking:

Oral exam, graded on the Danish 7-point scale with an external examiner (5 ECTS). Exam aids: Blackboard/whiteboard. (15016002).

Expected working hours
The teaching method is based on the three-phase model.
Intro phase: 24 hours
Skills training phase: 12 hours, of which:
- Tutorials: 12 hours

Educational activities

Study of modern clustering algorithms based on scientific papers.
Discussions and small projects conducting cluster analyses.

Educational form
In the intro phase, concepts, theories and models are introduced and put into perspective. In the training phase, students train their skills through exercises and dig deeper into the subject matter.

The course will consist of lectures supported by discussion sessions. The students will get accompanying exercises that apply the acquired knowledge to practical real-world problems. Student activation is rounded off by a mandatory presentation of a related scientific paper.

Language
This course is taught in English.

Course enrollment
See deadline of enrolment.

Tuition fees for single courses
See fees for single courses.