DM555: Data mining and statistical learning (10 ECTS)

STADS: 15016301

Level
Bachelor course

Teaching period
The course is offered in the spring semester.

Teacher responsible
Email: vandinfa@imada.sdu.dk

Timetable
Group   Type  Day        Time   Classroom  Weeks                   Comment
Common  I     Tuesday    08-10  U146       6-8,10-13,15-16,18-22
Common  I     Thursday   14-16  U146       6-8,13,16,18-19,21-22
Common  I     Friday     10-12  U142       12
Common  I     Friday     12-14  U146       20
H1      TE    Wednesday  12-14  U142       6-7,22-23
H1      TE    Wednesday  08-10  U142       10,12-13
H1      TE    Wednesday  08-10  U143       11
H1      TE    Wednesday  10-12  U147       15-19,21
H1      TE    Wednesday  08-10  U146       20
H1      TE    Friday     10-12  U142       10
H1      TE    Friday     12-14  U146       11

Comment:
Unlimited number of participants.

Prerequisites:
None

Academic preconditions:
The content of DM507 Algorithms and Data Structures and of DM527, DM535, or DM549 Discrete Methods for Computer Science is assumed to be known. Familiarity with some programming language or platform is recommended.

Course introduction
Data Mining and Statistical Learning are key technologies in the analysis of large datasets, and in many financial, medical, commercial, and scientific applications. They enable computational systems to identify meaningful patterns in the data and to adaptively improve their performance with experience accumulated from the observed data.

This course introduces the most common techniques for performing basic data mining and statistical learning tasks, and covers the basic theory, algorithms, and applications. It balances theory and practice, and covers the mathematical as well as the heuristic aspects. For most of the techniques presented, a formal computational description is provided in addition to the basic ideas and intuition. Moreover, the students have the opportunity to experiment with data mining and statistical learning techniques and to apply them to selected problems.

Qualifications
At the end of the course the students will have the following competencies:

  • knowledge of common data mining and statistical learning tasks and methods
  • application of common data mining and statistical learning methods to real-world problems
  • design of data mining and statistical learning methods

Expected learning outcome
At the end of the course, the student should be able to:

  • describe the data mining and statistical learning tasks presented during the course
  • describe the algorithms and methods presented in the course
  • describe the topics presented in the course in precise mathematical language
  • understand and argue the individual steps of mathematical derivations presented in class
  • apply the methods to simple problems
  • apply the methods to situations different from the ones presented in class
  • reflect on and assess design choices for data mining and statistical learning systems
  • undertake experimental evaluation of data mining and statistical learning methods and report the results

Subject overview
Basic probability; tail bounds; feasibility of learning; error and noise; training vs testing; theory of generalization; the linear model; overfitting; neural networks; regularization; validation; support vector machines; statistical hypothesis testing; itemsets and association rules mining.

Literature
    To be announced at the start of the course.


Website
This course uses e-learn (blackboard).

Prerequisites for participating in the exam
None

Assessment and marking:
  1. Oral exam, partly based on mandatory assignments. Graded on the Danish 7-mark scale, with an external examiner (10 ECTS). (15016302)

During the course, four homework assignments are given to the students. Together with selected topics from the course, these assignments form the basis for an oral exam at the end of the course. The final grade is based on an overall impression of the student's performance in the five elements that are part of the evaluation (the four homework assignments and the oral exam). The solutions to the four homework assignments will be made available to the examiner.

Re-examination takes place in the same semester or immediately thereafter. The re-examination is an oral exam, graded on the 7-mark scale, with an external examiner.



Expected working hours
The teaching method is based on the three-phase model.
Intro phase: 40 hours
Skills training phase: 30 hours, of which:
 - Tutorials: 30 hours

Educational activities
Study phase: 30 hours

Language
This course is taught in Danish or English, depending on the lecturer. However, if international students participate, the teaching language will always be English.

Course enrollment
See the enrolment deadlines.

Tuition fees for single courses
See fees for single courses.