DM555: Data mining and statistical learning (10 ECTS)

STADS: 15016301

Level
Bachelor course

Teaching period
The course is offered in the spring semester.

Teacher responsible
Email: zimek@imada.sdu.dk

Timetable
Group   Type  Day        Time   Classroom  Weeks                 Comment
Common  I     Monday     14-16  U156       36-41,43-50
Common  I     Wednesday  10-12  U156       36-37
Common  I     Thursday   12-14  U156       38,41,43-46,48-50
H1      TE    Wednesday  10-12  U156       38-41,43-51
H1      TE    Thursday   12-14  U156       36-37,39-40,47,51

Comment:
Unlimited number of participants.

Prerequisites:
None.

Academic preconditions:
Students taking the course are recommended to:
  • Have knowledge of the basic concepts of discrete methods for computer science
  • Have knowledge of basic algorithms and data structures
  • Have knowledge of the basics of probability theory, which can be acquired while the course is running, e.g., from the initial part of the course DM551, Algorithms and Probability
  • Be able to program


Course introduction
The aim of the course is to enable the student to choose and use techniques from Data Mining and Statistical Learning, which are important for analysing large datasets in many financial, medical, commercial, and scientific applications.

Data Mining and Statistical Learning techniques enable computational systems to identify meaningful patterns in the data and to adaptively improve their performance with experience accumulated from the observed data.

This course introduces the most common techniques for performing basic data mining and statistical learning tasks, and covers the basic theory, algorithms, and applications. This course balances theory and practice, and covers the mathematical as well as the heuristic aspects. For most of the techniques presented, a formal computational description is provided in addition to the basic ideas and intuition. Moreover, the students have the opportunity to experiment and apply data mining and statistical learning techniques to selected problems.

The course builds on the knowledge acquired in the courses DM507, Algorithms and Data Structures, and DM527 or DM535 or DM549, Discrete Methods for Computer Science. It gives an academic basis for studying other elective topics such as bioinformatics and for conducting bachelor and master thesis projects, as well as other practically oriented study activities that are part of the degree.

In relation to the competence profile of the degree it is the explicit focus of the course to:

  • Give the competence to design data mining and statistical learning methods
  • Give skills to apply common data mining and statistical learning methods to real-world problems
  • Give knowledge of common data mining and statistical learning tasks and methods
  • Give knowledge to understand and reflect on theories, methods and practices in the computer science field
  • Give skills to acquire new knowledge in an effective and independent manner and be able to apply this knowledge in a reflective way
  • Give skills to describe, analyze and solve computer science problems applying methods and modeling formalisms from the core area and its mathematical support disciplines
  • Give skills in analyzing the advantages and disadvantages of various algorithms, especially in terms of resource consumption
  • Give skills to make and justify professional decisions
  • Give skills to describe, formulate and communicate issues and results to peers, non-specialists, project partners and users.


Expected learning outcome
The learning objectives of the course are that the student demonstrates the ability to:
  • Describe the data mining and statistical learning tasks presented during the course
  • Describe the algorithms and methods presented in the course
  • Describe the topics presented in the course in precise mathematical language
  • Understand and argue the individual steps of mathematical derivations presented in class
  • Apply the methods to simple problems
  • Apply the methods to situations different from the ones presented in class
  • Reflect on and assess design choices for data mining and statistical learning systems
  • Undertake experimental evaluation of data mining and statistical learning methods and report the results

Subject overview
The following main topics are contained in the course:
  • feasibility of learning;
  • error and noise;
  • training vs testing;
  • theory of generalization;
  • the linear model;
  • overfitting;
  • neural networks;
  • regularization;
  • validation;
  • support vector machines;
  • statistical hypothesis testing;
  • itemsets and association rules mining.

Literature
To be announced at the start of the course.


Website
This course uses e-learn (blackboard).

Prerequisites for participating in the exam
None.

Assessment and marking:
  1. Oral exam, partly based on mandatory assignments. Evaluated by Danish 7-mark scale, external examiner (10 ECTS). (15016302)

During the course four homeworks are assigned to the students. Together with selected topics from the course, these homeworks form the basis for an oral exam at the end of the course. The final grade will be based on an overall impression of the student's performance in the five elements which are part of the evaluation. The solutions of the four homeworks will be made available to the examiner.

Re-examination takes place in the same semester or immediately thereafter. The re-examination is an oral exam, graded according to the 7-mark scale, with an external examiner.



Expected working hours
The teaching method is based on a three-phase model.
Intro phase: 40 hours
Skills training phase: 30 hours, of which:
 - Tutorials: 30 hours

Educational activities
Study phase: 30 hours

Educational form
Activities during the study phase:
  • Solving the homeworks
  • Reading from the textbook
  • Applying the acquired knowledge to practical projects


Language
This course is taught in Danish or English, depending on the lecturer. However, if international students participate, the teaching language will always be English.

Course enrollment
See deadline of enrolment.

Tuition fees for single courses
See fees for single courses.