DM824: Disk-Based Indexing (5 ECTS)

STADS: 15008101

Level
Master's level course

Teaching period
The course is offered in the autumn semester.
The course is offered when needed.

Teacher responsible
Email: foula@imada.sdu.dk

Timetable
There is no timetable available for the chosen semester.

Comment:
CANCELLED autumn 2010!

Limited number of participants. The course runs in the 1st quarter.

Prerequisites:
None

Academic preconditions:
The contents of the course DM507 Algorithms and Data Structures should be known.

Course introduction
In an increasing number of applications, the amount of data that needs to be accessed and processed is too massive to fit completely into main memory. Examples include data collections in astronomy, healthcare, insurance, meteorology, finance, web search engines, social networks, etc. In such applications, the amount of data is measured in terabytes or petabytes and has to be stored in, and efficiently retrieved from, secondary memory (i.e. magnetic disks, flash memory and/or tapes).
General purpose data management systems, whose main focus is the reliable storage and efficient retrieval of the different kinds of data, have extensively employed external memory data structures, termed indexes, in order to achieve fast retrieval.
If we consider the abundance of different data types (e.g. spatial data, time-series data, multimedia data, semi-structured data, graph data, and text data) as well as the disparate retrieval requirements that may arise (e.g. OLTP-style data processing, ad-hoc data analysis, GIS functionality or keyword-type search), we conclude that providing the aforementioned functionality in its generality is a challenging task. The task becomes even harder when the requirement of retaining the history of the data (e.g. through data versions) is also included.

Expected learning outcome
At the end of the course, the student should be able to:
- Describe the types of data that are usually supported within a general purpose data management system, and for each of them discuss the most common retrieval requirements in terms of queries posed over such data and/or front-end applications.
- For each of the data types and retrieval tasks that have been covered in class, describe the state-of-the-art access methods that support those tasks and argue about their generality, performance, and scalability.
- Describe the additional challenges of persisting and querying different versions, and discuss solutions for the data types that have been covered in class.
- Describe and compare the different index integration architectures that have been employed within general purpose data management systems.

Subject overview
In this course, we will study the various access methods (i.e. indexing schemes coupled with relevant retrieval APIs and processing algorithms) that are usually supported by a general purpose data management system (e.g. numerical, high-dimensional, temporal, spatial, text, graph and free-text data). We will also discuss the various data management systems.

Literature

    To be announced at the start of the course.


Website
This course uses e-learn (blackboard).

Prerequisites for participating in the exam
None

Assessment and marking:
Project assignment, Danish 7 mark scale, external examiner
The re-exam is an oral exam, graded on the Danish 7 mark scale, external examiner.

Expected working hours
The teaching method is based on the three-phase model.

Lectures: 28 hours
Educational activities

Language
This course is taught in Danish or English, depending on the lecturer. However, if international students participate, the teaching language will always be English.

Course enrollment
See deadline of enrolment.

Tuition fees for single courses
See fees for single courses.