DM79: Algorithms for Web Indexing and Searching (7.5 ECTS)

STADS: 1505541

Level


Teaching period

To be announced.

Teacher responsible
Email: rolf@imada.sdu.dk

Timetable
There is no timetable available for the chosen semester.

Prerequisites:
None

Academic preconditions:
Data Structures and Algorithms (DM02) must be known.

Course introduction
The Web holds vast amounts of information. Searching efficiently in this body of information is challenging due to its size and lack of structure. The purpose of this course is to give an introduction to the methods underlying modern search engines, and to present further examples of computer science research relating to the Web.

During the course, the participants will implement a search engine. This will take place in work groups of somewhat larger sizes than usual for course projects, in order to gain experience with cooperation and project management in sizable software projects.

Expected learning outcome
After the course, the student is expected to be able to:

- explain in details the algorithmic and mathematical methods behind the main components of a search engine for WWW, including the components for data collection, indexing, query answering, and ranking of results.
- list the statistical aspects of the web graph taught in the course.
- explain the web graph models taught in the course, and describe their properties.
- explain other algorithmic and mathematical methods taught in the course of relevance to analysis of the Internet, including measures for web page similarity.
- during execution of the above demonstrate precision in wording and in use of mathematics and logic, as well as ability to select central and important parts of the subject under discussion.
- construct a well-functioning prototype of a web search engine by implementing the algorithmic methods behind the main components of such a search engine, including the components for data collection, indexing, query answering, and ranking of results.
- in a clear and well-structured language document the work done and the design choices made, including a description of the main structure of the program and of the principles of the algorithmically and programming-wise central parts of the program.

Subject overview
The anatomy of a search engine: web crawling, indexing, ranking, query handling. Specific subjects include: Internet protocols, algorithms and data structures for textual data, handling of massive data sets, compression, and link-based ranking. Additional research subjects to be discussed include: classic information retrieval, clustering, graph models for the Internet, web caching, and applications of game theory on the Internet.

Literature

    Artikelsamling


Website
This course uses e-learn (blackboard).

Prerequisites for participating in the exam
None

Assessment and marking:
Oral examination with external examiner. Grades according to the 13-point scale. There will be a programming project constituting 2.5 ECTS of the total 7.5 ECTS. The project must be approved in order to take the exam. Examination only when the course has been taught.

Expected working hours
The teaching method is based on three phase model.

3 timers forelæsninger og/eller eksaminatorier pr. uge.
Educational activities

Language
This course is taught in English, if international students participate. Otherwise the course is taught in Danish.

Remarks
The programming project is more comprehensive than usual mandatory programming projects in computer science.

Course enrollment
See deadline of enrolment.

Tuition fees for single courses
See fees for single courses.