BMB819: Optimizing data analysis with R scripting (5 ECTS)

STADS: 01009801

Level
Master's level course

Teaching period
The course is offered in the spring semester.

Teacher responsible
Email: veits@bmb.sdu.dk

Timetable
Group   Type  Day        Time   Classroom  Weeks     Comment
Common  I     Monday     10-12  U35        05,08-09
Common  I     Tuesday    10-12  U142       08
Common  I     Wednesday  16-18  U10        06
Common  I     Wednesday  16-18  U10        08
Common  I     Thursday   16-18  U14        06
Common  I     Thursday   12-14  V10-412-2  08
Common  I     Thursday   14-16  U7         09
Common  I     Friday     10-12  U17        05
S1      TE    Tuesday    12-15  V10-412-2  08
S1      TE    Tuesday    12-14  U35        09-11
S1      TE    Wednesday  16-18  U35        10-11
S1      TE    Thursday   14-16  V10-412-2  08
S1      TE    Thursday   16-18  U35        09

Prerequisites:
None

Academic preconditions:
Basic experience with statistics

Course introduction
Research in modern biochemistry and microbiology involves so-called high-throughput experiments, which aim to measure as many molecules of a given type as possible. The molecule types are grouped into "-omes", such as transcriptomes (RNA) measured with microarrays or proteomes (proteins) measured by mass spectrometry. The large amounts of data generated by these experiments require appropriate treatment to gain optimal insight into the biological system under investigation. Noisy measurements, bias from sample preparation and the general complexity of interpreting experimental results can be overcome by setting up software pipelines that deal with these problems at several stages. Established software modules that are easy to use without prior training often cannot be applied to novel experimental designs and new technologies. As a consequence, many experimentalists need to consult an expert, such as a bioinformatician, even for simple tasks. This bottleneck often delays projects and adds uncertainty to the results, because knowledge about the proper treatment of the experimental data is lacking. This course aims to fill this gap by introducing students to the main concepts of data analysis. It has a theoretical and a practical part, with the objective of providing a general understanding of data analysis and its application to real data sets.

Among the large number of available programs, the R scripting language has become very popular for handling high-throughput data, as it (i) allows the analysis to be adapted to any experimental design, (ii) offers simple commands that operate on entire data sets, (iii) provides a wide range of tools for data visualization, (iv) is open source and (v) has a large and active community of researchers developing new tools. However, users need to acquire scripting skills to take advantage of its many features.
The course will introduce the students to basic programming of R scripts, data visualization and basic statistical models necessary to deal with data from modern high-throughput experiments. The course involves practical exercises on real data sets that might come from experiments previously carried out by the student, e.g. during their master's thesis.

Qualifications
Creation of simple programs for data analysis; statistical assessment of large data sets; ability to create high-level graphics

Expected learning outcome
The students should be able to analyze their own data sets independently. The knowledge acquired includes working with large data sets and carrying out standard statistical analyses to identify relevant features. Furthermore, students will learn how to discuss objectively the data analysis methods applied, for example, in publications. The course aims to provide general skills that can be extended towards a career in bioinformatics.

Subject overview

  1. Programming
    - General concepts of software implementation
    - Understanding the R framework
    - Basic programming of R scripts
    - Use of operators for calculations on arrays and matrices
    - Data conversion, string manipulation
  2. Bioinformatics / data manipulation
    - Data normalization
    - Detection of statistically relevant features
    - Multivariate analysis
    - Usage of graphical facilities
  3. Biostatistics
    - Basic statistics
    - Visualization methods
    - Data modeling
    - Statistical tests

Literature
There is currently no literature for the course.

Website
This course uses e-learn (Blackboard).

Prerequisites for participating in the exam
None

Assessment and marking:
  1. Individual exercise. External examiner, graded on the Danish 7-point scale (5 ECTS)
  2. Exercises. Pass/fail, internal evaluation by the teacher.

Reexamination in the same exam period or immediately thereafter



Expected working hours
The teaching method is based on a three-phase model.
Intro phase: 20 hours
Skills training phase: 16 hours, of which:
 - Tutorials: 16 hours

Educational activities Study phase: 10 hours

Language
This course is taught in English.

Course enrollment
See the enrollment deadline.

Tuition fees for single courses
See fees for single courses.