
BMB819: Optimizing data analysis with R scripting (5 ECTS)

STADS: 01009801

Level
Master's level course

Teaching period
The course is offered in the spring semester.

Teacher responsible

Email: veits@bmb.sdu.dk

Timetable

Group	Type	Day	Time	Classroom	Weeks
Common	I	Monday	10-12	U35	05,08-09
Common	I	Tuesday	10-12	U142	08
Common	I	Wednesday	16-18	U10	06
Common	I	Wednesday	16-18	U10	08
Common	I	Thursday	16-18	U14	06
Common	I	Thursday	12-14	V10-412-2	08
Common	I	Thursday	14-16	U7	09
Common	I	Friday	10-12	U17	05
S1	TE	Tuesday	12-15	V10-412-2	08
S1	TE	Tuesday	12-14	U35	09-11
S1	TE	Wednesday	16-18	U35	10-11
S1	TE	Thursday	14-16	V10-412-2	08
S1	TE	Thursday	16-18	U35	09


Prerequisites:
None

Academic preconditions:
Basic experience with statistics

Course introduction
Research in modern biochemistry and microbiology involves so-called high-throughput experiments. These experiments aim to measure as many molecules of a given type as possible. The molecule types are classified into "-omes", such as transcriptomes (RNA) obtained from microarrays or proteomes (proteins) obtained from mass spectrometry. The large amounts of data generated by these experiments require appropriate treatment to gain optimal insight into the biological system under investigation. Noisy measurements, bias from sample preparation and the general complexity of interpreting the experimental results can be overcome by setting up software pipelines that deal with these problems at several stages. Established software modules that are easy to use without prior training often cannot be applied to novel experimental designs and new technologies. As a consequence, many experimentalists need to consult an expert, such as a bioinformatician, even for simple tasks. This bottleneck often prolongs projects and adds uncertainty to the results due to a lack of knowledge about how to treat the experimental data properly. This course aims to fill this gap by introducing the students to the main concepts of data analysis. The course has a theoretical and a practical part, with the objective of providing a general understanding of data analysis and its application to real data sets.

Among the many available programs, the R scripting language has become very popular for handling high-throughput data, as it (i) allows the analysis to be adapted to any experimental design, (ii) offers simple commands that operate on entire data sets, (iii) provides a wide range of tools for data visualization, (iv) is open source and (v) has a large and active community of researchers developing new tools. However, users need to acquire scripting skills to take advantage of its many features.

The course introduces the students to basic programming of R scripts, data visualization and the basic statistical models needed to deal with data from modern high-throughput experiments. It includes practical exercises on real data sets, which may come from experiments previously carried out by the students, e.g. during their master's thesis.

Qualifications
Creation of simple programs for data analysis; statistical assessment of large data sets; ability to create high-level graphics

Expected learning outcome
The students should be able to analyze their own data sets independently. The acquired knowledge includes working with large data sets and carrying out standard statistical analyses to identify relevant features. Furthermore, the students will learn how to discuss objectively the data analysis methods applied in, e.g., publications. The course aims to provide general skills that can be extended toward a career in bioinformatics.

Subject overview

Programming
- General concepts of software implementation
- Understanding the R framework
- Basic programming of R scripts
- Use of operators for calculations on arrays and matrices
- Data conversion, string manipulation
Bioinformatics / data manipulation
- Data normalization
- Detection of statistically relevant features
- Multivariate analysis
- Use of graphical facilities
Biostatistics
- Basic statistics
- Visualization methods
- Data modeling
- Statistical tests

Literature
There is currently no literature for the course.

Website
This course uses e-learn (Blackboard).

Prerequisites for participating in the exam
None

Assessment and marking:

Individual exercise. External examiner, graded according to the Danish 7-point grading scale (5 ECTS).
Exercises. Pass/fail, internal evaluation by the teacher.

Re-examination in the same exam period or immediately thereafter.

Expected working hours
The teaching method is based on the three-phase model.
Intro phase: 20 hours
Skills training phase: 16 hours, of which:
- Tutorials: 16 hours
Study phase: 10 hours

Language
This course is taught in English.

Course enrollment
See the enrolment deadline.

Tuition fees for single courses
See fees for single courses.