BMB819: Optimizing data analysis with R scripting (5 ECTS)
STADS: 01009801
Level
Master's level course
Teaching period
The course is offered in the spring semester.
Teacher responsible
Email: veits@bmb.sdu.dk
Timetable
Group | Type | Day | Time | Classroom | Weeks
Common | I | Monday | 14-16 | U105 | 5
Common | I | Monday | 10-12 | U105 | 6,11
Common | I | Monday | 10-12 | U23A | 7
Common | I | Monday | 10-12 | U146 | 9
Common | I | Tuesday | 14-16 | U146 | 11
Common | I | Wednesday | 12-14 | U142 | 5
Common | I | Thursday | 12-14 | U142 | 6-7
Common | I | Thursday | 10-12 | U51 | 9
H1 | TE | Tuesday | 12-14 | U155 | 11
H1 | TE | Wednesday | 10-12 | U155 | 7
H1 | TE | Wednesday | 12-14 | U11 | 9
H1 | TE | Friday | 12-14 | U10 | 5
H1 | TE | Friday | 10-12 | U157 | 6-7
H1 | TE | Friday | 12-14 | U23A | 9
H1 | TE | Friday | 12-14 | U155 | 11
Prerequisites:
None
Academic preconditions:
Basic experience with statistics.
Course introduction
Research in modern biochemistry and microbiology involves so-called high-throughput experiments, which aim to measure as many molecules of a given type as possible. These molecule types are grouped into "-omes", such as transcriptomes (RNA), measured with microarrays, or proteomes (proteins), measured by mass spectrometry. The large amounts of data generated by these experiments require appropriate treatment to gain optimal insight into the biological system under investigation. Noisy measurements, bias from sample preparation and the general complexity of interpreting experimental results can be overcome by setting up software pipelines that address these problems at several stages. Established software modules that are easy to use without prior training often cannot be applied to novel experimental designs and new technologies. As a consequence, many experimentalists need to consult an expert, such as a bioinformatician, even for simple tasks. This bottleneck often prolongs projects and adds uncertainty to the results, owing to a lack of knowledge about how to treat the experimental data properly. This course aims to fill this gap by introducing students to the main concepts of data analysis. It has a theoretical and a practical part, with the objective of providing a general understanding of data analysis and its application to real data sets.
Among the many available programs, the R scripting language has become very popular for handling high-throughput data, as it (i) allows the analysis to be adapted to any experimental design, (ii) offers simple commands that operate on entire data sets, (iii) provides a wide range of tools for data visualization, (iv) is open source and (v) has a large and active community of researchers developing new tools. However, users must acquire scripting skills to take advantage of its many features.
The course introduces students to basic R scripting, data visualization and the basic statistical models needed to handle data from modern high-throughput experiments. It includes practical exercises on real data sets, which may come from experiments the students have carried out previously, e.g. during their master's thesis.
Qualifications
Creation of simple programs for data analysis; statistical assessment of large data sets; ability to create high-level graphics.
Expected learning outcome
Students should be able to analyze their own data sets independently. This includes working with large data sets and carrying out standard statistical analyses to identify relevant features. Furthermore, they will learn to discuss objectively the data analysis methods applied, e.g., in publications. The course aims to provide general skills that can be built upon in a career in bioinformatics.
Subject overview
- Programming
  - General concepts of software implementation
  - Understanding the R framework
  - Basic programming of R scripts
  - Use of operators for calculations on arrays and matrices
  - Data conversion, string manipulation
- Bioinformatics / data manipulation
  - Data normalization
  - Detection of statistically relevant features
  - Multivariate analysis
  - Usage of graphical facilities
- Biostatistics
  - Basic statistics
  - Visualization methods
  - Data modeling
  - Statistical tests
Literature
There is currently no literature for the course.
Website
This course uses e-learn (Blackboard).
Prerequisites for participating in the exam
None
Assessment and marking:
- Individual exercise. External examiner, graded on the Danish 7-point grading scale (5 ECTS).
- Exercises. Pass/fail, internal evaluation by the teacher.
Re-examination takes place in the same exam period or immediately thereafter.
Expected working hours
The teaching method is based on the three-phase model.
Intro phase: 20 hours
Skills training phase: 16 hours, of which:
- Tutorials: 16 hours
Educational activities
Study phase: 10 hours
Language
This course is taught in English.
Course enrollment
See deadline of enrolment.
Tuition fees for single courses
See fees for single courses.