Course Manual

This page is based on the Osiris course text for Processing Complex Data.

Name

Processing Complex Data

ECTS

2.5

Course Description

Contrary to what most introductory data science and statistics courses teach, scientific data comes in an enormous variety of formats and sizes and requires an equally wide range of processing procedures. From simple tables to complex multidimensional space-time arrays, including metadata and custom storage formats, the world of scientific data is vast, varied, and wildly interesting.

This course is designed to give students an introduction to core real-world data concepts, as well as hands-on experience with handling, processing, and modelling different types of complex data used in various fields of science and beyond. The course relies on student engagement and guided practical group work to create a dynamic learning environment.

Course Goals

At the end of the course, students will be able to:

Identify and describe a range of complex scientific data formats, such as multidimensional arrays, spatiotemporal data, and metadata-rich structures, and their associated challenges.
Apply appropriate preprocessing techniques, such as cleaning, transformation, normalization, filtering, and feature extraction, to different types of real-world scientific datasets.
Analyze and compare statistical modelling approaches suitable for various data modalities, evaluating their assumptions, strengths, and limitations.
Defend and communicate statistical findings through a structured report and peer discussions, demonstrating the ability to justify methodological choices and respond to critique.

Assessment

Assessment is based on a group project, which runs for the duration of the course. The grade for the project is the final grade for the course.

Materials

All course materials will be made openly available under a CC-BY license. The readings will be based on books, articles, and other sources which are openly available.

Wickham, H., Cetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.). O’Reilly Media. https://r4ds.hadley.nz/
Several relevant open-access articles and materials.

Theme

DIGITA_DATA_INFORMAT

Work Form

Lecture/seminar (hoor/werkcollege)

General Course Information

The course is structured around four hackathons, one per week.

Attendance is mandatory. If you are unable to attend a lecture or hackathon session, inform the course coordinator in advance.
If you miss a session, for example due to sickness, catch up in the usual way: read the assigned materials, go through the lecture slides, and work through the relevant project tasks. If you still have questions, ask your peers first, and only then ask the teacher for further explanation.

To pass the course, you need to:

Participate in the group project. Groups are created during the first hackathon. If you miss that session, you will not be able to participate in the group project and will fail the course.
Attend all four hackathons.
Complete the final presentation and written report.

Fraud and Plagiarism

Plagiarism and fraud are serious academic offenses. Plagiarism is the use of another person’s work without proper acknowledgment. This includes copying and pasting text from generative AI, the internet, books, or other students. If you use text from another source, you must put it in quotation marks and provide a citation. If you do not, you are committing plagiarism.

Fraud is the use of dishonest methods to gain an unfair advantage. This includes copying another student’s work, submitting work that is not your own, or submitting the same work for two different courses. If you commit fraud or plagiarism, you will fail the course. If you are not sure what constitutes plagiarism or fraud, see the UU fraud and plagiarism policy.

Use of Generative AI

This course follows Scenario B of the UU GenAI index. You may use generative AI to prepare the work you hand in, but you may not use it to produce the assignment itself, except for copy-editing. You may also use AI tools to help write code that produces reproducible datasets.

The use of generative AI, such as ChatGPT, in the group assignment is allowed only for:

Creating code to download and analyze data, or to explain code.
Labeling data.
Copy-editing text, meaning making the text more readable without changing the content.

The use of generative AI must be clearly indicated in the assignment, including a link to the full conversation with the tool, either using the share function in the tool or by exporting the conversation to an online document.

Copyright and Course Materials

The materials in this course are created by FSBS teaching staff, who hold the copyright. The intellectual property belongs to Utrecht University.

Warning: These materials contain no information that exceeds the legal use of copyrighted materials in academic settings, or that should not be part of the public domain.

You may use all content in this course, excluding staff names and datasets, as input to generative AI tools, provided that the content is not used for further training of the model.

If you do not know how to prevent the content from being used for further training of the model, or if you are not absolutely certain that it will not be used for that purpose, do not use any course materials as input for the AI tool.