Course Materials for an Introduction to Data-Driven Chemistry

Data-Driven Chemistry is a course aimed at undergraduate students in chemistry with no prior knowledge of programming or programmatic data analysis. It is designed as a 10-week course introducing Python programming and its use in the kind of data analysis typically required in a chemistry degree. The course consists of 10 units designed for a blended learning environment of live coding and explanations, followed by a set of in-course tasks to be solved individually or through pair programming.


Statement of Need
The modern world is digital, allowing for the upscaling and automation of chemical processes through robots and enabling the fast production of large-scale datasets. Data analyses carried out with graphical user interfaces (GUIs) or spreadsheet-based tools are often limited in robustness, speed, and reproducibility. Programmatic solutions fare much better in this context, but programming is not typically taught as a skill across chemistry degrees, unlike in physics, mathematics, or even biology (White et al., 2022). Both the Royal Society of Chemistry ("Employability Skills," n.d.) and the American Chemical Society (Neiles & Mertz, 2020) have identified good computational skills as key to graduate employability (Hill et al., 2019). Our course is designed to address this gap in the undergraduate chemistry curriculum at the University of Edinburgh and to ensure that chemistry graduates remain competitive with other STEM graduates. The material is made available as open source in the hope that it may be used in other educational settings.
In recent years, programming has been integrated into chemistry degrees as part of a Mathematics for Chemists course (Hutchison, 2021). While this approach provides a good foundation in programming, students are often left with few applications or examples relevant to their specific degree. There are excellent resources for self-study, such as books (Tanemura et al., 2022), as well as more general material for learning Python programming independently. Some material exists that provides a general introduction to programming and data analysis with a focus on, for instance, physical chemistry (Baptista, 2021), analytical chemistry (Menke, 2020), or machine learning for chemistry (Lafuente et al., 2021). However, little material is available for complete novices that combines teaching the basics of Python programming with its application to data in physical, inorganic, analytical, and even organic chemistry. The presented course fills this gap.

Target Audience
The course is aimed at early-year undergraduate students in chemistry, in either their first or second year, with little or no programming background in Python or other languages. The cohort size is typically around 100 students. During the first lecture, the 2022/23 cohort was asked: "Do you know how to code?" The majority (62%) replied "I have no prior coding experience," while a further 30% replied "I only have some basic Python or coding experience." Only one respondent answered that they were confident in using Python.
By the end of the course, students should be proficient in using Python to:
• Break a problem into logical steps and use loops and decision operations to solve tasks;
• Perform numerical operations such as vector algebra and calculate simple statistics on data sets;
• Read and clean experimental data, visualize the data, and draw appropriate conclusions from the data through simple statistical analysis;
• Fit models to numerical data and present results in a clear and well-documented manner;
• Write readable, well-documented short snippets of code for data analysis, making use of functions, loops, and conditionals.
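To make the first two outcomes concrete, the following is a minimal sketch of the kind of short exercise they describe; it is not taken from the course notebooks. The pH readings and the `classify_ph` helper are hypothetical, and treating pH 7 as neutral assumes a temperature of 25 °C.

```python
import numpy as np

# Hypothetical pH readings from a lab exercise (illustrative values, not course data)
ph_readings = [3.2, 7.0, 8.9, 6.5, 11.3, 7.0]


def classify_ph(ph):
    """Label a pH value as acidic, neutral or basic (assumes 25 °C, so neutral is pH 7)."""
    if ph < 7.0:
        return "acidic"
    elif ph > 7.0:
        return "basic"
    else:
        return "neutral"


# Step 1: loop over the readings and apply the decision rules to each one
labels = []
for ph in ph_readings:
    labels.append(classify_ph(ph))

# Step 2: simple statistics on the whole data set with NumPy
values = np.array(ph_readings)
mean_ph = values.mean()
std_ph = values.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)

print(labels)
print(f"mean pH = {mean_ph:.2f} ± {std_ph:.2f}")
```

An explicit `for` loop is used instead of a list comprehension because it maps directly onto the "break a problem into logical steps" outcome for complete novices.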

Content
The course is structured similarly to the PCP Notebooks of Müller & Rosenzweig (2022). Data-Driven Chemistry consists of 10 units, each designed as a 3-hour workshop session, either in person or online. Additional tasks are provided for completion after the workshop sessions. A summary of each unit is given in Table 1.
The content is grouped into three main parts. Unit_01 to Unit_04 introduce concepts around algorithmic thinking and Python syntax, including variables, loops, functions, libraries, documentation, how to get help, and how to read files. These were largely adapted from Plotting and Programming in Python. Unit_05 to Unit_07 introduce concepts from SciPy (Virtanen et al., 2020), NumPy (Harris et al., 2020), Matplotlib (Hunter, 2007), and pandas (The pandas development team, 2022) to carry out basic statistical analysis and plotting of chemistry-related data. Our strategy was to incorporate as many chemistry topics and techniques already familiar to students as possible while teaching new Python content. For example, we assume that students have already studied mathematical concepts such as fitting data and comparing distributions, but are now presented with a dataset relevant to their degree. This domain-specific twist aims to boost student motivation to engage with these mathematical concepts. For this reason, Unit_08 to Unit_10 cover specific application examples from different areas of chemistry, some of which tie directly into the students' lab experiments (e.g., UV-Vis spectroscopy and NMR data).
To ensure sufficient student support, ten teaching assistants are available at each 3-hour workshop, which is attended by around 100 learners. In addition, 1-hour Q&A sessions with the teaching assistants are organised on alternate weeks.
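As an illustration of how the later units pair Python with familiar lab work, the sketch below reads a small UV-Vis calibration data set with pandas, discards an incomplete row, and fits the Beer-Lambert law (A = εlc) with a straight line. It is a hypothetical example rather than course material: the data values, column names, and the `concentration_from_absorbance` helper are invented, and a path length of 1 cm is assumed.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical calibration data (illustrative values only): concentration in mmol/L
# and absorbance at a fixed wavelength; one absorbance is missing to mimic messy lab data.
raw_csv = """concentration_mM,absorbance
0.0,0.002
0.1,0.251
0.2,0.498
0.25,
0.3,0.753
0.4,1.001
"""

# Read the data and drop the row with a missing measurement
data = pd.read_csv(io.StringIO(raw_csv)).dropna()

# Fit a straight line A = slope * c + intercept (Beer-Lambert law, path length assumed 1 cm)
slope, intercept = np.polyfit(data["concentration_mM"], data["absorbance"], 1)

# With c in mmol/L and l = 1 cm, the slope is epsilon in L mmol^-1 cm^-1;
# multiplying by 1000 converts it to L mol^-1 cm^-1
epsilon = slope * 1000


def concentration_from_absorbance(absorbance):
    """Invert the calibration line to estimate an unknown concentration in mmol/L."""
    return (absorbance - intercept) / slope


unknown = concentration_from_absorbance(0.62)
print(f"slope = {slope:.3f}, intercept = {intercept:.4f}")
print(f"epsilon = {epsilon:.0f} L mol^-1 cm^-1, unknown c = {unknown:.4f} mmol/L")
```

`np.polyfit` with degree 1 is used here for brevity; `scipy.stats.linregress` would additionally report the correlation coefficient and the standard errors of the fitted parameters.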

Assessment and feedback
The course was formally assessed at the University of Edinburgh using nbgrader (Jupyter Project et al., 2019). Initially, it was important to assess students formatively with weekly online quizzes that could be completed multiple times. This gave students instant feedback on their performance and allowed them to improve. In later weeks, the course was assessed summatively. However, we still gathered informal feedback within the sessions through quizzes built into the Jupyter Notebooks using Mentimeter and an associated Python class. We polled students to test their understanding of the material, to promote critical thinking, and to check their background knowledge. We also used Mentimeter to gather feedback after each session, which helped us improve the material further. In general, the embedded quizzes helped with student engagement. Figure 1 shows an example of a Mentimeter quiz.
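The embedding mechanism can be sketched with IPython's rich display protocol: Jupyter renders any object that defines a `_repr_html_` method as HTML when it is the last expression in a cell. The `EmbeddedPoll` class below is a hypothetical stand-in for the Mentimeter class provided with the course, not its actual implementation, and the URL is a placeholder for an embed link copied from Mentimeter.

```python
import html


class EmbeddedPoll:
    """Hypothetical helper that shows a web poll (e.g. Mentimeter) inside a Jupyter notebook.

    Jupyter calls _repr_html_ automatically when an instance is the last
    expression in a cell, so no extra display call is needed.
    """

    def __init__(self, url, width=800, height=500):
        self.url = url
        self.width = width
        self.height = height

    def _repr_html_(self):
        # Escape the URL so it is safe to place inside an HTML attribute
        src = html.escape(self.url, quote=True)
        return (
            f'<iframe src="{src}" width="{self.width}" height="{self.height}" '
            'style="border: none;" allowfullscreen></iframe>'
        )


# Placeholder URL: replace with the embed link of a real poll
poll = EmbeddedPoll("https://example.com/your-mentimeter-embed-link")
poll  # in a notebook cell, this line renders the poll inline
```

Relying on `_repr_html_` keeps the helper free of dependencies; outside a notebook the object simply does nothing when evaluated.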

Conclusion
We present a modular course for teaching Python to chemistry undergraduate students, targeted at complete novices. We hope it is of value to other chemistry students and educators. Running the material through Colab removes all installation requirements, making the course more accessible to novices, from students in guided university settings to other chemistry enthusiasts.

Figure 1: Example of how a Mentimeter poll can be directly embedded into a Jupyter notebook using the provided Mentimeter class.

Table 1: Summary of course material.

Author Contributions
Authors are listed in alphabetical order. James Cumby (JC), Valentina Erastova (VE), Claire L. Hobday (CLH), and Antonia Mey (ASJSM) have been teaching this course at the University of Edinburgh since the academic year 2021/22. Jasmin Güven (JJG) and Hannah Pollak (HP) have been course demonstrators. Rafał Szabla (RS) taught one unit and created content for it in 2020/21, when the course was run in a shortened form as a replacement for physical chemistry laboratory practicals during the pandemic. Material was adapted from Matteo T. Degiacomi (MTD), who shared content developed in 2018 for his course at Durham University aimed at chemistry research students. He made some additional contributions beyond his original material. Specific contributions by each author are as follows:
JC: Created material for Unit_01, Unit_10, and the helper_functions, gave feedback on other materials, and helped edit the manuscript.
MTD: Contributed material for Unit_03, Unit_05, Unit_07, and Unit_08, and helped edit the manuscript.
VE: Created material for Unit_05 and Unit_08, contributed to Unit_06, and helped edit the manuscript.
JJG: Contributed material to Unit_09, and helped edit the manuscript.
CLH: Created material for Unit_03 and Unit_04, and helped edit the manuscript.
ASJSM: Created material for Unit_02, Unit_06, Unit_07, and Unit_09, provided feedback and small contributions to most other units, and wrote the manuscript.
HP: Contributed material to Unit_05 and Unit_08, and helped edit the manuscript.
RS: Created the material for molecular geometries forming part of Unit_06 and gave feedback on the manuscript.