Want to learn how to use the methods of data science to tackle data-rich natural sciences? Welcome to the seminar on…

Data Science for Natural Sciences

Seminar course DATA20014, 5 cr, Assoc. Prof. Kai Puolamäki

Master’s Programme in Data Science & Master’s Programme in Computer Science, University of Helsinki. Also open to students from other degree programmes.

In Exactum B120 (Pietari Kalmin katu 5, Helsinki) on Thursdays at 10:15–12:00, from 17 January to 2 May 2019 (not every Thursday).


Many of the natural sciences are data-rich. The data can come either from measurements, such as large-scale atmospheric measurements or physics experiments, or from computational simulations. Supervised machine learning methods are often used to process the data. For example, regression functions, including those based on deep learning, are used to calibrate measurements, and classifiers are used to classify events in physics experiments.

Several useful toolboxes and methods exist for processing data and implementing advanced supervised learning schemes. However, the situation is often far from satisfactory. Interesting questions include:

  • The data often contain artefacts that may affect the analysis results and should therefore be taken into account already at the preprocessing stage. Which methods can be used for such preprocessing?
  • How can the expert user find relevant and surprising features of the data?
  • What are the principles and practices of implementing supervised learning methods in analysing actual data sets?
  • Powerful supervised learning methods are often essentially “black boxes”: their internal logic is difficult to understand. How can the expert user understand the principles by which these methods work? How can we control the statistical errors and concept drift of supervised learning methods, i.e., can we trust our predictions?
  • Which tools are best suited for a particular purpose?

The course covers themes such as those described above. It includes topics related to atmospheric sciences but is not limited to them; the methods and techniques discussed are general and applicable across multiple domains. The final content will be decided together with the students.

The course contains introductory lectures. Students write brief review articles on given topics, which are then peer-reviewed by other students, revised based on the comments, and presented. In addition to reviewing the state of the art, one purpose of the course is to identify unsolved problems that could be addressed in later research.

Further information