Summer intern positions for summer 2019

We have several three-month summer intern positions available for Bachelor’s or Master’s degree students from the University of Helsinki or other Finnish universities for summer 2019. Please contact me if interested.

Apply here by 5 February 2019.

We have several topics available. The topics can be adjusted according to the individual preferences of the students. Many of our previous summer interns have continued as master’s degree students in our group after the summer internship.

We have several topics under the general title “Human-in-the-loop interpretable AI methods”, with me and our postdocs supervising the students (see below for examples of the topics).

Additionally, we offer one topic with the title “Developing data mining and AI algorithms for climate research: quantification of biosphere-aerosol-cloud interactions and feedbacks” jointly with the Institute for Atmospheric and Earth System Research (INAR), co-supervised by Profs. and Tuomo Nieminen and Kai Puolamäki.

Sub-topics in “Human-in-the-loop interpretable AI methods”

Understanding and exploring black box AI algorithms with simulations

Modern artificial intelligence and machine learning techniques are very efficient and accurate on high-dimensional datasets. As a drawback, the methods are often black box models that are difficult or impossible for a human to interpret. This has led to the field of interpretable models, i.e., techniques that allow us to understand how models work. How can we explain complex AI models? Given that a model has learned relevant structures in the data, how can one study and use this information for other purposes?

This project requires a mathematical background, a good understanding of machine learning and a good working knowledge of statistics.

Supervisors: Dr. Andreas Henelius, Dr. Emilia Oikarinen, Prof. Kai Puolamäki

Generating physical machine learning emulators

There are many physical models used for research and operational purposes (such as weather and climate models) that are based on first principles of physics, but they are generally computationally expensive. To overcome this problem, machine learning methods such as generative adversarial networks can be used to emulate some of the sub-modules in order to reduce the overall computational burden. This work will be hosted at the Department of Computer Science, in collaboration with the Institute for Atmospheric and Earth System Research (INAR) and the Finnish Meteorological Institute (FMI).

This topic requires an understanding of machine learning methods and the ability to implement them. Knowledge of deep learning and of physical sciences is a plus.

Supervisors: Dr. Martha Zaidan, Prof. Kai Puolamäki (CS), Prof. Risto Makkonen (FMI/INAR), Dr. Michael Boy (INAR)

Open source tools for randomization and exploratory data analysis

Visual exploration of high-dimensional datasets is a fundamental task in exploratory data analysis (EDA). We have developed a theoretical model for EDA, where patterns already identified and considered known by the user are input as knowledge to the exploration system. The user is then shown views of the data in which this knowledge has been taken into account. Based on our recent work in EDA and randomization methods, the tasks in this project are twofold.

1. Implement an open-source tool for exploratory data analysis. The tool should be web-based, cross-platform, and scale to large datasets.
2. Develop an open-source library (e.g., in R, Python, JavaScript) implementing modern randomization methods for use in data mining. Examples of such randomization techniques include maximum entropy models and different constrained randomization schemes.

These tasks require good programming skills. Previous experience with open-source software development and knowledge of R, Python, JavaScript, HTML5, and tools such as React, Angular or Node.js are considered an advantage.

Supervisors: Dr. Andreas Henelius, Prof. Kai Puolamäki

Interpretable “white box” models, especially in environmental sciences

Scientists have gathered large amounts of data and perform data analysis in order to improve our understanding of natural phenomena. One of the research tasks is to develop a proxy, or model, to describe a particular scientific process. In addition to prediction, a proxy can also be used to understand the relationships between the variables involved in it. Although modern statistical methods, such as machine learning and deep learning, can already create proxies with higher accuracy, they are considered black box methods, in which the relationships between variables are not transparent and hence not understandable. In this project, the goal is to explore a variety of white box models and investigate their use as data-driven approaches. Case studies can be taken from environmental sciences in collaboration with the Institute for Atmospheric and Earth System Research (INAR).

These tasks require good mathematics, statistics and programming skills. Previous experience with Linux and with R, Python or other programming languages is considered an advantage. Experience in high performance computing would also be very much appreciated.

Supervisors: Dr. Martha A. Zaidan, Prof. Kai Puolamäki

Finding subjectively interesting patterns using constraint-based reasoning

Over the last decade there has been a growing interest in using generic constraint-based reasoning and optimization approaches (e.g., Boolean satisfiability, constraint programming, answer set programming, and linear programming) to solve data mining and machine learning tasks such as pattern mining, correlation clustering and learning optimal Bayesian networks. In this project the student will work on developing constraint-reasoning-based methods for data analysis that are controlled by humans, for instance, exact and efficient search for subjectively interesting patterns in data.

A background in mathematics and an interest in research are useful; prior knowledge of constraint-based reasoning methods (e.g., SAT, CP, ASP, LP), data analysis, machine learning and statistics, as well as good programming skills, are considered an advantage.

Supervisor: Dr. Emilia Oikarinen