Germany-UK-Israel Call for students
Helmholtz, The British Council and Israel Data Science and AI Initiative - Exchange Program 2023 Call for Students
Get to work with data scientists from Helmholtz’ leading research centers and data scientists from British universities on interesting projects in our exchange program.
What is the exchange program?
The purpose of this Summer Exchange is to bring together data science talent from associated institutions to work together on predefined research projects. These projects should advance the participant’s skill-set in research, information and data science methods. Principal investigators (PI) shall welcome the other institution’s participants in their labs and to their teams, and ensure that they have the necessary support and resources to succeed in their projects. The participating researchers shall share credit for the results of the project. This experience at each other’s institutions shall serve as the foundation for ongoing scientific collaboration between Helmholtz centers and Israel DS centers.
The exchange program is open to Bachelor and Master students, as well as to doctoral researchers and postdocs from DS centers affiliated with IDSAI. At Helmholtz and the UK universities, the program is open to doctoral researchers and postdocs.
What can participating data scientists expect?
-Interesting and challenging data science problems that expand the participants’ skill-sets and knowledge of methodologies
-Experienced scientists that mentor and support their professional development in data science methods and their applications.
This exchange program is planned to take place in summer 2023 and will last six weeks.
-You will travel to your host in either Germany or the UK and spend six weeks working with them onsite on a project in their labs.
The Israeli universities, through their Data Science research centers and faculty research funds, will cover living expenses for their participants’ six-week stay at a HIDA lab or at one of the UK host labs. In return, HIDA finances Helmholtz participants’ stays at the DS centers in Israel.
Interested?
Join our Q&A session on April 18 at 17:00 and learn more about the program.
How do I take part?
In order to participate as a data scientist, please choose a project from the list below:
Helmholtz Projects
The group is interested in developing new models and methods to simulate multiphase flows at industrial scales. For several years, the group has been developing a morphology-adaptive two-fluid model for multiphase flows (OpenFOAM-Hybrid) for the open-source library OpenFOAM. The model is based on a 4-field approach and distinguishes between continuous phases that share a resolved interface and dispersed phases with statistically modelled interfaces.
https://www.hzdr.de/db/Cms?pNid=121&pOid=65149
What is the project's research question?
Visualisation techniques with ParaView and NVIDIA OptiX for OpenFOAM-Hybrid simulations, which combine resolved and statistically modelled interfaces
What data will your exchange student work on?
What tasks will the project involve?
What makes this project interesting to work on?
The project offers an introduction to numerical methods for multiphase flows. It provides insight into one of the largest open-source projects for numerical simulations (OpenFOAM) and one of the most widely used visualisation tools (ParaView). The data represent real-world applications and problems from the chemical and process engineering industry (distillation column, cyclone separator). Furthermore, state-of-the-art computer vision and imaging techniques (NVIDIA OptiX on an NVIDIA Quadro RTX 4000) will be used to generate an illustrative showcase video.
What is the project's expected outcome?
Contribution to software, Social Media (requires permission by HZDR)
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
OpenFOAM, ParaView, NVIDIA OptiX
What skills are necessary for this project?
High-performance computing, Computer vision and image processing/analysis
Interested candidates should be at Bachelor or Master level.
Who would be supervising your exchange student?
Dr. Richard Meller (r.meller@hzdr.de) and Benjamin Krull (b.krull@hzdr.de)
The “Materials Design” department in the Institute for Hydrogen Technology at Hereon develops innovative functional materials based on metal hydrides and light metal hydride composites for applications in energy technology, such as for hydrogen storage. We combine computational methods with experimental techniques, in order to obtain a full understanding of all relevant processes in these materials from the atomic to the macro scale. Recently, the group started to incorporate data driven methods for speeding up the computational workflows and bridging the length-scales of the simulations.
https://www.hereon.de/institutes/hydrogen_technology/materials_design/index.php.en https://aidos.ml
What is the project's research question?
Which ML method is the best, i.e. most efficient, accurate, and practical, for training force fields for metal-hydrogen systems?
What data will your exchange student work on?
The aim of a force field is to predict the energy, forces, and/or stresses of an atomic configuration. You will work on data that we obtain from density-functional theory (DFT) calculations, which are reasonably accurate for this task, but also computationally demanding. Depending on your background, additional sampling of data can be part of the project or we will provide data sampled with molecular dynamics simulations based on DFT. Therefore, prior knowledge of materials modelling techniques is not required, but very basic materials science knowledge would be helpful for understanding the data.
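To make the prediction target concrete, the sketch below computes the total energy and the per-atom forces of a small atomic configuration. It uses a Lennard-Jones pair potential purely as an illustrative stand-in: the project itself trains ML force fields on DFT data, and the function name `lj_energy_forces` and the parameters `epsilon` and `sigma` are assumptions made for this example, not part of the group's workflow.

```python
import math


def lj_energy_forces(positions, epsilon=1.0, sigma=1.0):
    """Toy stand-in for a force field (assumption: Lennard-Jones pairs).

    positions: list of (x, y, z) tuples in reduced units.
    Returns (total_energy, forces) where forces[i] is the force on atom i.
    """
    n = len(positions)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Separation vector r_i - r_j and its length
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            # Pair energy: 4 * eps * [(sigma/r)^12 - (sigma/r)^6]
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # Radial derivative dE/dr of the pair energy
            de_dr = 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            # Force is the negative gradient; equal and opposite on the pair
            for k in range(3):
                f = -de_dr * d[k] / r
                forces[i][k] += f
                forces[j][k] -= f
    return energy, forces
```

A learned force field replaces the analytic pair formula with a model fitted to DFT energies and forces, but it is evaluated in the same way: a configuration goes in, an energy and one force vector per atom come out.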
What tasks will the project involve?
What makes this project interesting to work on?
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Will be made available with publication.
What infrastructure, programs and tools will be used? Can they be used remotely?
You will have access to our modern in-house high-performance-computing (HPC) cluster for training and testing the models (see https://www.hereon.de/central_units/research_infrastructure/cluster/equipment/index.php.en for more information). The datasets are created from materials simulations using a combination of the Vienna Ab-initio Simulation Package (VASP) and Python scripts based on the Atomic Simulation Environment (ASE) package. For the ML part, you are free to use your code of choice. All of these resources can be accessed remotely if an in-person stay is not possible.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Deep learning, High-performance computing, Python
Interested candidates should be at Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
Me
The "Matter under Extreme Conditions" department at the Center for Advanced Systems Understanding (CASUS) at Helmholtz-Zentrum Dresden-Rossendorf focuses on advancing physics-informed and data-driven numerical simulation techniques to achieve multi-scale modeling. Our goal is to connect fundamental quantum processes on the microscopic level with observables in materials science on larger scales. To accomplish this, we employ a combination of first-principles methods such as density functional theory, efficient reduced methods like average-atom models and quantum hydrodynamics, and large-scale molecular dynamics simulations. We integrate these techniques into data-driven workflows with machine-learning methods, and also develop novel machine-learning techniques inspired by quantum computing.
https://www.casus.science/research/matter-under-extreme-conditions/
What is the project's research question?
The main question to be addressed is how to improve the performance of the atoMEC code, namely its speed and memory requirements. The code should be extensively tested, particularly for edge cases, to ensure that any performance gains do not compromise accuracy or reliability.
What data will your exchange student work on?
Mostly, the student will work with data generated by the code itself. AtoMEC is an average-atom code for studying materials under extreme conditions, meaning high material densities and/or temperatures. The data of interest is a mixture of physical observables, such as pressure or conductivity, and intermediate data, such as electronic wave functions. This intermediate data is often very large, and one of the aims of the project is to process and store this data more efficiently. Depending on how the project develops, the student may also work with data from density-functional theory codes, to benchmark the atoMEC results.
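Since the project targets both speed and memory, a first step is usually to measure them. The sketch below times a call and records its peak traced memory using only the Python standard library (`time.perf_counter` and `tracemalloc`). The helper `profile_call` and the workload `build_table` are hypothetical names introduced for illustration; no atoMEC function is used or implied.

```python
import time
import tracemalloc


def profile_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds, peak_bytes).

    peak_bytes covers memory allocations traced by tracemalloc during the call.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def build_table(n):
    """Hypothetical workload standing in for a large intermediate array."""
    return [float(i) ** 0.5 for i in range(n)]
```

For example, `table, seconds, peak = profile_call(build_table, 100_000)` gives a baseline that can be compared before and after an optimisation, alongside a check that the returned values are unchanged.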
What tasks will the project involve?
What makes this project interesting to work on?
From a scientific perspective, average-atom codes are widely used to model inertial confinement fusion. There have recently been many exciting breakthroughs in inertial confinement fusion (see for example https://www.bbc.com/news/science-environment-63950962 or https://www.newscientist.com/article/mg25634090-100-can-a-slew-of-nuclear-fusion-start-ups-deliver-unlimited-clean-energy/), which makes nuclear fusion an increasingly promising method for the generation of abundant clean energy. The atoMEC code is still young and currently maintained by a small team of developers at the Center for Advanced Systems Understanding; however, it is unique in that (as far as we know) it is the only open-source average-atom code under active development. The student therefore has the opportunity to make a significant impact on the atoMEC code and, as a result, on the nuclear-fusion community more generally.
What is the project's expected outcome?
Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
What skills are necessary for this project?
Scientific computation, data mining, Software development, Python
Interested candidates should be at Bachelor, Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
Timothy Callow, t.callow@hzdr.de
The department for Imaging and Data Science focusses on the use of high-resolution 3D micro computed tomography to image the degradation and osseointegration of biodegradable magnesium implants in bone. Advanced image processing techniques and deep learning are used to obtain quantitative parameters from image data. Moreover, new techniques are developed to enable the correlation and/or synthesis of different characterization techniques.
stefanie.castell@helmholtz-hzi.de
What is the project's research question?
Can we use GANs to predict the histological staining of tissue based on X-ray tomography data to extract biologically relevant information?
What data will your exchange student work on?
We have a large number (>100) of 3D image volumes from micro computed tomography of rat bone samples implanted with four different implant types and corresponding 2D histology images. The images are registered, so that we have the exact slice in the (greyscale) tomographic volume which represents the (stained) 2D histology. This data will be used for training, testing and validation of the GAN. The aim is to train the GAN such that it can predict the staining for the whole 3D tomography volume. A prototype of this GAN trained on a limited number of samples is available.
What tasks will the project involve?
What makes this project interesting to work on?
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? The data is available upon request as it's too large to be placed on conventional repositories.
What infrastructure, programs and tools will be used? Can they be used remotely?
What skills are necessary for this project?
Machine learning, Deep learning, Computer vision and image processing/analysis, Python
Interested candidates should be at Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
The student will be supervised by myself, but also supported by other scientists working with deep learning and computed tomography data.
The team "Climate, Cohorts and PIA" as part of the Department for Epidemiology at the Helmholtz Centre for Infection Research focuses its research on digital epidemiology and real world data from methodologically sound population-based cohorts. We develop an eResearch System as product owners ("Prospective Monitoring and Management - App", PIA: www.info-pia.de) and take part in the largest epidemiological study Germany has ever conducted, the NAKO Gesundheitsstudie (www.nako.de). We built up a subcohort within NAKO for intensified digital syndromic surveillance of acute infectious diseases (https://www.helmholtz-hzi.de/en/research/forschungsprojekte/view/projekt/detail/zifco/).
stefanie.castell@helmholtz-hzi.de
What is the project's research question?
What are the pitfalls, challenges and solutions of processing and analysing real-world epidemiological data?
What data will your exchange student work on?
We provide epidemiological observational data involving longitudinal data collection (e.g. digital reporting of symptoms of acute respiratory infections) and, if available, complex laboratory data from immune cells and plasma of our study participants. Due to their longitudinal nature, the multitude of questionnaires, and the real-world missingness patterns, the data are complex and sometimes challenging to disentangle for sound scientific use.
What tasks will the project involve?
What makes this project interesting to work on?
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? The data are sensitive (medical research data) and are not open source. The software PIA, however, is free and open source.
What infrastructure, programs and tools will be used? Can they be used remotely?
The student can work remotely. We use Zoom for meetings; Jira, Confluence and GitLab to structure our work; and Rocket.Chat and Outlook to stay connected throughout the day. For data analysis, we work with R or Stata. If the student is present in the region in Germany (Brunswick/Hanover), he or she can, for example, visit our study centre to deepen his or her understanding of data collection processes.
What skills are necessary for this project?
Data analytics, statistics. Software development skills are optional, not required.
Interested candidates should be at Bachelor, Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
Me. In addition, Prof. F. Klawonn will support the student with certain data: https://www.helmholtz-hzi.de/en/research/research-topics/bacterial-and-viral-pathogens/biostatistics/our-research/frank-klawonn/
Our division is focused on understanding the impacts of human activity on the biogeochemical cycles of carbon and nitrogen and the exchange processes between ecosystems, the atmosphere, and the hydrosphere. We use a combination of field studies, observational data, and process modeling to study these impacts at multiple scales.
What is the project's research question?
Measuring plants' effect on denitrification using innovative techniques
What data will your exchange student work on?
Our exchange student will be working on data collected from our new and special plant-soil mesocosm incubation system, which allows growth of plants in a N2-free atmosphere. The system continuously measures gas concentrations in the incubation vessels while they are being flushed with a He/O2 gas mixture, resulting in a large amount of data that needs to be converted into flux rates. This data will be used to study the effect of plants on denitrification processes and specifically to quantify N2, N2O, and CO2 emissions simultaneously.
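As a rough illustration of what converting concentrations into flux rates can involve, the sketch below applies a steady-state mass balance for a continuously flushed vessel: the flux is the flow rate times the concentration difference between outlet and inlet, divided by the emitting surface area, with the ideal gas law used to turn a mole fraction in ppm into mol per cubic metre. Everything here is an assumption made for the example (the function name, the parameters, the default pressure and temperature, and the steady-state simplification); the actual processing used for the mesocosm system may differ.

```python
R_GAS = 8.314462618  # universal gas constant in J mol^-1 K^-1


def flux_from_concentration(x_out_ppm, x_in_ppm, flow_m3_per_s, area_m2,
                            pressure_pa=101325.0, temperature_k=293.15):
    """Steady-state flux in mol m^-2 s^-1 for a flow-through vessel.

    Assumptions (hypothetical, for illustration only):
    - x_out_ppm / x_in_ppm are mole fractions in ppm at outlet and inlet,
    - the vessel is well mixed and at steady state,
    - the gas behaves ideally at the given pressure and temperature.
    """
    delta_x = (x_out_ppm - x_in_ppm) * 1e-6                   # mole fraction
    molar_density = pressure_pa / (R_GAS * temperature_k)     # mol m^-3
    return flow_m3_per_s * delta_x * molar_density / area_m2


def flux_series(x_out_series, x_in_ppm, flow_m3_per_s, area_m2):
    """Apply the same conversion to every reading in a time series."""
    return [flux_from_concentration(x, x_in_ppm, flow_m3_per_s, area_m2)
            for x in x_out_series]
```

Because the system measures continuously, the conversion is applied to every reading, which is where the data volume mentioned above comes from.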
What tasks will the project involve?
What makes this project interesting to work on?
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Python, R
Interested candidates should be at Bachelor, Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
Me. I, as a PhD student (also Israeli, by the way) who is working with the new system, will be primarily responsible for supervising the exchange student working on this project. Additionally, the engineer who built the system and my supervisors will be available for guidance and support.
Dr. Michael Ulrich Dannenmann, michael.dannenmann@kit.edu
We research data-driven methods for the analysis of X-ray imaging data acquired at large-scale light sources like DESY or European XFEL. Loss of information during image acquisition renders the corresponding inverse imaging problem ill-posed (phase problem). We therefore deploy generative models (i.e. normalising flows and stable diffusion) that solve these inverse imaging problems by recovering multiple solutions that correspond to an observation.
http://photon-ai-research.github.io/
What is the project's research question?
Does self-supervised learning produce reliable and robust features for image analysis?
What data will your exchange student work on?
You will be working on the large-scale COCO image dataset as well as 2D X-ray imaging data provided by HZDR/DESY.
What tasks will the project involve?
You will be implementing PyTorch datasets for COCO and X-ray imaging data. Then, you will be integrating data2vec into common workflows for image analysis based on two downstream tasks: 1) semantic segmentation (COCO) and 2) regression (X-ray data). Eventually, the performance of representations learnt by data2vec for solving these downstream tasks will be systematically assessed with respect to perturbations of the data.
What makes this project interesting to work on?
You will be experimenting with data2vec 2.0 from Facebook AI Research, a visual feature learning scheme implemented in Python and PyTorch that is trained in a self-supervised fashion. Supervisory feedback for context-awareness of the feature encoder will be provided by instance-wise transformations such as 1) masking, 2) rotation, 3) intensity modulations. This is a state-of-the-art method that is used for very different applications and modalities, ranging from images and speech to text. However, we will be focusing on images due to constraints in project time.
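The three instance-wise transformations named above can be illustrated on a tiny image stored as a nested list. This is a minimal sketch in plain Python so that it runs without dependencies; the function names are hypothetical, and it is not the augmentation pipeline of the self-supervised method itself, which would operate on PyTorch tensors.

```python
def mask_patch(image, top, left, size, fill=0.0):
    """Return a copy of the image with a size x size square set to `fill`."""
    out = [row[:] for row in image]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[r]))):
            out[r][c] = fill
    return out


def rotate90(image):
    """Return the image rotated by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def modulate_intensity(image, gain=1.0, offset=0.0):
    """Return a copy with every pixel mapped to gain * pixel + offset."""
    return [[gain * p + offset for p in row] for row in image]
```

Each function returns a new image and leaves its input untouched, so the same sample can be perturbed in several ways and the resulting features compared.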
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
You will be using HZDR's GPU cluster HEMERA, where you get access to a partition with 7 nodes with 28 V100 GPUs. All coding will be done in Python and PyTorch. Distributed training, if appropriate, will be implemented via Horovod. Remote access is possible!
What skills are necessary for this project?
Deep learning, Computer vision and image processing/analysis, Python. PyTorch knowledge would be very helpful.
Interested candidates should be at Bachelor, Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
I as well as my PhD student Erik Thiessenhusen will be supervising you.
Erik Thiessenhusen, e.thiessenhusen@hzdr.de
My group’s major research interests include the intensity, frequency, severity and duration of extreme events such as droughts, floods, heat and cold waves, and their influence on socio-economic systems and public health. Studying extremes is an important task because they often have immediate impacts on society, causing widespread adverse health outcomes and infrastructure destruction. We use statistical and climate modeling, data analysis and machine learning tools in order to understand which environmental extremes have the major negative impacts on human systems, and which protective factors (such as the extent of green residential spaces) could potentially moderate the effects of extreme weather events.
https://www.awi.de/en/about-us/service/expert-database/translate-to-english-monica-ionita.html
What is the project's research question?
One of the following research questions could be explored during the project: What are the associations between extreme weather events and adverse (negative) health outcomes? Which events could have the highest impact on health outcomes from a historical perspective? Which protective factors could moderate the negative effects of extreme events?
What data will your exchange student work on?
European and Middle East observational and modeling temperature time series (e.g. E-OBS (spatial resolution: 0.1° x 0.1°), CRU TS4.04c (spatial resolution: 0.5° x 0.5°), TerraClimate (spatial resolution: 0.5° x 0.5°), ERA5 (spatial resolution: 0.1° x 0.1°)), as well as daily meteorological and health indicators from different federal agencies in Germany and Israel.
What tasks will the project involve?
Statistical data analysis with the potential to build predictive models. Specifically: correlation and association analysis, empirical orthogonal function analysis, canonical correlation analysis, machine learning methods (e.g., transformers, convolutional layers, variational autoencoders)
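The simplest of the listed tasks, correlation analysis, can be sketched as follows. The code computes the Pearson correlation between two daily series and a lagged variant in which the second series is shifted forward in time, on the assumption that a health indicator may respond to a weather signal some days later. The function names and the idea of using a fixed lag are illustrative assumptions, not the group's analysis protocol.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_pearson(x, y, lag):
    """Correlate x[t] with y[t + lag] for a non-negative integer lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson_r(x, y)
    return pearson_r(x[:-lag], y[lag:])
```

Scanning `lagged_pearson` over a range of lags shows at which delay the association is strongest; it describes association only and says nothing about causality.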
What makes this project interesting to work on?
From a broader perspective, this interdisciplinary project gives participant(s) an opportunity to explore the outcomes of environmental and medical studies, thus linking climate change and societal issues. Evidence from epidemiological studies shows that extreme heat and cold events, droughts, wildfires and floods in Europe have negative impacts on population health. Furthermore, climate data reveal increased intensity, frequency, severity and duration of such extreme events, showing that societies are becoming more vulnerable to the increasing climate risks. The project helps to perform an in-depth association and causal analysis of both climate and health data at country and international levels. This provides participant(s) with the possibility to extend their knowledge and methodological background across different research fields.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software, An in-depth statistical analysis of the relationship between different types of extreme events (e.g. heat waves, heat stress, cold spells) and the variability and trends of health indicators at country level
Is the data open source? Climate data is open source while medical data needs special access
What infrastructure, programs and tools will be used? Can they be used remotely?
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, High-performance computing, Python, R
Interested candidates should be at Bachelor, Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
Me
Our research group studies hazards and related surface processes across various environments, from mountain regions to coastal areas and even deep oceans, over different time scales. Our work covers a wide range of topics, including earthquakes and tsunamis, storms and hurricanes, landslides and debris flows, and floods and paleo-floods. We use various tools and methods, including field surveys, remote sensing, environmental seismology, process-based modelling, and data science methods such as machine learning, to understand the physical processes behind all these natural hazards.
https://www.gfz-potsdam.de/en/section/earth-surface-process-modelling/overview/
What is the project's research question?
Landslides, debris flows, hyperconcentrated flows, and floods are among the most dangerous natural hazards worldwide. One of the fundamental tasks for geomorphologists is to identify and classify the processes they observe in the field. The task is more challenging than it sounds, especially for high-damage processes like debris flows and landslides. Meanwhile, multiple dimensionless numbers (e.g., Reynolds number and Einstein number) based on first-principle physics have been widely used to describe these natural flows. When we use these dimensionless numbers and datasets to classify the flows, we face a long-standing challenge in machine learning (maybe one of the biggest challenges in data science): the curse of dimensionality. One strength of quantum machine learning methods (e.g., QSVM) is their ability to deal with such high-dimensional datasets. Therefore, we ask a central research question in geomorphology and machine learning: can we objectively define the type of natural flow using dimensionless numbers and quantum machine learning methods?
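As a small illustration of what a dimensionless feature is, the sketch below computes the Reynolds number, Re = ρ v L / μ, from density, velocity, a characteristic length and dynamic viscosity. The function name and the example values (roughly water at room temperature) are assumptions for illustration; the other dimensionless numbers used in the referenced dataset are not reproduced here.

```python
def reynolds_number(density, velocity, length, viscosity):
    """Reynolds number Re = density * velocity * length / viscosity.

    All four inputs must be given in one consistent unit system; the result
    is dimensionless and therefore independent of that choice.
    """
    if viscosity <= 0:
        raise ValueError("viscosity must be positive")
    return density * velocity * length / viscosity


# Same flow described in SI units and in CGS units (illustrative values)
re_si = reynolds_number(density=1000.0, velocity=2.0, length=0.5, viscosity=1e-3)
re_cgs = reynolds_number(density=1.0, velocity=200.0, length=50.0, viscosity=0.01)
```

Both calls describe the same flow and yield the same value, which is the property that makes dimensionless numbers convenient classification features across sites and scales.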
What data will your exchange student work on?
The applicant will work on a high-dimensional dataset for different surface flows. They are usually dimensionless in a physical sense based on the theoretical derivation and well-defined with uncertainties. Please see the following reference for details: Du, J., Zhou, G. G., Tang, H., Turowski, J. M., & Cui, K. F. E. (2023). Classification of Stream, Hyperconcentrated, and Debris Flow Using Dimensional Analysis and Machine Learning. Water Resources Research, 59, e2022WR033242. https://doi.org/10.1029/2022W
What tasks will the project involve?
What makes this project interesting to work on?
As part of our effort to build Digital Twins at catchment scale, the applicant will get the chance to learn not only basic geophysics and geomorphology relating to hazards but also state-of-the-art machine learning methods (dimensionless learning and physics-informed machine learning) as well as quantum computing. The successful applicant will also have a chance to interact with a large data science community around Berlin and Potsdam. In addition to the six weeks of funding provided by HIDA, our group (Hazards and Surface Processes) will be happy to provide matched funding (i.e., another six weeks of funding) based on performance, so that the applicant has the chance to integrate fully with the research institute and group members.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
The successful applicant will mainly use a laptop that we provide. Depending on availability, we will also offer the chance to use HPC resources at GFZ or a quantum computer at IBM. He or she will mainly use the IBM Quantum tool Qiskit and Jupyter Notebook to develop the code. In principle, all infrastructure, programs, and tools can be used remotely. However, we prefer on-site work, as we believe that learning how to interact with other members of a research group is a crucial part of training a successful researcher.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Deep learning, Geographic information systems, Python
Interested candidates should be at Bachelor, Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
Me
Clean drinking water for all people: this is one of the greatest future challenges. Innovative membrane methods can help with both water supply and wastewater disposal, thereby reducing the burden on the environment. We develop high-performance membranes that filter micropollutants and heavy metals from the water. We also design novel microporous polymers, for example, to desalinate seawater or to remove climate-damaging gases from the air. This work is supported by comprehensive digital modeling.
https://hereon.de/institutes/membrane_research/microporous_polymers/index.php.en
What is the project's research question?
The general question is: Can we correlate the chemical structure and composition of PIM molecules with the performance of the membranes prepared from these molecules?
What data will your exchange student work on?
What tasks will the project involve?
What makes this project interesting to work on?
The Hereon Institute of Membrane Research provides an international and interdisciplinary work environment, which will be a great experience for the student working on the project. Additionally, he or she can learn to apply data scraping methods in a scientific project and gain new knowledge about materials science. The project is therefore interesting for young data scientists who want to apply their knowledge in materials science. The student can also learn more about membranes used for various applications.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? No
What infrastructure, programs and tools will be used? Can they be used remotely?
Python will be used as the coding language. Additionally, licenses for many scientific journals are available. A PC or laptop as well as office space will be provided by Hereon. Additionally, we can arrange accommodation in one of our guesthouses. The work can be done remotely. However, onsite work is preferred.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Databases, Python
Interested candidates should be at Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
The work will be supervised by Sarah Glass (a postdoctoral researcher in the department).
Recent advances in Machine Learning (ML) and Deep Learning (DL) are revolutionizing our ability to analyze biomedical images and deepen our understanding of infection and disease. Among other things, host-pathogen interactions may be readily deciphered from microscopy data using convolutional neural networks. We work on developing the latest ML/DL and Computer Science methods to facilitate our understanding of Infection Biology and Disease Biology.
https://ayakimovich.github.io/ https://www.casus.science/casus/team/
What is the project's research question?
Develop novel label-efficient annotation techniques for clinical brightfield microscopy of urine samples
What data will your exchange student work on?
Annotation of large clinical microscopy datasets is laborious and requires expert training. At the same time experts' time is often unattainable due to their primary roles as clinicians. This project will employ a clinical dataset of brightfield microscopy of patients’ urine with a few annotated samples obtained by our collaborators at the Royal Free hospital in London. We aim to develop a diagnostic phenotype quantification workflow using label-efficient machine learning approaches.
What tasks will the project involve?
The tasks of this project will involve 1) establishing the state-of-the-art self-supervised or weak-labelling learning methods or architectures for the object detection task in bright-field microscopy, 2) comparing their performance on the clinical (or so-called "wild") dataset, 3) identifying the best-performing method, and 4) testing that method's performance on an unseen dataset.
What makes this project interesting to work on?
Urinary tract infections (UTI) belong to the most common clinically relevant bacterial infections (Murray et al. 2021). 1 in 3 women worldwide will have at least one UTI by 24 years of age and 40 - 50% of women will experience one UTI during their lifetime with 44% experiencing recurrences. Improving deep learning methods without increasing the need for annotation efforts will have a direct impact on the clinical outcomes of UTI patients.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? The dataset will be published by the time the project begins.
What infrastructure, programs and tools will be used? Can they be used remotely?
We will use HZDR Hemera GPU cluster to train the deep learning models used in this project. These tools can be used remotely. However, it will be essential to have the exchange student available for in-person work to ensure
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Deep learning, Parallel/distributed programming with GPU, Computer vision and image processing/analysis, Software engineering, Python
Interested candidates should be at Bachelor, Master or PhD level.
Who would be supervising your exchange student?
Me. The student will be supervised jointly with Dr. Harry Horsley from the Department of Renal Medicine at the Division of Medicine of University College London.
Based at the Jülich Supercomputing Centre, the Sim Data Lab Astronomy and Astrophysics is a targeted research and support structure that provides an interface between the Supercomputer facilities in Jülich and the Astrophysics research communities. Our tasks include the support of Data Science Projects and High Performance Computing Simulations for the Astrophysics Community.
What is the project's research question?
Anomaly detection in massive datasets is one of the most common problems in data science. In this project, we will address the detection of outliers in the massive Chandra Source Catalogue (see below). Identifying outliers in a large data set is a prerequisite to focus investigations on a smaller set of objects with potentially unexpected properties. Previous searches for outliers in the Chandra Source Catalogue (see Swarm et al. 2022) successfully apply machine learning algorithms, but are constrained by the single-CPU memory limitation associated with libraries like scikit-learn. In this project, we will test the scalability of the memory-distributed tensor framework (e.g. Heat, see below) for outlier detection on Chandra data, with the goal of scaling out to even more massive datasets.
What data will your exchange student work on?
The dataset we will work on is extracted from the Chandra Source Catalogue v.2 (CSC2). Chandra is NASA’s big, multipurpose X-ray telescope that has been in operation for more than two decades. CSC2 provides more than 500 measured or estimated properties for more than 300,000 X-ray sources. We will limit our study to sources with a high detection significance and to a limited subset of relevant summary properties of the different categories.
What tasks will the project involve?
The project involves applying machine learning algorithms available in the Heat framework (https://github.com/helmholtz-analytics/heat) to develop a pipeline for massively parallel outlier detection. You will work in close contact with the Heat dev team. Your code will be tested on a subset of the Chandra Source Catalogue to reproduce the findings of Swarm et al. (2022), and later applied to larger datasets. Supercomputing infrastructure will be available for testing.
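To make the outlier-detection task concrete, here is a minimal single-machine sketch that flags rows of a small table using robust z-scores (median and median absolute deviation). It is only a stand-in: it uses neither Heat nor the algorithms of Swarm et al. (2022), and the function names and the 3.5 threshold are illustrative assumptions. The project itself would express comparable logic on memory-distributed arrays.

```python
import statistics
from typing import List, Sequence


def robust_z_scores(values: Sequence[float]) -> List[float]:
    """Score each value by its distance from the median in units of scaled MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be called an outlier.
        return [0.0 for _ in values]
    # The factor 1.4826 makes the MAD comparable to a standard deviation
    # for normally distributed data.
    return [(v - med) / (1.4826 * mad) for v in values]


def flag_outliers(table: Sequence[Sequence[float]], threshold: float = 3.5) -> List[int]:
    """Return indices of rows with at least one property beyond the threshold."""
    columns = list(zip(*table))  # one tuple per property
    scores = [robust_z_scores(col) for col in columns]
    return [
        i for i in range(len(table))
        if any(abs(col_scores[i]) > threshold for col_scores in scores)
    ]


# Synthetic example: six "sources" with two properties each; the last row is unusual.
sources = [
    [1.0, 10.0], [1.1, 10.2], [0.9, 9.8],
    [1.0, 10.1], [1.2, 9.9], [8.0, 10.0],
]
print(flag_outliers(sources))
```

Each column is scored independently here, so the whole table must fit in one process's memory. That is the single-CPU limitation the project aims to remove.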
What makes this project interesting to work on?
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Python
Interested candidates should be at Bachelor, Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
Joint supervision Astrophysics Sim Lab / Heat team
c.comito@fz-juelich.de Dr. Claudia Comito
Within our group, we study the emission, distribution, and degradation of pollutants. We use our mobile laboratory, high resolution atmospheric chemistry models and small sensors to monitor the distribution of pollutants in the urban environment. We develop new methods and operate the World Calibration Center for Nitrogen Oxides.
https://www.fz-juelich.de/en/iek/iek-8/research/reactive-trace-substances/energy-related-emissions
What is the project's research question?
The aim of the project is to link the pollutant concentrations measured at citizens' homes with their health data.
What data will your exchange student work on?
What tasks will the project involve?
What makes this project interesting to work on?
The project is a joint project between the Forschungszentrum Jülich, where the sensors are developed and characterized, the Helmholtz Institute Munich, the Open Knowledge Lab in Cologne and the Helmholtz Centre for Infection Research in Braunschweig, where the app with the health data questionnaire is maintained. The project is thus at the interface of epidemiological studies and atmospheric chemistry, and the student can gain experience in both subjects. The citizen science approach is particularly interesting, as citizens are also encouraged to participate and share their experiences in the data analysis.
What is the project's expected outcome?
Co-authorship to research paper, data quality of the sensor network
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
Air quality data is stored on an SQL server in Jülich. The data will be analyzed with Python tools and IDL. The tools can be operated remotely.
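As an illustration of linking sensor readings with health questionnaire entries from Python, the sketch below joins two tables per home and day. Everything in it is assumed: an in-memory SQLite database stands in for the Jülich SQL server, and the table names, column names and values are synthetic.

```python
import sqlite3

# In-memory database as a stand-in for the real SQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sensor_readings (home_id INTEGER, day TEXT, no2_ugm3 REAL);
    CREATE TABLE health_reports (home_id INTEGER, day TEXT, symptom_score INTEGER);
    -- Synthetic rows for illustration only.
    INSERT INTO sensor_readings VALUES
        (1, '2023-06-01', 18.0), (1, '2023-06-01', 22.0),
        (1, '2023-06-02', 41.0), (2, '2023-06-01', 12.0);
    INSERT INTO health_reports VALUES
        (1, '2023-06-01', 1), (1, '2023-06-02', 3), (2, '2023-06-01', 0);
""")

# Average the readings per home and day, then attach that day's health report.
rows = conn.execute("""
    SELECT h.home_id, h.day, AVG(s.no2_ugm3) AS mean_no2, h.symptom_score
    FROM health_reports AS h
    JOIN sensor_readings AS s
      ON s.home_id = h.home_id AND s.day = h.day
    GROUP BY h.home_id, h.day, h.symptom_score
    ORDER BY h.home_id, h.day
""").fetchall()

for home_id, day, mean_no2, symptom_score in rows:
    print(home_id, day, mean_no2, symptom_score)
```

Aggregating to one row per home and day before the join keeps each health report from being counted once per sensor reading.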
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Computer vision and image processing/analysis, Python
Interested candidates should be at Bachelor, Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
Me. Students are welcome to apply their own analytical tools, such as machine learning, to the dataset.
The primary aim of the research unit is the development and employment of exposure assessment methods for use in epidemiological studies on the health effects of air pollutants. Main activities are focused on detailed physical and chemical characterization of ambient particles collected at aerosol measurement stations and during intensive field campaigns. The data is then used for investigating the health relevance of particulate matter.
simonas.kecorius@helmholtz-muenchen.de
https://www.helmholtz-munich.de/en/epi/research-groups/environmental-exposure-assessment
What is the project's research question?
What are the determinants of personal exposure to airborne pollutants in the city?
What data will your exchange student work on?
The student will work on the air quality data set collected during an intensive mobile measurement campaign in three German cities: Munich, Augsburg, and Regensburg. Specifically, the data set comprises number and mass concentrations of aerosol particles, equivalent black carbon and gaseous pollutants, road traffic videos from an onboard camera, and geo-spatial data (e.g. commuted routes, geo-location, road types). Additionally, meteorological and air quality data from long-term monitoring sites will be available for the project.
What tasks will the project involve?
What makes this project interesting to work on?
Impaired air quality due to vehicular emissions is one of the most important environmental factors contributing to premature deaths in Europe. Qualitative and quantitative exposure assessment of airborne pollutants is therefore of high importance for controlling air quality, reducing health risks, and improving quality of life in general. Increasing our understanding of pollutant emission and dispersion, and their effects on personal exposure, through the application of novel tools to complex data sets is therefore both interesting and highly rewarding. In the course of this project, the exchange student will increase their competence in complex environmental data analysis, work management, scientific writing and communication.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software, Material for the conference
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
To successfully accomplish the project tasks, the applicant is expected to be familiar with geographic information system (GIS) software (e.g. QGIS), a programming language for statistical computing and graphics (e.g. R or Python), as well as the concepts of machine learning. All required tools are freely available online and will be provided by the host institution.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Deep learning, Computer vision and image processing/analysis, Geographic information systems, Python, R
Interested candidates should be at Bachelor, Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
Me. The exchange student will discuss the project results with the scientists from the Environmental Exposure Assessment and Environmental Risk working groups.
Synthetic Aperture Radar (SAR) has many advantages over optical systems, including being independent of daylight and being able to penetrate clouds. My institute is concerned with radar on different levels, including space-borne and air-borne SAR missions. My lab builds the air-borne SAR system F-SAR which acquires high-resolution, multi-frequency, fully-polarimetric SAR images and creates algorithms to process and analyse SAR data to its fullest potential.
https://www.dlr.de/hr/en/desktopdefault.aspx/tabid-2326/
What is the project's research question?
How well do deep-learning based approaches generalize over remote sensing data acquired over different regions of the Earth?
What data will your exchange student work on?
The Copernicus program offers free public access to several European Earth observation satellites including the Sentinel 1 (SAR) and Sentinel 2 (Multispectral) constellations. We will exploit these missions and leverage both SAR and multispectral (i.e. optical) data.
What tasks will the project involve?
What makes this project interesting to work on?
This project spans a large variety of highly relevant tasks, providing deep insights into several hot research topics (curation of datasets, evaluation of machine learning, self-supervised learning) that would normally be impossible to cover in a time frame as tight as that of this exchange program. However, for each step there is already a lot of prior work that will be leveraged, keeping the actual workload at a minimum while maximizing the potential outcome. The student will learn about SAR, an exciting sensor technology that is complementary to optical sensors, as well as about benchmarking, deep learning and self-supervised learning.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software, Co-authorship in a dataset
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
Copernicus open data hub (data access), SNAP (preprocessing), PyTorch/TensorFlow (deep learning). None of them is proprietary, and all of them can be used locally (or even remotely).
What skills are necessary for this project?
Machine learning, Deep learning, Computer vision and image processing/analysis
Interested candidates should be at Master or PhD level.
Who would be supervising your exchange student?
Me
We work on scalable solution-based deposition methods for halide perovskite solar cells, making emerging PV module prototypes in close collaboration with industry. As a team, we have also been actively setting up a literature database for perovskite solar cells, which we launched in 2022 in collaboration with the Israeli startup MaterialsZone (more information regarding this project can be found here: https://www.perovskitedatabase.com). We are now looking to expand our collaborative activities with MaterialsZone to develop standardized data entry protocols, in particular with respect to industrially relevant PV prototype fabrication processes.
https://www.helmholtz-berlin.de/forschung/oe/se/hybrid-materials/index_en.html
What is the project's research question?
Structure-Property-Performance relationships in "historic" device data produced by slot-die coating in the HySPRINT Innovation Lab.
What data will your exchange student work on?
The goal is to consolidate data available in the published literature, which has been collected on the PerovskiteDatabase (www.perovskitedatabase.com) and also experimental solar cell data for solar cells and modules made by slot-die coating in our research laboratory.
What tasks will the project involve?
What makes this project interesting to work on?
Contribution to developing data management, sharing and dissemination platforms, both in the open science domain and complementary to platforms used in the commercial PV manufacturing sector.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? both
What infrastructure, programs and tools will be used? Can they be used remotely?
MaterialsZone data management platform - can be used remotely
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Databases, Python
Interested candidates should be at Bachelor, Master or PhD level.
Who would be supervising your exchange student?
Me. The project will also involve direct collaboration with MaterialsZone (Israel).
I am a theoretical mathematician, and climate model developer with the main focus of my research in numerical ocean modeling, and hydrodynamics. Apart from the main subject of research I am developing algorithms that could help communicate scientific data through musical signals. In other words, I aim to develop neural network models that will enable sonification (making sound out of external resources) of any ocean and climate data sets.
https://www.awi.de/ueber-uns/organisation/mitarbeiter/detailseite/vera-fofonova.html
What is the project's research question?
Our project explores the possibilities to use AI technologies in order to link scientific data and music composition. In particular, the project aims to explore and further develop sonification techniques that would help analyze, interpret and communicate scientific data (e.g. climate model outputs).
What data will your exchange student work on?
NetCDF climate model outputs (.nc), csv files of converted climate data, audio files in MIDI format
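The simplest form of sonification is a direct parameter mapping, in which each data value sets the pitch of one note. The sketch below rescales a series linearly onto MIDI note numbers. It is only a baseline for orientation, not the neural-network approach the project aims to develop, and the function name, the default pitch range (two octaves around middle C, MIDI note 60) and the temperature values are assumptions.

```python
from typing import List, Sequence


def to_midi_notes(values: Sequence[float], low: int = 48, high: int = 72) -> List[int]:
    """Linearly rescale data values onto MIDI note numbers in [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant series maps to a single pitch in the middle of the range.
        return [(low + high) // 2 for _ in values]
    span = high - low
    return [low + round((v - v_min) / (v_max - v_min) * span) for v in values]


# Synthetic example: a short temperature series in degrees Celsius.
temperatures = [10.0, 15.0, 20.0]
print(to_midi_notes(temperatures))
```

The resulting note numbers can then be written to a MIDI file with a MIDI library of choice, which is left out here to keep the sketch dependency-free.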
What tasks will the project involve?
What makes this project interesting to work on?
AI technologies are increasingly commonplace in academic inquiry and an emergent area of interest in contemporary music and sound-art practice. Music may be a valuable resource in capturing and transforming scientific information, whereas scientific data can become a background for creative works and stage performances. This project suggests a foundation for new creative works and helps convert complex scientific language into a form of music as a lingua franca to reach a wider audience. Project outputs will give new insights, approaches and processes for scientists and artists working together. The project involves collaboration with a contemporary musician from Edinburgh (Michael Begg) and data scientists from the Helmholtz AI network.
What is the project's expected outcome?
Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
Computing resources for the Helmholtz AI community (HAICORE); Python and related modules and programs
What skills are necessary for this project?
Machine learning, Deep learning, Software development, Python; any additional programming languages are an advantage
Interested candidates should be at Master, PhD or Postdoc-level.
Who would be supervising your exchange student?
Me
Long duration human spaceflight missions create medical support challenges for eye changes, which can occur in nearly two-thirds of astronauts. To address these challenges, we are developing artificial intelligence applications to support crew members in monitoring their eyes. These applications have the potential to be used for crew medical support aboard the International Space Station, and beyond.
https://www.dlr.de/me/de/desktopdefault.aspx/tabid-1768
What is the project's research question?
Can artificial intelligence applications be used to provide crew medical support for ophthalmology during long duration spaceflight missions?
What data will your exchange student work on?
Medical image and video data collected from the human eye at various sites.
What tasks will the project involve?
Supporting the development of artificial intelligence models, using computer vision, convolutional neural networks, regression, classification, and object detection.
What makes this project interesting to work on?
The project would be interesting to work on because it has the potential to be used during future human spaceflight missions, and may help address medical concerns for exploration-class human spaceflight missions.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? No
What infrastructure, programs and tools will be used? Can they be used remotely?
For the machine learning component of our research, Python, convolutional neural networks, TensorFlow, GPU servers, and computer vision tools are used to conduct our analyses. To collect the raw image and video data used in our analysis, we use ophthalmology imaging tools (e.g., for fundoscopy and optical coherence tomography (OCT)) commonly used in clinical practice worldwide. You would require access to a development environment (e.g., VSCode, PyCharm, Colab), an understanding of and adherence to data security and ethics standards, and a modern smartphone/tablet. Work could be done remotely.
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, Computer vision and image processing/analysis, Software development, Python
Interested candidates should be at Master, PhD or Postdoc-level. We are looking for a person with experience in applying computer vision or machine learning to image and video data. Experience working with human medical data would be valuable.
Who would be supervising your exchange student?
Dr. med. Claudia Stern, Mr. Scott Ritter
claudia.stern@dlr.de, scott.ritter@dlr.de
UK Universities Projects
I am a lecturer in the Department of Computer Science at the University of Manchester, specialized in Computer Graphics. Our lab focuses on physically-based rendering to generate realistic images of virtual worlds given small samples of the real world. This can be used to improve realism in the metaverse, games and movies.
Zahra.montazeri@manchester.ac.uk
https://research.manchester.ac.uk/en/persons/zahra.montazeri
What is the project's research question?
How can complex materials be accurately reproduced in virtual worlds given photographs captured from a real sample?
What data will your exchange student work on?
Thousands of images taken of a small piece of material (e.g. cloth, metal, plastic) under different configurations of light and camera. The goal is to explore these images and map them to a continuous space.
What tasks will the project involve?
Using a sophisticated scanner, we scan a small piece of material and generate thousands of images to learn how light interacts with the sample. The project involves studying the data and probably using learning techniques such as neural networks to define a continuous space from which those materials can be reproduced in the virtual world.
What makes this project interesting to work on?
With the advancement of the metaverse and the growing need for virtual worlds, reproducing realistic materials is more crucial than ever. Appearance Modelling is a field of Computer Graphics that studies techniques for bringing realism to the virtual world using physics, math and programming. This project will improve all of these skill-sets and, once completed, offer applications that are in high demand.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
We only need a powerful computer and it can be done remotely.
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, Parallel/distributed programming with GPUs, Computer vision and image processing/analysis, Software engineering, Python, C/C++/C#
Interested candidates should be at Master, PhD or Postdoc.
Who would be supervising your exchange participant?
Dr Zahra Montazeri
We are a professor (the applicant, Caterina Doglioni), postdoctoral researchers and PhD students based at the University of Manchester, and this project is co-supervised by a PhD student at Lund University (Alex Ekman) who is part of the HELIOS graduate school. We are members of the ATLAS Collaboration at the LHC, and our research interests include searching for new physics phenomena that can be produced in proton-proton collisions, motivated by the presence of dark matter in our universe. Within the SMARTHEP European Training Network that we coordinate, we work on real-time analysis, machine learning and heterogeneous computing infrastructures, and we are keen on FAIR, sustainable and green software.
Caterina.doglioni@manchester.ac.uk
The website is a work in progress, but the student will be working within this network https://www.smarthep.org and with collaborators from https://www.heliosgraduateschool.org
What is the project's research question?
What data will your exchange student work on?
In the first instance, the student will work on Open Data recorded by experiments at the Large Hadron Collider, but we are also happy if the student comes with some data of their own because we are trying to see how this algorithm works for different disciplines.
What tasks will the project involve?
What makes this project interesting to work on?
This project is tackling a problem that is very common in big science and industry: how to enable recording more data when we have limited resources for doing so. There are many algorithms that work for image or music compression, but there aren’t yet many that compress complex scientific data with many different features. The other interesting aspect is to understand how lossy compression modifies our data, and what the tolerance is for researchers doing the data analysis.
What is the project's expected outcome?
Co-authorship to research paper & Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
The main software we will use and modify in this project is called “Baler”, an open source compression tool undergoing development at the Particle Physics divisions of Lund University and the University of Manchester. Baler uses autoencoder neural networks as a type of lossy machine learning-based compression to compress multi-dimensional data and evaluate the accuracy of the dataset after compression. This software can be used remotely, and we can offer computing resources that are accessible remotely for use in this project.
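To illustrate what evaluating accuracy after lossy compression can mean in practice, the sketch below compresses a list of values, reconstructs them, and checks the largest deviation against a tolerance. Uniform quantization is used purely as a stand-in compressor. It is not Baler's autoencoder, and the function names, step size and sample values are assumptions.

```python
from typing import List, Sequence


def quantize(values: Sequence[float], step: float) -> List[int]:
    """Stand-in lossy compressor: keep only the nearest integer multiple of step."""
    return [round(v / step) for v in values]


def dequantize(codes: Sequence[int], step: float) -> List[float]:
    """Reconstruct approximate values from the stored integer codes."""
    return [c * step for c in codes]


def max_abs_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Largest absolute difference between original and reconstructed values."""
    return max(abs(o - r) for o, r in zip(original, reconstructed))


# Synthetic example: one feature of a dataset, compressed with a step of 0.1.
feature = [0.12, 1.49, -2.31, 3.0]
step = 0.1
restored = dequantize(quantize(feature, step), step)

# Rounding to the nearest multiple can move a value by at most half a step.
error = max_abs_error(feature, restored)
print(f"max abs error: {error:.3f} (bound: {step / 2})")
```

For a learned compressor there is no such closed-form bound, so the same error measurement would be made empirically, per feature, and compared with the tolerance the analysis can accept.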
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Deep learning, Software development, Python
Interested candidates should be at Master, PhD or Postdoc
Who would be supervising your exchange participant?
Prof. Caterina Doglioni would be supervising the exchange student together with Alexander Ekman (Lund University) and Pratik Jawahar (University of Manchester). The student would be working in a team that also includes a Google Summer of Code student from the High Energy Physics Software Foundation.
Our group’s primary focus is provable neural training algorithms, on which we have published at top conferences and in top journals, with several further works under submission. Most recently we have also ventured into the mathematics of how PDEs can be solved by neural nets. We also have a number of ongoing experiments doing comparative tests between different neural methods of solving differential equations.
anirbit.mukherjee@manchester.ac.uk
https://sites.google.com/view/anirbit/home
What is the project's research question?
The exchange student will engage in developing theory at the interface of PDE solving and neural nets. In particular, we would try to understand (a) how the size of the net affects the ability to solve PDEs and (b) how parameters of a parametric PDE can be inferred by a neural net from the value of some solution of it at a few observation points. Although this would mostly be a mathematics project, the student could also choose to spend some of their time on doing experiments in these themes.
What data will your exchange student work on?
There is no externally sourced data that will be needed in the project.
What tasks will the project involve?
The project will necessarily involve the student developing a rigorous understanding of some of the recent papers where mathematical formalisms have been developed at this interface, such as the theory of DeepOperatorNetworks and Physics-Informed Neural Nets. The student will then be set the task of proving the intended theorems for some simple PDEs.
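For orientation, the objective commonly minimised by a Physics-Informed Neural Net can be written as below. This is the standard textbook form, not a formula taken from this project description, and all symbols are introduced here for illustration: $u_\theta$ is the network with weights $\theta$, $\mathcal{N}[u] = f$ is the PDE on a domain $\Omega$, $u = g$ is the boundary condition on $\partial\Omega$, $x_i^{r} \in \Omega$ and $x_j^{b} \in \partial\Omega$ are sampled collocation points, and $\lambda > 0$ weights the boundary term.

```latex
\mathcal{L}(\theta)
  = \frac{1}{N_r} \sum_{i=1}^{N_r}
      \left| \mathcal{N}[u_\theta](x_i^{r}) - f(x_i^{r}) \right|^2
  + \frac{\lambda}{N_b} \sum_{j=1}^{N_b}
      \left| u_\theta(x_j^{b}) - g(x_j^{b}) \right|^2
```

Question (a) of the project can be read as asking how small this loss, and the resulting error in $u_\theta$, can be made as a function of the size of the net.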
What makes this project interesting to work on?
It is easy to infer the importance of this project from the fact that some of the main software companies are already investing heavily in developing code for the kind of questions proposed here: https://microsoft.github.io/pdearena/ and https://developer.nvidia.com/modulus. This project is intended to be at the cutting edge of applied mathematics and deep learning. To the best of my knowledge, there are only a handful of groups around the world looking into the theory of why and how neural nets can solve or invert PDEs. So, via this project, the student has a rare chance to get an entry point into this exciting and futuristic direction of research that is poised to grow very big in the near future.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
What skills are necessary for this project?
Machine learning, Deep learning, Python
Interested candidates should be at Master, PhD or Postdoc.
Who would be supervising your exchange participant?
Dr Anirbit Mukherjee
Our group’s research focuses on difficult, real-world computational problems and the data related to them. We aim to conduct our projects end-to-end: from theoretical analysis to implementation, optimisation and finally experimental testing on realistic datasets. Currently we are working on counting problems in complex networks and packing problems of geometric data.
What is the project's research question?
What data will your exchange student work on?
The data consists of a large number of polygon shapes stored in flat text files.
What tasks will the project involve?
The student will develop, together with members of the lab, an algorithm to test the self-compatibility of a shape, and use this implementation to annotate the dataset. They will then attempt to develop a machine-learning model to recognize self-compatible shapes efficiently.
What makes this project interesting to work on?
What is the project's expected outcome?
Co-authorship to research paper & Contribution to software
Is the data open source? Some of the data might be proprietary. If this is an issue we can limit ourselves to open-source data only.
What infrastructure, programs and tools will be used? Can they be used remotely?
We use Python for prototyping/scripting and Rust for computationally heavy algorithms.
The tools could be used remotely, but this project will need close supervision, which is easier to provide if the student is present.
What skills are necessary for this project?
Machine learning, Python, Other: The student should be interested in mathematics (linear algebra in particular) and algorithms
Interested candidates should be at Master, PhD or Postdoc.
Who would be supervising your exchange participant?
Our group’s research focuses on difficult, real-world computational problems and the data related to them. We aim to conduct our projects end-to-end: from theoretical analysis to implementation, optimisation and finally experimental testing on realistic datasets. Currently we are working on counting problems in complex networks and packing problems of geometric data.
What is the project's research question?
Large complex networks are notoriously difficult to visualise, which makes comparative analysis or classification by visual means very challenging. We would like to investigate whether simple, high-level visualisations of core network properties (degree distribution, density, number of high/medium/low degree vertices) can capture structurally interesting properties. In particular, we aim to classify networks according to the resulting visualisation and investigate whether this results in a useful classification method.
What data will your exchange student work on?
An existing corpus of complex networks stored in a uniform file format.
What tasks will the project involve?
The student will implement a program which takes a complex network as input and outputs a visualisation of high-level properties. We already have foundational ideas for the visualisation, but the student will have freedom to explore variations. The student will apply the final visualisation to the whole network corpus, group the networks by similarity (using clustering techniques) and analyse the resulting groups. Specifically, we are interested in whether the visualisation captures the network’s origin domain.
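Before anything is drawn, the high-level properties named in the research question have to be extracted from each network. The sketch below computes them from an undirected edge list. The input format, the function name and the cut-offs for low, medium and high degree are assumptions made for illustration and are not the lab's definitions.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def network_summary(edges: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Summarise an undirected simple graph given as an edge list."""
    degree: Counter = Counter()
    seen: Set[Tuple[str, str]] = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        key = (u, v) if u <= v else (v, u)
        if key in seen:
            continue  # ignore duplicate edges
        seen.add(key)
        degree[u] += 1
        degree[v] += 1

    n, m = len(degree), len(seen)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    mean_degree = 2 * m / n if n else 0.0

    # Hypothetical binning: "low" is below half the mean degree,
    # "high" is above twice the mean degree, everything else is "medium".
    bins = Counter(
        "low" if d < 0.5 * mean_degree
        else "high" if d > 2 * mean_degree
        else "medium"
        for d in degree.values()
    )
    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "degree_distribution": dict(sorted(Counter(degree.values()).items())),
        "degree_bins": {k: bins.get(k, 0) for k in ("low", "medium", "high")},
    }


# Synthetic example: a star with one hub and five leaves.
star = [("hub", f"leaf{i}") for i in range(5)]
summary = network_summary(star)
print(summary)
```

Vertices without any edge cannot appear in an edge list, so they are not counted here. Feature dictionaries like this one could be turned into fixed-length vectors and passed to a clustering method to group the corpus.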
What makes this project interesting to work on?
The project exemplifies the difficult work with high-dimensional, non-numerical data. The student will have an opportunity to learn about network properties, graph algorithms and sharpen their data visualisation skills.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
Essentially, only Python (with deep learning frameworks based on `pytorch` etc.) and some compute resources will be required. All of these can be provided remotely.
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, Software development, Python
Interested candidates should be at Bachelor, Master, PhD or Postdoc-level.
Apply now until May 8th 2023.
Potential hosts will see your application form and will be able to invite you to take part in their project.
What’s the timeline?
- Q&A session April 18th 17:00 – Register here.
- Participants are informed by end of May.
- Projects/exchanges can start anytime thereafter. Onsite visits can take place until end of March 2024.
Questions? Feedback? Send an email to idsi_admin@technion.ac.il
About the Helmholtz Information & Data Science Academy (https://www.helmholtz-hida.de/)
HIDA – The Helmholtz Information & Data Science Academy – is Germany’s largest postgraduate training network in information and data science. We prepare the next generation of scientists for a data-heavy future of research. HIDA connects and serves as the roof to 6 newly founded data science research schools linked by a network of 14 national research centers and 17 top-tier universities across Germany. By 2025, these data science research schools will train over 250 fully funded doctoral researchers. The doctoral researchers will deepen their knowledge in data science methods and learn to combine knowledge from the six Helmholtz research areas – energy, earth and environment, health, aeronautics, space and transport, matter, and information – with data science methods. For these purposes, all doctoral researchers receive dual supervision in data science and their scientific domain. In addition, HIDA offers doctoral researchers and scientists attractive opportunities to obtain training and continuing education in a wide range of methods and to become part of an international data science network.
About the UK Parties
About the British Council (www.britishcouncil.de)
The British Council is the UK’s international organization for cultural relations and educational opportunities.
We support peace and prosperity by building connections, understanding and trust between people in the UK and countries worldwide and have been working in Germany since 1959.
About SIN
The UK's Science and Innovation Network (SIN) promotes international collaboration on science and innovation.
The Science and Innovation Network (SIN) has approximately 120 officers covering around 60 countries and territories around the world building partnerships and collaborations on science and innovation.
SIN officers work with the local science and innovation community in support of UK policy overseas, leading to mutual benefits to the UK and the host country.