Characterization of Data-intensive Applications, Abstractions and Middleware on Heterogeneous Infrastructure

Data has become a critical factor in many scientific disciplines, e.g., fusion energy (ITER), bioinformatics (meta-genomics), climate research (Earth System Grid), and astronomy (LSST). The volumes of data produced by these scientific applications are increasing rapidly, driven by decreasing costs for data generation, communication, compute, and storage. Conventional methods of capturing, managing, and processing data are bound to fail as data volumes, rates, and dynamism increase. Furthermore, low-level abstractions and HPC architectural paradigms, such as the separation of storage and compute, hinder progress.

The aim of this master's thesis is to evaluate different high-level data processing tools (such as Apache Spark and Apache Flink) on HPC and cloud infrastructures. A particular challenge to investigate is the ability to handle data streaming on HPC infrastructure. In this thesis, we will develop and extend a set of tools for supporting data processing and streaming applications on HPC and cloud infrastructures. Furthermore, we will investigate the performance characteristics of different scientific applications and algorithms across different infrastructure configurations. Based on this characterization, we plan to develop a performance model that helps explain the runtime behavior of these applications.
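As a rough illustration of the kind of evaluation envisaged, the sketch below times a small PySpark batch aggregation and runs a toy streaming query. It is a minimal sketch only: the local master URL, the synthetic data sizes, and the built-in "rate" streaming source are illustrative placeholders, not the benchmark setup or applications of the thesis.

    # Minimal sketch: timing a small batch aggregation and running a toy
    # streaming query with PySpark. Master URL, data sizes, and the "rate"
    # source are illustrative placeholders, not the thesis benchmark setup.
    import time

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .master("local[*]")          # on an HPC or cloud node this would point to the cluster manager
        .appName("characterization-sketch")
        .getOrCreate()
    )

    # --- Batch: time a simple aggregation over synthetic data ---
    start = time.time()
    df = spark.range(0, 10_000_000)  # DataFrame with a single integer column "id"
    buckets = df.groupBy((F.col("id") % 10).alias("bucket")).count().collect()
    print(f"batch aggregation took {time.time() - start:.2f}s, {len(buckets)} buckets")

    # --- Streaming: a toy rate-source query as a stand-in for a real data stream ---
    stream = (
        spark.readStream.format("rate")   # built-in source emitting rows at a fixed rate
        .option("rowsPerSecond", 1000)
        .load()
    )
    query = (
        stream.groupBy(F.window("timestamp", "5 seconds")).count()
        .writeStream.outputMode("complete")
        .format("console")
        .start()
    )
    query.awaitTermination(20)  # run for ~20 seconds, then shut down
    query.stop()
    spark.stop()

Wrapping such runs in a timing harness and repeating them on different infrastructure configurations (bare-metal HPC nodes, cloud VMs, varying node counts) would yield the measurements that feed the planned performance model.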

Requirements:

  • Knowledge of Python or another object-oriented language
  • Knowledge of HPC and distributed computing technologies
  • Previous experience with data processing frameworks (e.g. Spark)

Thesis supervisor:
Prof. Dr. D. Kranzlmüller

Duration:

  • Master's thesis: 6 months

Number of students: 1

Advisor: