
Bachelor's Thesis

Benchmarking of a Large-Scale Distributed Data Management Platform

Today's research is increasingly driven by data, and the reproducibility, accessibility, interoperability and searchability of research data are becoming essential to the research process.

The Research Data Management (RDM) team of the Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ) is involved in several projects at the national and European levels, in particular the EU Horizon 2020 project “Large-scale EXecution for Industry & Society” (LEXIS).

The LEXIS project is building an advanced platform for running scientific computing workflows at the confluence of High-Performance Computing (HPC), IaaS Cloud, and Big Data solutions. Such a workflow can, for example, involve preprocessing weather-station data, assimilating these data into a large-scale weather simulation, and then visualizing the model output and issuing flash-flood warnings.

Tasks within the LEXIS Pilot Workflows are executed at LRZ and IT4I (Czech National Supercomputing Centre). The aim of the project is to freely distribute such workflows over the most appropriate computing resources at both centres.

Thus, we need a common data platform that manages access, storage, and transfer of input and output data within the workflows. This platform is called the “LEXIS Distributed Data Infrastructure” (DDI). While the DDI is currently based on iRODS and EUDAT solutions, we would like to experiment with competing data-management systems in order to identify optimisation potential. We would also like to explore optimal data-transfer paths within the DDI and to the HPC/Cloud resources at LRZ and IT4I. In this thesis, you will benchmark parts of the infrastructure, devise additional measurement concepts as needed, interpret the results, and derive design recommendations for LEXIS and future similar projects.
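To give an idea of what such a transfer measurement could look like, here is a minimal Python sketch that times a single scp copy and derives its throughput. It is only an illustration under assumed conditions: the test file, remote host, and destination path are hypothetical placeholders, not actual LEXIS or LRZ endpoints.

import os
import subprocess
import time

# Hypothetical placeholders -- not actual LEXIS/LRZ endpoints.
TEST_FILE = "/tmp/benchmark_1GiB.bin"
REMOTE = "user@hpc-frontend.example.org:/scratch/benchmark/"

def scp_throughput(local_path: str, remote: str) -> float:
    """Copy one file via scp and return the achieved throughput in MiB/s."""
    size_mib = os.path.getsize(local_path) / 2**20
    start = time.perf_counter()
    subprocess.run(["scp", "-q", local_path, remote], check=True)
    return size_mib / (time.perf_counter() - start)

if __name__ == "__main__":
    print(f"scp throughput: {scp_throughput(TEST_FILE, REMOTE):.1f} MiB/s")

The same timing wrapper can be pointed at other transfer commands (e.g. iRODS iput or GridFTP's globus-url-copy) to compare methods under identical conditions.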

Outline of this work:

  • Investigate different data management solutions (iRODS, Gluster, MinIO, etc.)
  • Set up automated tests to evaluate the different solutions, as well as data transfers over different paths with different methods (scp, GridFTP, iRODS), based on multiple criteria (see the sketch after this list).
  • Perform a performance analysis and report the results and recommendations.
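A sketch of how such automated tests might look, assuming the iRODS icommands (iget) and OpenSSH scp are available; all hosts, zone names, and paths are invented placeholders:

import csv
import statistics
import subprocess
import time

# Transfer commands under test; hosts and paths are hypothetical placeholders.
METHODS = {
    "scp": ["scp", "-q", "user@host.example.org:/scratch/testfile.bin", "/tmp/testfile.bin"],
    "irods": ["iget", "-f", "/lexisZone/home/testfile.bin", "/tmp/testfile.bin"],
}
REPEATS = 5

def run_benchmarks(outfile: str = "results.csv") -> None:
    """Time each transfer method REPEATS times and write raw timings to CSV."""
    with open(outfile, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["method", "run", "seconds"])
        for name, cmd in METHODS.items():
            timings = []
            for run in range(REPEATS):
                start = time.perf_counter()
                subprocess.run(cmd, check=True)
                timings.append(time.perf_counter() - start)
                writer.writerow([name, run, f"{timings[-1]:.3f}"])
            print(f"{name}: mean {statistics.mean(timings):.2f} s, "
                  f"stdev {statistics.stdev(timings):.2f} s")

if __name__ == "__main__":
    run_benchmarks()

Repeated runs help separate network variance from tool overhead, and the CSV output keeps the raw timings available for later statistical analysis.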

Requirements

  • Good knowledge of Linux
  • Experience in using Python
  • Familiarity with the Docker environment
  • Knowledge of version control systems such as Git

Advisor:
Prof. Dr. D. Kranzlmüller

Duration of the Bachelor's thesis: as specified in the study regulations

Number of students: 1

Supervisor: