The FAIR guiding principles demand that research data repositories provide their services in a way that guarantees Findability, Accessibility, Interoperability and Reusability of the research data. Measuring compliance with these FAIR principles is an important task: without it, there is currently no acknowledged way to identify high-quality research data services and promising trends in the field of research data management. Although the first formulation of the FAIR principles promised to make compliance with each "capital letter" measurable, some of them can only be assessed in the context of a use case. For example, the demand that metadata should have "a plurality of accurate and relevant attributes" (principle R1) is hard to measure without a context that defines what is "accurate" and "relevant". The retrieval of spatially and temporally annotated images is an example of such a context. This retrieval use case served as the basis for a benchmark. Its implementation made it possible to score five research data repositories, but it only sheds light on one specific field of application and on the specific tech stack used in its implementation.
Further use-case-centric benchmarks are needed to gain a better overview of the quality of the services offered by research data repositories. Such use cases should be relevant to several scientific fields and complex enough to pose several technical challenges, while still being specific enough to lead to a benchmark implementation. The implementation should be non-creative (only tools, standards and techniques that already exist may be used), automatable (the data integration must not involve manual effort) and repeatable (it can be repeated as often as necessary). Many standards, services and tools are used in research data handling, but not all of them are suited to every use case.
The goal of this thesis is the implementation of a use-case-centric benchmark (integration of text-based data, such as CSV files), its execution, and the presentation of the calculated scores of the probed repositories. The use case should preferably be applicable in the context of environmental sciences (e.g. sensor-based data as managed in the AlpEnDAC repository) or in the context of digital humanities (e.g. linguistic data as managed in the VerbaAlpina repository). An example would be the retrieval of all available temperature measurements for a specified time range and a specified location, but ideas from applicants are always welcome.
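To make the example use case more concrete, the following Python sketch filters temperature measurements from CSV text by time range and location. It is only an illustration under stated assumptions: the column names (timestamp, lat, lon, temperature_c), the function name select_temperatures and the sample rows are hypothetical and are not taken from AlpEnDAC or any other repository. An actual benchmark would first have to retrieve and harmonise the files from each probed repository.

```python
import csv
import io
from datetime import datetime


def select_temperatures(csv_text, start, end, lat_range, lon_range):
    """Return (timestamp, value) pairs inside the time window and bounding box.

    Assumes hypothetical columns: timestamp (ISO 8601), lat, lon, temperature_c.
    """
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        timestamp = datetime.fromisoformat(row["timestamp"])
        lat, lon = float(row["lat"]), float(row["lon"])
        in_time = start <= timestamp <= end
        in_area = (lat_range[0] <= lat <= lat_range[1]
                   and lon_range[0] <= lon <= lon_range[1])
        if in_time and in_area:
            selected.append((timestamp, float(row["temperature_c"])))
    return selected


# Synthetic rows for illustration only, not real measurements.
SAMPLE = """timestamp,lat,lon,temperature_c
2020-01-01T00:00:00,47.42,10.98,-5.5
2020-01-01T01:00:00,47.42,10.98,-6.0
2020-01-01T00:00:00,48.15,11.57,1.0
2020-02-01T00:00:00,47.42,10.98,-3.0
"""

hits = select_temperatures(
    SAMPLE,
    start=datetime(2020, 1, 1),
    end=datetime(2020, 1, 31),
    lat_range=(47.0, 47.9),
    lon_range=(10.5, 11.5),
)
for timestamp, value in hits:
    print(timestamp.isoformat(), value)
```

With the synthetic rows above, only the two January measurements inside the bounding box are selected. A benchmark score could then be derived, for instance, from how many repositories allow such a query to be answered without manual intervention, although the scoring scheme itself is left open here.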
Executing the benchmark might require computing resources that exceed the capacity of a normal desktop computer. In that case, cloud computing resources provided by the LRZ can be used.
Supervisor: Prof. Dr. Dieter Kranzlmüller
Number of students: 1...n (depends on the proposed use case)