We consider the problem of merging *k* (*k* ≥ 2)
sorted sequences in parallel using *p*
processors (threads) operating on a random access shared memory machine.
Although there exists excellent prior work modern computer architectures
reveal a gap between theory and practice. The key to achieve high
performance is obviously to optimize *cache locality*.

Before discussing the problem in more detail let us simplify to the basic
case where *k* = 2. A
straightforward approach is to locate the median of the first sequence in
the second sequence and then merging both halves in parallel. This
algorithm is well described on Wikipedia.

For generalization to *k* > 2
there are basically two approaches. First, we can arrange all sequences in
a binary tree of height log *k*.
Then, each binary merge can be computed in parallel. A second approach is
to construct a binary *min heap* of identical height log *k*.
In each round we select the smallest element out of *k*
chunks. This is like playing several rounds in a tournament where the best
player gets kicked off after each round. This is why we call this a *tournament
tree* in literature.
The essential difference between these two algorithms is that in a binary
tree merge each element needs to be merged *O*(log *k*)
times while in a tournament tree the cost to insert a single element into
the tree is *O*(log *k*).

It is known that a binary tree merge is more efficient if we do it the “right” way, leading to the problem statement of this work. We need to understand what are the root causes of cache misses in this algorithm and how can we prevent them. The student has to apply both theoretical and experimental methods to accomplish this work. As a platform we can provide access to the SuperMUC NG system which is the world’s fastest general purpose computer.

If you know "The Art of Computer Programming" and Donald Knuth you are the right candidate with high probability. Otherwise, if the requirements below still apply to you it's time to learn cool stuff and getting insights.

**Anforderungen:**

- A great interest for computer science and algorithms.
- Basic knowledge of High Performane Computing.
- Basic knowledge of C/C++.

**Aufgabensteller:**

Prof. Dr. D. Kranzlmüller

**Dauer der Arbeit:**

- 6 Monate (Vollzeit) + Einarbeitungszeit

**Anzahl Bearbeiter:**
1 (bachelor or master student)

**Betreuer:**