We consider the problem of merging k (k ≥ 2) sorted sequences in parallel using p processors (threads) operating on a random access shared memory machine. Although there exists excellent prior work modern computer architectures reveal a gap between theory and practice. The key to achieve high performance is obviously to optimize cache locality.
Before discussing the problem in more detail let us simplify to the basic case where k = 2. A straightforward approach is to locate the median of the first sequence in the second sequence and then merging both halves in parallel. This algorithm is well described on Wikipedia.
For generalization to k > 2 there are basically two approaches. First, we can arrange all sequences in a binary tree of height log k. Then, each binary merge can be computed in parallel. A second approach is to construct a binary min heap of identical height log k. In each round we select the smallest element out of k chunks. This is like playing several rounds in a tournament where the best player gets kicked off after each round. This is why we call this a tournament tree in literature. The essential difference between these two algorithms is that in a binary tree merge each element needs to be merged O(log k) times while in a tournament tree the cost to insert a single element into the tree is O(log k).
It is known that a binary tree merge is more efficient if we do it the “right” way, leading to the problem statement of this work. We need to understand what are the root causes of cache misses in this algorithm and how can we prevent them. The student has to apply both theoretical and experimental methods to accomplish this work. As a platform we can provide access to the SuperMUC NG system which is the world’s fastest general purpose computer.
If you know "The Art of Computer Programming" and Donald Knuth you are the right candidate with high probability. Otherwise, if the requirements below still apply to you it's time to learn cool stuff and getting insights.
Prof. Dr. D. Kranzlmüller
Dauer der Arbeit:
Anzahl Bearbeiter: 1 (bachelor or master student)