Parallel sorting by regular sampling: analysis

Let \(p\) and \(n\) represent the number of processors and the number of data points used in the Parallel sorting by regular sampling algorithm. There are several complexities to consider: run time, space used, and communication. The run time complexity measures the elapsed time of the computation, taking into account that the processors compute in parallel. (The run time is not to be confused with the work, which is the sum of the individual processors' time complexities.) The space complexity is the total amount of space required across all processors to complete the computation. The communication time is the time required for the processors to move all messages during the computation; the communication complexity is a function of the network architecture. For this discussion, it is assumed that the underlying architecture is a single-bus Ethernet. Another assumption in this analysis is that all data sizes are basically uniform.

Questions

  1. What is meant by all data sizes are basically uniform?
  2. How reasonable is this assumption?
  3. What is a worst case scenario in terms of data sizes?

Run time complexity

In Phase II of the Algorithm description, each of the \(p\) processors quicksorts its data (of size \(\frac{n}{p}\)) and then collects a regular sample of size \(p\). The time required for these operations is \(O(\frac{n}{p}\log(\frac{n}{p})+p)\).
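As a concrete illustration of Phase II (this sketch is not taken from the original program), the following C code sorts the local block with qsort and then takes every \(\frac{n}{p^{2}}\)-th element as the regular sample. It assumes integer data and that \(\frac{n}{p}\) is divisible by \(p\); the names localData, localSize, and sample are placeholders.

    #include <stdlib.h>

    /* Comparison function for qsort on ints. */
    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Phase II sketch: sort the local n/p items, then take a regular
     * sample of p of them (one from the start of each of p equal strides). */
    void phase2(int *localData, int localSize, int p, int *sample) {
        qsort(localData, localSize, sizeof(int), cmp_int);   /* O((n/p) log(n/p)) */
        int stride = localSize / p;                           /* = n/p^2           */
        for (int i = 0; i < p; i++)                           /* O(p)              */
            sample[i] = localData[i * stride];
    }

With localSize equal to \(\frac{n}{p}\), the qsort call accounts for the \(O(\frac{n}{p}\log(\frac{n}{p}))\) term and the sampling loop for the additional \(p\).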

In Phase III of the Algorithm description, the root processor merges the \(p\) sorted sample lists, each of size \(p\), into a sorted list of size \(p^{2}\), and then chooses \(p-1\) pivot elements. Using the multimerge algorithm, this merge takes \(O(p^{2}\log(p))\) time. The time required for these two operations is \(O(p^{2}\log(p)+(p-1))\).
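Assuming the \(p^{2}\) gathered sample values have already been merged into a single sorted array, the pivot selection itself is a single short pass. The exact sample positions used for the pivots vary between presentations of the algorithm; taking every \(p\)-th element of the merged sample is one common choice, shown in this sketch (sortedSample and pivots are placeholder names):

    /* Phase III sketch: choose p-1 pivots from the sorted regular sample of
     * size p*p by taking every p-th element (positions p, 2p, ..., (p-1)p). */
    void choose_pivots(const int *sortedSample, int p, int *pivots) {
        for (int i = 1; i < p; i++)      /* O(p - 1) */
            pivots[i - 1] = sortedSample[i * p];
    }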

In Phase IV of the Algorithm description, each of the \(p\) processors partitions its \(\frac{n}{p}\) sorted data points according to the \(p-1\) pivots. This is done with a single sequential scan of the data, setting the values in the sbufferLength and sbufferStart arrays. The time complexity is \(O(\frac{n}{p})\).
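A sketch of that scan, assuming the local data is already sorted, that pivots holds the \(p-1\) broadcast pivot values, and that sbufferStart[i] and sbufferLength[i] record where the partition destined for processor \(i\) begins and how many values it contains (the exact layout of these arrays is an assumption here):

    /* Phase IV sketch: one left-to-right scan of the sorted local data,
     * splitting it into p partitions at the p-1 pivot values. */
    void phase4(const int *localData, int localSize, const int *pivots, int p,
                int *sbufferStart, int *sbufferLength) {
        int j = 0;
        for (int i = 0; i < p; i++) {
            sbufferStart[i] = j;
            /* partition i takes values <= pivots[i]; the last one takes the rest */
            while (j < localSize && (i == p - 1 || localData[j] <= pivots[i]))
                j++;
            sbufferLength[i] = j - sbufferStart[i];
        }
    }

The scan visits each of the \(\frac{n}{p}\) local elements once, which gives the \(O(\frac{n}{p})\) bound.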

In Phase V of the Algorithm description, the \(i^{th}\) processor merges together \(p\) sorted lists. Since each of these lists is delimited by the regularly sampled pivot values chosen in Phase III, ideally each of the \(p\) lists has size \(\frac{n}{p^{2}}\), which after merging gives a list of size \(\frac{n}{p}\). Using the multimerge algorithm, this requires \(O(\frac{n}{p}\log(p))\) time.
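A common way to implement such a \(p\)-way merge within the stated \(O(\frac{n}{p}\log(p))\) bound keeps the current head of each list in a binary min-heap, so that producing each output element costs \(O(\log(p))\); whether this matches the multimerge routine referenced above is an assumption. A sketch under that assumption (all names are placeholders):

    #include <stdlib.h>

    typedef struct { int value; int list; } HeapItem;

    /* Restore the min-heap property downward from index i. */
    static void sift_down(HeapItem *h, int size, int i) {
        for (;;) {
            int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < size && h[l].value < h[smallest].value) smallest = l;
            if (r < size && h[r].value < h[smallest].value) smallest = r;
            if (smallest == i) break;
            HeapItem tmp = h[i]; h[i] = h[smallest]; h[smallest] = tmp;
            i = smallest;
        }
    }

    /* Merge p sorted lists into out; each output element costs O(log p). */
    void multimerge(int **lists, const int *lengths, int p, int *out) {
        HeapItem *heap = malloc(p * sizeof *heap);
        int *pos = calloc(p, sizeof *pos);   /* next unread index in each list */
        int size = 0, k = 0;
        for (int i = 0; i < p; i++)          /* seed the heap with each list head */
            if (lengths[i] > 0) heap[size++] = (HeapItem){ lists[i][0], i };
        for (int i = size / 2 - 1; i >= 0; i--) sift_down(heap, size, i);
        while (size > 0) {
            HeapItem min = heap[0];          /* smallest remaining element */
            out[k++] = min.value;
            int next = ++pos[min.list];
            if (next < lengths[min.list])    /* advance within the same list */
                heap[0] = (HeapItem){ lists[min.list][next], min.list };
            else                             /* list exhausted: shrink the heap */
                heap[0] = heap[--size];
            sift_down(heap, size, 0);
        }
        free(heap); free(pos);
    }

In Phase V the \(p\) lists together hold about \(\frac{n}{p}\) elements, so the loop runs \(\frac{n}{p}\) times at \(O(\log(p))\) work per iteration.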

The total (parallel) processor time is \(O(\frac{n}{p}\log(\frac{n}{p})+p+\frac{n}{p}+\frac{n}{p}\log(p)) = O(\frac{n}{p}\left(\log(\frac{n}{p})+\log(p)+p+1\right)) = O(\frac{n}{p}(\log(n)+p+1))\). (The Phase III term \(p^{2}\log(p)\) is dominated by the other terms when \(n\) is significantly larger than \(p\), for example when \(n \geq p^{3}\), and so is omitted here.)
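As an illustration (not part of the original analysis), take \(n = 2^{20}\) and \(p = 8\) with base-2 logarithms. The terms are \(\frac{n}{p}\log(\frac{n}{p}) = 131072 \cdot 17 \approx 2.2 \times 10^{6}\), \(\frac{n}{p}\log(p) = 131072 \cdot 3 \approx 3.9 \times 10^{5}\), \(\frac{n}{p} = 131072\), \(p = 8\), and the omitted Phase III term \(p^{2}\log(p) = 192\); the local sort of Phase II clearly dominates.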

If the original list is sorted sequentially using an efficient comparison-based sort (for example, quicksort), then the time requirement is \(O(n\log(n))\). The speedup of this computation is

\(S(n,p) = \frac{\text{sequential time}}{\text{parallel time}} = \frac{n\log(n)}{\frac{n}{p}\left(\log(\frac{n}{p})+\log(p)+p+1\right)} = \frac{p\log(n)}{\log(n)+p+1}\). If \(n\) is large enough that \(\log(n)\) dominates \(p\), the speedup is essentially \(p\).
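For the same illustrative values, \(n = 2^{20}\) and \(p = 8\), the formula gives \(S = \frac{8 \cdot 20}{20 + 8 + 1} = \frac{160}{29} \approx 5.5\); as \(n\) grows with \(p\) fixed, the denominator is dominated by \(\log(n)\) and \(S\) approaches \(p = 8\).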

Questions

  1. How does this run time complexity compare to the run time complexity of using quicksort on all the data in a sequential run?
  2. Does the time complexity of quicksort also make a size assumption of the data?

Space complexity

In the root processor an array of size \(n\) is required. In each of the non-root processors arrays of size \(\frac{n}{p}\) are required. In addition, each processor requires a couple of buffers of size \(p\).
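Summing the pieces listed above gives a total of roughly \(n + (p-1)\frac{n}{p} + O(p^{2}) = n\left(2 - \frac{1}{p}\right) + O(p^{2})\) values, i.e. about twice the space of the original array once \(p\) is more than a few processors.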

Questions

  1. Compare the space required by this algorithm to that required by quicksort.

Communication complexity

In Phase II of the Algorithm description all \(n\) data items are scattered.

In Phase III of the Algorithm description the root processor gathers the \(p^{2}\) sample values (\(p\) from each processor).

In Phase IV of the Algorithm description the root processor broadcasts \(p-1\) values.

In Phase V of the Algorithm description, each of the \(p\) processors sends out \(\frac{n}{p}-\frac{n}{p^{2}}\) data points, for a total of \(n-\frac{n}{p}\) data transfers.

In Phase VI of the Algorithm description, the root processor collects the \(n-\frac{n}{p}\) values held by the other processors.

The total number of values communicated is \(n+p^{2}+(p-1)+2\left(n-\frac{n}{p}\right) = 3n-\frac{2n}{p}+p^{2}+p-1\).
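The per-phase counts above line up with the collective operations an MPI implementation would typically use. The following sketch shows the data movement only (the local computation of each phase is elided; see the earlier sketches). It assumes integer data, \(n\) divisible by \(p\), and processor 0 as the root, and every variable name is a placeholder rather than a name from the original program.

    #include <stdlib.h>
    #include <mpi.h>

    /* Communication-only sketch of the algorithm.  "data" holds all n values
     * on the root and may be NULL elsewhere. */
    void psrs_communication(int *data, int n, int rank, int p, MPI_Comm comm) {
        int localSize = n / p;
        int *localData  = malloc(localSize * sizeof(int));
        int *sample     = malloc(p * sizeof(int));
        int *allSamples = malloc(p * p * sizeof(int));      /* used on root only */
        int *pivots     = malloc((p - 1) * sizeof(int));
        int *sLen  = malloc(p * sizeof(int)), *sStart = malloc(p * sizeof(int));
        int *rLen  = malloc(p * sizeof(int)), *rStart = malloc(p * sizeof(int));

        /* Phase II: scatter the n data items, n/p to each processor. */
        MPI_Scatter(data, localSize, MPI_INT, localData, localSize, MPI_INT, 0, comm);
        /* ... each processor sorts localData and fills sample with p values ... */

        /* Phase III: the root gathers the p*p sample values. */
        MPI_Gather(sample, p, MPI_INT, allSamples, p, MPI_INT, 0, comm);
        /* ... the root merges allSamples and fills pivots with p-1 values ... */

        /* Phase IV: the root broadcasts the p-1 pivots; each processor then
         * partitions its local data, filling sStart and sLen. */
        MPI_Bcast(pivots, p - 1, MPI_INT, 0, comm);
        /* ... scan localData against pivots to fill sStart/sLen ... */

        /* Phase V: exchange partition sizes, then the partitions themselves. */
        MPI_Alltoall(sLen, 1, MPI_INT, rLen, 1, MPI_INT, comm);
        int recvTotal = 0;
        for (int i = 0; i < p; i++) { rStart[i] = recvTotal; recvTotal += rLen[i]; }
        int *recvData   = malloc(recvTotal * sizeof(int));
        int *mergedData = malloc(recvTotal * sizeof(int));
        MPI_Alltoallv(localData, sLen, sStart, MPI_INT,
                      recvData, rLen, rStart, MPI_INT, comm);
        /* ... multimerge the p received runs in recvData into mergedData ... */

        /* Phase VI: the root collects the sorted blocks (their sizes vary slightly). */
        int *blockLen = malloc(p * sizeof(int)), *blockStart = malloc(p * sizeof(int));
        MPI_Gather(&recvTotal, 1, MPI_INT, blockLen, 1, MPI_INT, 0, comm);
        if (rank == 0) {
            int off = 0;
            for (int i = 0; i < p; i++) { blockStart[i] = off; off += blockLen[i]; }
        }
        MPI_Gatherv(mergedData, recvTotal, MPI_INT,
                    data, blockLen, blockStart, MPI_INT, 0, comm);

        free(localData); free(sample); free(allSamples); free(pivots);
        free(sLen); free(sStart); free(rLen); free(rStart);
        free(recvData); free(mergedData); free(blockLen); free(blockStart);
    }

On a single shared bus the transfers effectively share the medium, so counting the total number of values moved, as done above, is a reasonable first-order model of the communication cost.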

Questions

  1. Asymptotic complexities have constants associated with them. What can you say about the constants associated with the run time and communication time complexities?