Let \(p\) and \(n\) represent the number of processors and the number of data points used in the Parallel sorting by regular sampling algorithm. There are several complexities to consider: run time, space, and communication. The *run time* complexity is the elapsed time of the computation, taking into account that the processors work in parallel. (The *run time* is not to be confused with the *work*, which is the sum of the individual processors' time complexities.) The *space* is the total amount of space required across all processors to complete the computation. The *communication* time is the time required for the processors to move all messages during the computation; this complexity is a function of the network architecture. For this discussion, it is assumed that the underlying architecture is a single-bus Ethernet. Another assumption in this analysis is that all data sizes are basically uniform.

- What is meant by *all data sizes are basically uniform*?
- How reasonable is this assumption?
- What is a worst case scenario in terms of data sizes?

In Phase II of the *Algorithm description*, each of the \(p\) processors quicksorts its data (of size \(\frac{n}{p}\)) and then collects a regular sample of size \(p\). The time required for these operations is \(O(\frac{n}{p}\log(\frac{n}{p})+p)\).
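As a sketch of this phase on a single processor (a plain-Python stand-in with a hypothetical helper name, not the course's C/MPI code):

```python
# Phase II on one processor: sort the n/p local items, then take a
# regular sample of p of them at offsets 0, n/p^2, 2(n/p^2), ...
def local_sort_and_sample(local_data, p):
    local_data = sorted(local_data)        # O((n/p) log(n/p))
    step = len(local_data) // p            # n/p^2 when p^2 divides n
    sample = [local_data[i * step] for i in range(p)]  # O(p)
    return local_data, sample
```

For example, with the local chunk \([5,1,7,3]\) and \(p=2\), the sorted chunk is \([1,3,5,7]\) and the regular sample is \([1,5]\).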

In Phase III of the *Algorithm description*, the root processor merges the \(p\) sorted sample lists of size \(p\) into a sorted list of size \(p^{2}\), and then chooses \(p-1\) pivot elements. Using the multimerge algorithm, merging the \(p^{2}\) sample values requires \(O(p^{2}\log(p))\) time. The time required for these two operations is \(O(p^{2}\log({p})+(p-1))\).
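A minimal sketch of the root's work in this phase (Python stand-in; `heapq.merge` plays the role of the multimerge, and the pivot-index convention shown is one common choice, not necessarily the course's):

```python
import heapq

def choose_pivots(samples, p):
    # k-way merge of the p sorted sample lists (p^2 values in total);
    # the heap-based merge does this in O(p^2 log p) time
    merged = list(heapq.merge(*samples))
    # take every p-th element of the merged samples, skipping the
    # first, as the p-1 pivots
    return [merged[i * p] for i in range(1, p)]
```

For instance, with samples \([1,4,7]\), \([2,5,8]\), \([3,6,9]\) and \(p=3\), the merged list is \(1,\dots,9\) and the pivots are \([4,7]\).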

In Phase IV of the *Algorithm description*, each of the \(p\) processors partitions its \(\frac{n}{p}\) sorted data points. This is essentially done via a sequential scan of the data, setting the values in the *sbufferLength* and *sbufferStart* arrays. The time complexity is \(O(\frac{n}{p})\).
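In Python pseudocode, the idea behind the *sbufferStart*/*sbufferLength* bookkeeping might look as follows (a sketch with hypothetical names):

```python
def partition_counts(sorted_local, pivots):
    # one sequential scan of the n/p sorted items; bucket j holds the
    # values x with pivots[j-1] < x <= pivots[j]
    p = len(pivots) + 1
    lengths = [0] * p                      # plays the role of sbufferLength
    j = 0
    for x in sorted_local:
        while j < len(pivots) and x > pivots[j]:
            j += 1                         # j only advances: O(n/p + p) total
        lengths[j] += 1
    starts = [0] * p                       # plays the role of sbufferStart
    for i in range(1, p):
        starts[i] = starts[i - 1] + lengths[i - 1]
    return starts, lengths
```

Because the local data is already sorted, the bucket index \(j\) never moves backward, which is what keeps the scan linear.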

In Phase V of the *Algorithm description*, the \(i^{th}\) processor merges together \(p\) sorted lists. Since each of these lists is created using the regularly sampled pivot values chosen in Phase III, ideally each of the \(p\) lists has size \(\frac{n}{p^{2}}\), which after merging gives a list of size \(\frac{n}{p}\). Using the multimerge algorithm, this requires \(O(\frac{n}{p}\log({p}))\) time.
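Continuing the sketch, processor \(i\) receives the \(i^{th}\) bucket from every processor and multimerges them (the all-to-all exchange itself is simulated here by indexing into the other processors' buckets):

```python
import heapq

def phase5_merge(all_buckets, i):
    # all_buckets[q][i] is the i-th bucket produced by processor q in
    # Phase IV; merging these p sorted lists takes O((n/p) log p) time
    # when the buckets have the ideal size n/p^2
    incoming = [buckets[i] for buckets in all_buckets]
    return list(heapq.merge(*incoming))
```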

The total (parallel) processor time is \(O(\frac{n}{p}\log(\frac{n}{p})+p+p^{2}\log(p)+\frac{n}{p}+\frac{n}{p}\log(p)) = O(\frac{n}{p}\left(\log(\frac{n}{p})+\log(p)+1\right)+p^{2}\log(p)) = O(\frac{n}{p}(\log(n)+1)+p^{2}\log(p))\). When \(n \geq p^{3}\) (a standard assumption for this algorithm), the \(p^{2}\log(p)\) term is dominated by \(\frac{n}{p}\log(n)\), so the parallel time is \(O(\frac{n}{p}\log(n))\).

If the original list is sorted sequentially using any comparison-based sort, then the time requirement is \(O(n\log(n))\). The speedup of this computation is

\(S(n,p) = \frac{\text{Sequential time}}{\text{parallel time}} = \frac{n\log(n)}{\frac{n}{p}\left(\log(n)+1\right)+p^{2}\log(p)} = \frac{p\log(n)}{\log(n)+1+\frac{p^{3}\log(p)}{n}}\). If the size of the array is significantly larger than the number of processors (\(n \geq p^{3}\)), the speedup is essentially \(p\).
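A quick numeric sanity check, summing the per-phase costs above with all constants taken to be 1 (a rough model, not a measurement):

```python
import math

def parallel_time(n, p):
    # sum of the Phase II-V costs from the analysis above
    m = n / p
    return (m * math.log2(m) + p           # local sort + sampling
            + p**2 * math.log2(p) + p - 1  # root merge + pivot choice
            + m                            # partitioning
            + m * math.log2(p))            # final multimerge

def speedup(n, p):
    return n * math.log2(n) / parallel_time(n, p)
```

For \(n = 2^{20}\) and \(p = 8\), this model gives a speedup of about \(7.6\), close to the ideal \(p = 8\).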

- How does this run time complexity compare to the run time complexity of using *quicksort* on all the data in a sequential run?
- Does the time complexity of *quicksort* also make a size assumption of the data?

In the root processor an array of size \(n\) is required; in each of the non-root processors an array of size \(\frac{n}{p}\) is required. In addition, each processor requires a couple of buffers of size \(p\).

- Compare the space required by this algorithm to that required by *quicksort*.

In Phase II of the *Algorithm description* all \(n\) data items are scattered.

In Phase III of the *Algorithm description* the root processor gathers the \(p^{2}\) sample values.

In Phase IV of the *Algorithm description* the root processor broadcasts \(p-1\) values.

In Phase V of the *Algorithm description*, each of the \(p\) processors sends out \(\frac{n}{p}-\frac{n}{p^{2}}\) data points, for a total of \(n-\frac{n}{p}\) data transfers.

In Phase VI of the *Algorithm description*, the root processor collects \(n-\frac{n}{p}\) values.

The total number of values communicated is \(n+p^{2}+p-1+(n-\frac{n}{p})+(n-\frac{n}{p}) = n(3-\frac{2}{p})+p^{2}+p-1\).
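Summing the per-phase transfer counts is easy to check mechanically (a sketch; assumes \(p\) divides \(n\)):

```python
def total_communication(n, p):
    # sum of the per-phase transfer counts listed above
    return (n                  # Phase II: scatter all n items
            + p**2             # Phase III: gather the samples
            + p - 1            # Phase IV: broadcast the pivots
            + (n - n // p)     # Phase V: all-to-all exchange
            + (n - n // p))    # Phase VI: gather at the root
```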

- Asymptotic complexities have *constants* associated with them. What can you say about the constants associated with the run time and communication time complexities?