Parallel sorting by regular sampling: algorithm

The PSRS sorting algorithm consists of six distinct phases. It also relies on the message-passing communication paradigms of send, receive, broadcast, scatter, and gather (sketched in MPI below):

  • send allows one processor to send a message to another processor,

  • receive allows one processor to receive a message sent from another processor,

  • broadcast allows one processor to send a message to all processors,

  • scatter allows one processor to distribute pieces of data (say, an array) to all processors, and

  • gather allows one processor to collect pieces of data (say, an array) from all processors.

The message-passing paradigms.
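To make these paradigms concrete, here is a minimal sketch in C using MPI; it assumes exactly \(4\) processors, and the values sent are illustrative only:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, size, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) value = 42;
        /* broadcast: processor 0 sends value to all processors */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* send/receive: processor 0 sends a message to processor 1 */
        if (rank == 0)
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        /* scatter: processor 0 hands each processor one piece of an
           array; gather: processor 0 collects one piece from each */
        int sendbuf[4] = {10, 20, 30, 40};   /* assumes size == 4 */
        int piece, recvbuf[4];
        MPI_Scatter(sendbuf, 1, MPI_INT, &piece, 1, MPI_INT,
                    0, MPI_COMM_WORLD);
        piece += rank;                       /* local work on the piece */
        MPI_Gather(&piece, 1, MPI_INT, recvbuf, 1, MPI_INT,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("gathered: %d %d %d %d\n",
                   recvbuf[0], recvbuf[1], recvbuf[2], recvbuf[3]);
        MPI_Finalize();
        return 0;
    }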

Questions

  1. How could broadcast be made more efficient?
  2. What might you suspect is the difference between scatter and scatterv?

Algorithm description

This is a phase-oriented description of the PSRS algorithm. The description also specifies the communication paradigms used in each phase. Notice that partition, quicksort, and multimerge appear in the description as well.

Phase I: Initialization

Start up \(p\) processors and let the root processor, 0, get the data of size \(n\).
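As a minimal sketch of this phase in C with MPI (the size \(n\) and the random values are placeholders for the program's real input):

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, p, n = 1000;   /* n is an assumed problem size */
        int *data = NULL;        /* only the root holds the full array */

        MPI_Init(&argc, &argv);  /* the p processors are started by mpirun */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        if (rank == 0) {         /* the root gets the data of size n */
            data = malloc(n * sizeof(int));
            for (int i = 0; i < n; i++)
                data[i] = rand() % 10000;   /* stand-in for real input */
        }

        /* ... phases II through VI go here ... */

        if (rank == 0) free(data);
        MPI_Finalize();
        return 0;
    }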

Phase II: Scatter data, local sort and regular samples collected

Scatter the data values to the \(p\) processors. Each processor sorts its local data set, roughly of size \(\frac{n}{p}\), using quicksort. Each processor then chooses \(p\) sample points from its locally sorted data at regular intervals.
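A sketch of the local work, assuming the scattered piece has already arrived in local[0..localN-1]; taking samples at positions i*localN/p is one common reading of "at regular intervals":

    #include <stdlib.h>

    /* qsort comparison function for ints */
    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Phase II: sort the local piece, then take p regular samples.
       The piece itself arrives via something like
       MPI_Scatter(data, localN, MPI_INT, local, localN, MPI_INT,
                   0, MPI_COMM_WORLD); */
    void local_sort_and_sample(int *local, int localN, int p, int *samples) {
        qsort(local, localN, sizeof(int), cmp_int);
        for (int i = 0; i < p; i++)
            samples[i] = local[(i * localN) / p];
    }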

Phase III: Gather and merge samples, choose and broadcast p-1 pivots

The root processor, 0, gathers the \(p\) sets of \(p\) sample points. It is important to realize that each of these sets of \(p\) sample points is sorted, so the \(p\) sets can be combined into one sorted list using multimerge. From these \(p^2\) sorted points, \(p-1\) pivot values are chosen at regular intervals and broadcast to the other \(p-1\) processors.
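A sketch of the root's work in this phase, reusing cmp_int from the previous sketch; sorting the \(p^2\) samples with qsort stands in for the multimerge, and the pivot positions chosen below are one common convention:

    /* Phase III, on the root: allSamples[0..p*p-1] holds the p gathered
       sample lists, each of length p and already sorted. */
    void choose_pivots(int *allSamples, int p, int *pivots) {
        qsort(allSamples, p * p, sizeof(int), cmp_int); /* stand-in for
                                                           multimerge */
        for (int i = 0; i < p - 1; i++)
            pivots[i] = allSamples[(i + 1) * p + p / 2 - 1];
        /* all processors then take part in the broadcast:
           MPI_Bcast(pivots, p - 1, MPI_INT, 0, MPI_COMM_WORLD); */
    }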

Phase IV: Local data is partitioned

Each of the \(p\) processors partitions its local sorted data, roughly of size \(\frac{n}{p}\), into \(p\) classes using the \(p-1\) pivot values.
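One possible sketch, which also shows where arrays like classStart[] and classLength[] in the exercises below come from; a binary search per pivot would be faster than this linear scan, but the scan is easier to follow:

    /* Phase IV: partition a sorted local array into p classes using the
       p-1 pivots. Class c holds the elements <= pivots[c] not already
       placed in an earlier class; class p-1 holds everything left over. */
    void partition_classes(const int *local, int localN, const int *pivots,
                           int p, int *classStart, int *classLength) {
        int idx = 0;
        for (int c = 0; c < p; c++) {
            classStart[c] = idx;
            while (idx < localN && (c == p - 1 || local[idx] <= pivots[c]))
                idx++;
            classLength[c] = idx - classStart[c];
        }
    }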

Phase V: All ith classes are gathered and merged

Processor \(i\) gathers the \(i^{th}\) class of data from every other processor. Each processor then merges the \(p\) sorted pieces it has received using multimerge.
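A minimal multimerge sketch; scanning all run heads costs O(k) per output element, and replacing the scan with a priority queue over the heads would reduce that to O(log k):

    #include <stdlib.h>

    /* Merge k sorted runs stored back to back in src into dst.
       start[i] and len[i] give the offset and length of run i.
       Each step takes the smallest value among the k run heads. */
    void multimerge(const int *src, const int *start, const int *len,
                    int k, int *dst) {
        int *pos = calloc(k, sizeof(int)); /* next unread index per run */
        int total = 0;
        for (int i = 0; i < k; i++) total += len[i];
        for (int out = 0; out < total; out++) {
            int best = -1;
            for (int i = 0; i < k; i++)
                if (pos[i] < len[i] && (best < 0 ||
                        src[start[i] + pos[i]] < src[start[best] + pos[best]]))
                    best = i;
            dst[out] = src[start[best] + pos[best]];
            pos[best]++;
        }
        free(pos);
    }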

Phase VI: Root processor collects all the data

The root processor gathers all the data and assembles the sorted list of \(n\) values.

Questions

  1. Manually sort a list of \(20\) integers with \(4\) simulated processors, following these six phases.
  2. What can you say about the time complexity of each phase of this algorithm? What does this tell you about the time complexity of the algorithm?
  3. What can you say about the space requirements of each phase of this algorithm? How does this affect the time complexity?
  4. What can you say about the communication complexity of this algorithm?

An example, data, communication and computation flow

The following figure illustrates the algorithm using \(1000\) data points and \(4\) processors. The communication between processors is annotated with the MPI functions Broadcast, Gather, Gatherv, and Scatter. As you study the figure, realize that Gatherv in particular is rather complicated. The details of all these MPI functions will be discussed in the next section. For now, relate the six phases of the algorithm to the data, communication, and computation flows indicated in the figure.


Flow of data and computation using \(4\) processors and \(1000\) values to be sorted using the Parallel Sorting by Regular Sampling algorithm.

Exercises

  1. Identify each of the \(6\) phases of the algorithm in the figure.
  2. The two sets of arrays, classStart[] and classLength[], and recvStarts[] and recvLengths[], are used as indices into mydata[] on the sending processors and into recvbuffer[] on the receiving processor in the Gatherv, respectively. The details of Gatherv will be discussed in the next section, but make sure you understand the scope of these arrays with respect to the different processors (see the sketch after this list).
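For orientation, here is a hedged sketch of the phase V Gatherv for class i, using the array names from exercise 2; everything else, such as obtaining recvLengths[] with a plain Gather first, is an assumption about one reasonable implementation:

    #include <mpi.h>

    /* Gather every processor's class-i elements onto processor i. */
    void gather_class_i(int i, int p, int rank,
                        int *mydata, int *classStart, int *classLength,
                        int *recvbuffer, int *recvStarts, int *recvLengths) {
        /* First let processor i learn how much each sender contributes. */
        MPI_Gather(&classLength[i], 1, MPI_INT,
                   recvLengths, 1, MPI_INT, i, MPI_COMM_WORLD);
        if (rank == i) {
            /* recvStarts[] is meaningful only on the receiver: it says
               where each sender's piece begins inside recvbuffer[]. */
            recvStarts[0] = 0;
            for (int j = 1; j < p; j++)
                recvStarts[j] = recvStarts[j - 1] + recvLengths[j - 1];
        }
        /* classStart[]/classLength[] index into mydata[] on each sender. */
        MPI_Gatherv(&mydata[classStart[i]], classLength[i], MPI_INT,
                    recvbuffer, recvLengths, recvStarts, MPI_INT,
                    i, MPI_COMM_WORLD);
    }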