Computational Physics Lab 12

MPI Performance Study

The goal of this lab is to practice parallel programming with MPI and to develop some intuition about code speedups which can be achieved by running a parallel version of an algorithm.


1) Write a parallel version of the 1-d wave equation solver (you have already developed a serial version in Lab 6). To simplify things, assume that the number of points in the spatial grid is known in advance (it can be made a compile-time constant), and that the string ends are fixed. Also assume that the initial conditions can be generated by a function, so that you can just call this function on each node at the beginning of your program.

If you have N computing nodes executing the program, split the string into N equal sections and make each node responsible for propagating the string displacement in its own section. Each node has enough information to process its section with the exception of the leftmost and the rightmost points. The current value of the string displacement to the left of the leftmost point should be received from the node processing the neighboring section on the left. Similarly, to propagate the rightmost point, the node needs the current displacement from the node processing the section on the right. Sections 0 and N-1 are special: the nodes processing them have to talk to one neighbor only.
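The decomposition described above can be sketched in plain C, without any MPI calls. This is only an illustration under stated assumptions: the names section_bounds and step_section are hypothetical, the number of grid points is assumed to divide evenly by N, and the update is assumed to be the standard explicit three-level (leapfrog) scheme with r2 = (c*dt/dx)^2, which may differ from your Lab 6 code. Each section is stored with one extra "ghost" cell on each side, which holds the value received from the neighbor.

```c
#include <assert.h>

/* Hypothetical helper: first global index and length of the section
 * owned by `rank`. Assumes npoints is divisible by nprocs. */
void section_bounds(int npoints, int nprocs, int rank, int *first, int *count)
{
    assert(nprocs > 0 && npoints % nprocs == 0);
    assert(rank >= 0 && rank < nprocs);
    *count = npoints / nprocs;
    *first = rank * (*count);
}

/* Hypothetical helper: one time step on a section stored with ghost cells.
 * Owned points are indices 1..count; cur[0] and cur[count + 1] are ghost
 * values copied from the left and right neighbors before this call.
 * Assumed scheme: next = 2*cur - prev + r2 * (second difference of cur).
 * On the first (last) section, owned point 1 (count) is a fixed string end
 * and is simply carried over unchanged. */
void step_section(const double *prev, const double *cur, double *next,
                  int count, double r2, int is_first, int is_last)
{
    int lo = is_first ? 2 : 1;
    int hi = is_last ? count - 1 : count;

    for (int i = lo; i <= hi; ++i) {
        next[i] = 2.0 * cur[i] - prev[i]
                + r2 * (cur[i + 1] - 2.0 * cur[i] + cur[i - 1]);
    }
    if (is_first) next[1] = cur[1];
    if (is_last)  next[count] = cur[count];
}
```

In a parallel run, each node would call section_bounds once with its own rank, fill both ghost cells from its neighbors at every time step, and then call step_section.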

Since the information is exchanged between neighboring nodes, you might find the MPI_Sendrecv function useful. It can replace a sequence in which a node sends some data to another node and then receives data from that node.

After processing a number of time steps (choose the number of steps so that your program runs for at least a few seconds), make all guest nodes send their results to the host node and write the displacement of the whole string into a file. Visualize the solution. Use your Lab 6 code to make sure that your parallel code produces the correct result. This, in fact, is a very common way to develop parallel programs -- serial code is written first, and then parallelized while making sure that the results remain the same for various test cases.
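One simple way to compare the parallel result with the Lab 6 result is to compute the largest pointwise difference between the two displacement arrays. The sketch below assumes both results are already in memory as arrays of equal length; the name max_abs_diff is hypothetical. Whether the two codes agree bit for bit depends on how they are written and compiled, so comparing against a small tolerance is the more forgiving check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest absolute difference between two arrays
 * of n displacement values (e.g. serial and parallel results). */
double max_abs_diff(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = a[i] - b[i];
        if (d < 0.0) d = -d;          /* absolute value without math.h */
        if (d > worst) worst = d;
    }
    return worst;
}
```

A returned value of zero, or one close to roundoff level, suggests that the parallel code reproduces the serial solution for that test case.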


2) Build a speedup plot of your program. This is the graph of the computation time on one processor divided by the computation time on N processors, plotted versus the number of processors N. The program timing can be obtained using the MPI_Wtime function. Do this for several different numbers of spatial points in the grid: 10^4, 10^5, 10^6. Using Amdahl's law, estimate the parallel fraction of your program.
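Amdahl's law relates the speedup S on N processors to the parallel fraction p of the program: S = 1 / ((1 - p) + p/N). Solving for p gives p = (1 - 1/S) / (1 - 1/N), so each measured point on the speedup plot yields one estimate of p. A minimal sketch of this arithmetic follows; the function names are hypothetical, and the timings are assumed to be wall-clock differences of MPI_Wtime values measured elsewhere.

```c
#include <assert.h>

/* Speedup: run time on one processor divided by run time on N processors. */
double speedup(double t1, double tn)
{
    assert(t1 > 0.0 && tn > 0.0);
    return t1 / tn;
}

/* Parallel fraction p from Amdahl's law, S = 1 / ((1 - p) + p / N),
 * solved for p. Not defined for N = 1. */
double amdahl_parallel_fraction(double s, int nprocs)
{
    assert(s > 0.0 && nprocs > 1);
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / (double)nprocs);
}
```

For example, a speedup of 2 on 4 processors corresponds to p = 2/3. Estimates obtained at different N will generally not coincide exactly, since communication costs are not part of Amdahl's model.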


Please send in your lab report by email before 2 pm 04/29/2008. Include the program code, the speedup plot, and your conclusions.