From: MPI Developers Conference and Users' Group Meeting. Proceedings, Second MPI Developer's Conference, pages 142-148. University of Notre Dame, Notre Dame, Indiana, 1-2 July 1996.
Cone beam tomography is a computed tomography (CT) method which efficiently acquires projection data in a single rotation. Using a 2D detector, a 3D volume can be reconstructed with isotropic voxel sizes, without the mechanical translation required for conventional CT. Our current application area involves 3D CT with microscopic resolution.
Cone beam CT requires long computation times and a large amount of memory. Reconstruction times of several hours have been required on conventional workstations. Large voxel arrays are used, typically 256³ to 512³. Pseudocode for the cone beam algorithm is given by:
 1  Initialize each PE
 2  Read and Broadcast problem specifications
 3  Partition and allocate memory
 4  Allocate memory
 5  Precomputation of weights
 6  for each (N views):
 7      if (PE is ROOT)
 8          Read Projection
 9          Weight and Filter Projection
10      Broadcast Projection
11      Backproject Projection
12  Gather Reconstructed Voxels on ROOT PE

The majority of the computation is in the backprojection (11). The majority of the communication is spent in sending the projection (10). We have investigated the use of parallel methods on workstation clusters to overcome the long processing time and to lessen memory requirements on individual workstations.
Because of the large amounts of data involved in this problem, we chose an explicit message passing library for the implementation to achieve maximum performance. Our earlier results showed 81.8% processor utilization on a heterogeneous cluster of 6 Sun workstations. This high degree of utilization was achieved by load balancing in the backprojection (11) and filtering (9) steps.
In this work, we demonstrate improved performance using asynchronous communications to broadcast the projection (10) using MPICH.
The reconstruction problem is well suited to asynchronous communication because of the repeated, deterministic nature of the views. By using asynchronous communication to send projection data while the previous projection is being backprojected, the communication overhead can be eliminated on the receiving processors as long as the backprojection time is greater than or equal to the communication time.
We replaced the synchronous broadcast (MPI_Bcast) with repeated asynchronous send/receive pairs (MPI_Isend/MPI_Irecv), double buffering the projection to assure correctness. For large projection sizes, the MPI_Isend no longer progressed asynchronously. To achieve asynchronous communication, repeated calls to MPI_Test during backprojection on the receiving processors were needed to complete the send.
Use of asynchronous broadcast in the cone beam CT problem has improved utilization from 81.8% to 91.9% on a cluster of 6 heterogeneous workstations.
Asynchronous communication was very useful for this problem. Further effort is needed in addressing the interplay between workstation performance and communication speed in a heterogeneous environment.
In addition to asynchronous communication, another possibility is a multiphase broadcast. Since many networks now isolate traffic into physical subnetworks with bridges and routers, a broadcast method using forwarding can be employed. In this method, the master process broadcasts to a head slave in each subnet, which then forwards to each workstation on its subnet. This can be implemented easily using communicator groups. A disadvantage of this method is that it requires a priori knowledge of the network topology. However, this topology is likely to change slowly and could be maintained in a configuration file along with processor speed and availability. A multiphase broadcast could be done either synchronously or asynchronously.