Matrix Multiplication Example
The algorithm and SMPSs code
This page shows an example of programming with SMP Superscalar (SMPSs). The example is a hyper-matrix multiplication, as in the Cell Superscalar example. The functions submitted to the CPUs are block matrix multiplications. A simple annotation before the declaration of the function is enough to make SMP Superscalar run each call as a task. The whole file for this example can be downloaded from the link below. A simple vectorized version of the matmul can be downloaded at the bottom of the page.
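
The idea described above can be sketched as follows. This is a minimal illustration, not the downloadable source file: the pragma spelling (css task with input and inout clauses, css start, css finish) is assumed to follow the usual SMPSs conventions, and the names block_t, block_addmultiply, hyper_matmul and hyper_matmul_selfcheck, as well as the sizes BS and NB, are hypothetical. A plain C compiler ignores the unknown pragmas and simply runs the calls in order, so the sketch can be tried without the SMPSs toolchain.

```c
#define BS 4   /* block side (hypothetical; the tests on this page use 128x128 blocks) */
#define NB 3   /* blocks per side of the hyper-matrix (hypothetical) */

typedef float block_t[BS][BS];

/* Assumed SMPSs-style annotation: A and B are only read, C is read and updated.
   Without the SMPSs compiler the pragma is ignored and this is an ordinary function. */
#pragma css task input(A, B) inout(C)
void block_addmultiply(float C[BS][BS], float A[BS][BS], float B[BS][BS])
{
    for (int i = 0; i < BS; i++)
        for (int k = 0; k < BS; k++)
            for (int j = 0; j < BS; j++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Hyper-matrix product C += A * B, where each entry is a pointer to a block.
   Every call to block_addmultiply is one block multiplication. */
void hyper_matmul(block_t *A[NB][NB], block_t *B[NB][NB], block_t *C[NB][NB])
{
#pragma css start   /* assumed: starts the runtime */
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int k = 0; k < NB; k++)
                block_addmultiply(*C[i][j], *A[i][k], *B[k][j]);
#pragma css finish  /* assumed: waits for all pending tasks */
}

/* Fills A with ones and B with twos, multiplies, and checks every element of C.
   Returns 1 if every element equals 2 * NB * BS, 0 otherwise. */
int hyper_matmul_selfcheck(void)
{
    static block_t a[NB][NB], b[NB][NB], c[NB][NB];
    block_t *A[NB][NB], *B[NB][NB], *C[NB][NB];

    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++) {
            A[i][j] = &a[i][j];
            B[i][j] = &b[i][j];
            C[i][j] = &c[i][j];
            for (int r = 0; r < BS; r++)
                for (int s = 0; s < BS; s++) {
                    a[i][j][r][s] = 1.0f;
                    b[i][j][r][s] = 2.0f;
                    c[i][j][r][s] = 0.0f;
                }
        }

    hyper_matmul(A, B, C);

    const float expected = 2.0f * NB * BS;
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int r = 0; r < BS; r++)
                for (int s = 0; s < BS; s++)
                    if (c[i][j][r][s] != expected)
                        return 0;
    return 1;
}
```

The triple loop in hyper_matmul is written exactly as in a sequential program. Under the assumptions above, the input and inout clauses are what would let a task runtime see that calls updating different blocks of C are independent, while successive updates of the same block must stay in order.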

Some Results
- Scalability
The following graph compares the scalability of the algorithm when run with different matrix sizes. The tests were run with blocked matrices of 'Matrix Dimension', using blocks of 128x128. The kernels used for the block operations are those of the GOTO library, called through CBLAS.

- Performance
This graph shows the performance of the algorithm relative to the machine's peak as the number of threads increases. The peak of the machine grows from one thread to four because the machine has four cores, each with a peak of 6 GFlops (see the Test Machine section).

Test Machine
The machine used for these tests has two POWER5 processors at 1.5 GHz. Each POWER5 has two cores, and each core has two FPUs (Floating Point Units), so the theoretical peak of each core is 6 GFlops. The peak of the machine is therefore up to 24 GFlops. In addition, each core is SMT, so the best performance is reached using two threads per core.
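
The peak figures above can be reproduced with a short calculation. The sketch below assumes that each FPU completes one fused multiply-add (two floating point operations) per cycle, which is what the 6 GFlops per core figure implies for two FPUs at 1.5 GHz. The function names are hypothetical.

```c
/* Theoretical peak of one core in GFlops:
   clock (GHz) x FPUs per core x flops per FPU per cycle.
   flops_per_fpu_cycle = 2 is an assumption (one fused multiply-add per cycle). */
double core_peak_gflops(double clock_ghz, int fpus_per_core, int flops_per_fpu_cycle)
{
    return clock_ghz * fpus_per_core * flops_per_fpu_cycle;
}

/* Peak available to a given number of threads when each thread gets its own
   core; threads beyond the number of cores do not raise the peak. */
double peak_for_threads(int threads, int total_cores, double core_peak)
{
    int busy_cores = threads < total_cores ? threads : total_cores;
    return busy_cores * core_peak;
}
```

With the test machine's values, core_peak_gflops(1.5, 2, 2) gives 6 GFlops per core, and peak_for_threads(4, 4, 6.0) gives the 24 GFlops machine peak (2 processors x 2 cores). This also matches the Performance graph, where the peak grows with each thread up to four.
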
Downloads
Matrix Multiplication example source files.




