Cholesky Example
The algorithm
The Cholesky factorization is a matrix operation commonly used to solve the normal equations of linear least squares problems. It computes a lower triangular matrix L from a symmetric positive definite matrix A such that the product of L and its transpose is A:

$$A = L L^{T}$$
Note that this description is only a brief overview intended to introduce the algorithm. For more details about the algorithm, this link may be useful.
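As a small worked example of the definition (the matrix is chosen only for illustration), the entries of L for a 2 × 2 matrix follow directly from matching A with L L^T entry by entry:

```latex
A = \begin{pmatrix} 4 & 2 \\ 2 & 2 \end{pmatrix}:\qquad
l_{11} = \sqrt{a_{11}} = 2,\quad
l_{21} = \frac{a_{21}}{l_{11}} = 1,\quad
l_{22} = \sqrt{a_{22} - l_{21}^{2}} = 1,
\qquad
L = \begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix},\quad
L L^{T} = \begin{pmatrix} 4 & 2 \\ 2 & 2 \end{pmatrix} = A.
```

The square roots are real because A is positive definite, which is why that property is required.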
The source code for SMPSs
In each iteration, only the red and blue parts are updated. The bloq_upd tasks are syrk and gemm, which update the red and blue parts, respectively. The following figure shows how the different primitives operate on the matrix blocks.

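The roles of the block primitives can be read off a 2 × 2 block partition of A = L L^T. This is a sketch of the standard right-looking formulation, assuming that the two tasks not named above are the LAPACK potrf (factor a diagonal block) and the BLAS trsm (triangular solve); A_{11} is one NB × NB diagonal block and A_{22} is the trailing matrix:

```latex
\begin{pmatrix} A_{11} & A_{21}^{T} \\ A_{21} & A_{22} \end{pmatrix}
= \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
  \begin{pmatrix} L_{11}^{T} & L_{21}^{T} \\ 0 & L_{22}^{T} \end{pmatrix}
\;\Longrightarrow\;
\begin{cases}
A_{11} = L_{11} L_{11}^{T} & \text{(potrf)} \\
L_{21} = A_{21} L_{11}^{-T} & \text{(trsm)} \\
L_{22} L_{22}^{T} = A_{22} - L_{21} L_{21}^{T} & \text{(syrk on diagonal blocks, gemm on off-diagonal blocks)}
\end{cases}
```

The last line is the trailing update performed by the bloq_upd tasks; the factorization then repeats on the updated A_{22}, so each iteration only touches the blocks below and to the right of the current diagonal block.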
The following code is the main algorithm of a blocked Cholesky factorization. The matrix A is organized in blocks of NB x NB floats, with a total of DIM x DIM blocks. The annotated application primitives (tasks) operate on these blocks.

Each function performs a block operation and can be annotated so that it is executed on a CPU as a task. These tasks are calls to the CBLAS and LAPACK libraries.
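What a task runtime has to track is which blocks each task reads and which it overwrites. The following sketch does not use SMPSs: it walks the same loop nest as a right-looking blocked Cholesky (assuming potrf and trsm alongside syrk and gemm) and, instead of running kernels, counts the tasks and optionally prints each one with its input and inout blocks, with directions named in the spirit of SMPSs task annotations. The function enumerate_tasks and its output format are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* The four block kernels of the blocked factorization. */
enum task_kind { POTRF, TRSM, SYRK, GEMM, NKINDS };

/* Counts (and, if verbose, prints) the tasks created for a dim x dim block
   matrix. "in" blocks are only read; the "inout" block is overwritten. */
void enumerate_tasks(int dim, int counts[NKINDS], int verbose) {
    assert(dim >= 0);
    for (int t = 0; t < NKINDS; t++) counts[t] = 0;
    for (int k = 0; k < dim; k++) {
        counts[POTRF]++;
        if (verbose) printf("potrf inout A[%d][%d]\n", k, k);
        for (int i = k + 1; i < dim; i++) {
            counts[TRSM]++;
            if (verbose) printf("trsm  in A[%d][%d]  inout A[%d][%d]\n", k, k, i, k);
        }
        for (int i = k + 1; i < dim; i++) {
            for (int j = k + 1; j < i; j++) {
                counts[GEMM]++;
                if (verbose)
                    printf("gemm  in A[%d][%d], A[%d][%d]  inout A[%d][%d]\n", i, k, j, k, i, j);
            }
            counts[SYRK]++;
            if (verbose) printf("syrk  in A[%d][%d]  inout A[%d][%d]\n", i, k, i, i);
        }
    }
}
```

For DIM blocks per dimension this gives DIM potrf tasks, DIM(DIM-1)/2 trsm and syrk tasks, and DIM(DIM-1)(DIM-2)/6 gemm tasks, so gemm tasks dominate as DIM grows. With verbose set and dim = 2, it prints four tasks: potrf on A[0][0], trsm on A[1][0], syrk on A[1][1], and potrf on A[1][1].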

Some Results
- Scalability

- Performance

Test Machine
The machine used for these tests has two POWER5 processors at 1.5 GHz. Each POWER5 has two cores, and each core has two floating-point units (FPUs), giving a theoretical peak of 6 GFlops per core. The peak of the whole machine is therefore 24 GFlops. In addition, each core supports simultaneous multithreading (SMT), so the best performance was reached using two threads per core.
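The peak figures are consistent with the following arithmetic, under the assumption (not stated above) that each FPU completes one fused multiply-add, i.e. two floating-point operations, per cycle:

```latex
\text{peak}_{\text{core}} = 1.5\ \text{GHz} \times 2\ \text{FPUs} \times 2\ \tfrac{\text{flops}}{\text{cycle}} = 6\ \text{GFlops},
\qquad
\text{peak}_{\text{machine}} = 2\ \text{processors} \times 2\ \text{cores} \times 6\ \text{GFlops} = 24\ \text{GFlops}.
```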
Downloads
Cholesky example source files




