I need to write a program that performs LU decomposition, among other things. The problem is that I don't know the preferred way to distribute the loaded matrix from the root process to the other processes. I can come up with a simple algorithm for some situations, but I really need a solution that works for an arbitrary number of processes.
I'd prefer a coarse-grained distribution, as described in this presentation (slide 18).
Example 1
Here I can see that we can use 1 process or 2 processes (3 or 4 would be too fine-grained). But what is the correct way with 2 processes? Should the matrix be divided by rows or by columns?
Example 2
In this case, it's even more problematic: I can't distribute the values evenly between 2 processes. Even if I distributed them like this:
P1: 2 8 3 4
P2: 5 1 6 2 5
it completely breaks the symmetry of the distribution.
Then, with 3 processes, it's exactly the same situation as in Example 1: rows or columns?
So I presume there is no way to do this for a completely arbitrary number of processes, because the whole point is an even distribution. What's the usual approach for this?
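To make the question concrete, here is a minimal sketch (in Python, purely for illustration) of one common way to split rows when the count does not divide evenly: every process gets `n_rows // n_procs` rows, and the first `n_rows % n_procs` processes get one extra row. The name `block_row_split` is hypothetical, and whether a row split like this is appropriate for LU decomposition is exactly what is being asked. The counts and offsets are the kind of values a variable-size scatter such as `MPI_Scatterv` expects, assuming MPI is what the program uses.

```python
def block_row_split(n_rows, n_procs):
    """Split n_rows contiguous rows over n_procs processes as evenly as possible.

    Returns (counts, offsets): counts[r] is the number of rows owned by
    rank r, and offsets[r] is the index of the first row owned by rank r.
    Row counts differ by at most one between any two ranks.
    """
    if n_rows < 0 or n_procs <= 0:
        raise ValueError("n_rows must be >= 0 and n_procs must be > 0")

    # base rows for everyone; the first `extra` ranks take one more row each
    base, extra = divmod(n_rows, n_procs)
    counts = [base + 1 if rank < extra else base for rank in range(n_procs)]

    # each rank starts where the previous rank's block ends
    offsets = [0] * n_procs
    for rank in range(1, n_procs):
        offsets[rank] = offsets[rank - 1] + counts[rank - 1]

    return counts, offsets


if __name__ == "__main__":
    # A 3x3 matrix on 2 processes: one rank gets 2 rows, the other gets 1.
    print(block_row_split(3, 2))
```

For a 3x3 matrix on 2 processes this gives counts `[2, 1]` and offsets `[0, 2]`, so the split is uneven by one row but works for any number of processes. With more processes than rows, the trailing ranks simply receive zero rows.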
I hope I've described my problem clearly enough. If not, leave a comment, please, and I'll improve my question.