Currently I have two implementations of matrix multiplication algorithms, the standard Divide and Conquer approach, and Strassen's Algorithm. With both of these approaches, I partition the matrices into their submatrices with a function that takes O(n^2) time. Although there are only a constant amount of partitions, it still creates a large amount of overhead. What type of changes to the algorithms would I have to make, to allow for constant time partitioning?

Thanks.

# Matrix Multiplication

Page 1 of 1## 2 Replies - 6950 Views - Last Post: 05 August 2012 - 11:31 PM

##
**Replies To:** Matrix Multiplication

### #2

## Re: Matrix Multiplication

Posted 05 August 2012 - 09:14 PM

Can you be a bit more explicit about "it still creates large amounts of overhead"?

I don't see the partitioning as particularly expensive if you make the matrices square matrices that are powers of 2. Each recursive level down is simply quartering the matrices.

I've only learned about Strassen's algorithm a few hours ago after you mentioned it. Never really learned about it in school. I can't really comment with in depth knowledge about the overhead for the operations needed to implement the algorithm.

If I were to implement it, though, as I was taking a quick skim over, I already noticed that M1-M7 don't need to co-exist all at the same time. So technically, I only pay the price of allocating space for M1 and keep recycling the space within that recursion level. Basically in pseudo code:

If I was being really cheap, I'd just allocate space for M to be as big as the largest quadrant, and just keep using the same space as a scratch pad for each recursion level.

For "divide and conquer", the extra overheard I'm seeing is for the intermediate results of the products:

I was thinking though, instead of the standard implementation where it would look like:

That this could be changed into:

Basically get rid of intermediate arrays M1 and M2 and put the values directly into the target array.

Edit after:

And I completely forgot what I was going to write in the first place before I got sidetrack with these partial space optimizations. When partitioning A and B, there is no real need to create brand new smaller matrices and copying values out of them. It is possible to just remember the top left corner and size of a "child" matrix to be used by other operations. Yes the code maybe ugly, but since you're primary concern is efficiency, you'll save a lot of CPU time not allocation space copying stuff around and deallocating it.

I don't see the partitioning as particularly expensive if you make the matrices square matrices that are powers of 2. Each recursive level down is simply quartering the matrices.

I've only learned about Strassen's algorithm a few hours ago after you mentioned it. Never really learned about it in school. I can't really comment with in depth knowledge about the overhead for the operations needed to implement the algorithm.

If I were to implement it, though, as I was taking a quick skim over, I already noticed that M1-M7 don't need to co-exist all at the same time. So technically, I only pay the price of allocating space for M1 and keep recycling the space within that recursion level. Basically in pseudo code:

Compute M1 and store into M. C11 += M C22 += M Compute M2 and store into M. C21 += M C22 -= M Compute M3 and store into M. C12 += M C22 += M Compute M4 and store into M. C11 += M C21 += M Compute M5 and store into M. C11 -= M C12 += M Compute M6 and store into M. C22 += M Compute M7 and store into M. C11 += M

If I was being really cheap, I'd just allocate space for M to be as big as the largest quadrant, and just keep using the same space as a scratch pad for each recursion level.

For "divide and conquer", the extra overheard I'm seeing is for the intermediate results of the products:

C11 = A11*B11 + A12*B21

I was thinking though, instead of the standard implementation where it would look like:

M1 = A11*B11 M2 = A12*B21 C11 = M1 + M2

That this could be changed into:

MultiplyAndAddSumInto(A11, B11, C11); : MultiplyAndAddSumInto(a[][], b[][], c[][]) { for(int i = 0; i < a.RowCounts; i++) for(int j = 0; j < b.ColumnCount; j++) for(int k = 0; k < a.ColumnCount; k++) c[i][j] += a[i][k] * b[k][j]; }

Basically get rid of intermediate arrays M1 and M2 and put the values directly into the target array.

Edit after:

And I completely forgot what I was going to write in the first place before I got sidetrack with these partial space optimizations. When partitioning A and B, there is no real need to create brand new smaller matrices and copying values out of them. It is possible to just remember the top left corner and size of a "child" matrix to be used by other operations. Yes the code maybe ugly, but since you're primary concern is efficiency, you'll save a lot of CPU time not allocation space copying stuff around and deallocating it.

This post has been edited by **Skydiver**: 05 August 2012 - 09:25 PM

### #3

## Re: Matrix Multiplication

Posted 05 August 2012 - 11:31 PM

Skydiver, on 05 August 2012 - 09:14 PM, said:

Can you be a bit more explicit about "it still creates large amounts of overhead"?

I don't see the partitioning as particularly expensive if you make the matrices square matrices that are powers of 2. Each recursive level down is simply quartering the matrices.

I've only learned about Strassen's algorithm a few hours ago after you mentioned it. Never really learned about it in school. I can't really comment with in depth knowledge about the overhead for the operations needed to implement the algorithm.

If I were to implement it, though, as I was taking a quick skim over, I already noticed that M1-M7 don't need to co-exist all at the same time. So technically, I only pay the price of allocating space for M1 and keep recycling the space within that recursion level. Basically in pseudo code:

If I was being really cheap, I'd just allocate space for M to be as big as the largest quadrant, and just keep using the same space as a scratch pad for each recursion level.

For "divide and conquer", the extra overheard I'm seeing is for the intermediate results of the products:

I was thinking though, instead of the standard implementation where it would look like:

That this could be changed into:

Basically get rid of intermediate arrays M1 and M2 and put the values directly into the target array.

Edit after:

And I completely forgot what I was going to write in the first place before I got sidetrack with these partial space optimizations. When partitioning A and B, there is no real need to create brand new smaller matrices and copying values out of them. It is possible to just remember the top left corner and size of a "child" matrix to be used by other operations. Yes the code maybe ugly, but since you're primary concern is efficiency, you'll save a lot of CPU time not allocation space copying stuff around and deallocating it.

I don't see the partitioning as particularly expensive if you make the matrices square matrices that are powers of 2. Each recursive level down is simply quartering the matrices.

I've only learned about Strassen's algorithm a few hours ago after you mentioned it. Never really learned about it in school. I can't really comment with in depth knowledge about the overhead for the operations needed to implement the algorithm.

If I were to implement it, though, as I was taking a quick skim over, I already noticed that M1-M7 don't need to co-exist all at the same time. So technically, I only pay the price of allocating space for M1 and keep recycling the space within that recursion level. Basically in pseudo code:

Compute M1 and store into M. C11 += M C22 += M Compute M2 and store into M. C21 += M C22 -= M Compute M3 and store into M. C12 += M C22 += M Compute M4 and store into M. C11 += M C21 += M Compute M5 and store into M. C11 -= M C12 += M Compute M6 and store into M. C22 += M Compute M7 and store into M. C11 += M

If I was being really cheap, I'd just allocate space for M to be as big as the largest quadrant, and just keep using the same space as a scratch pad for each recursion level.

For "divide and conquer", the extra overheard I'm seeing is for the intermediate results of the products:

C11 = A11*B11 + A12*B21

I was thinking though, instead of the standard implementation where it would look like:

M1 = A11*B11 M2 = A12*B21 C11 = M1 + M2

That this could be changed into:

MultiplyAndAddSumInto(A11, B11, C11); : MultiplyAndAddSumInto(a[][], b[][], c[][]) { for(int i = 0; i < a.RowCounts; i++) for(int j = 0; j < b.ColumnCount; j++) for(int k = 0; k < a.ColumnCount; k++) c[i][j] += a[i][k] * b[k][j]; }

Basically get rid of intermediate arrays M1 and M2 and put the values directly into the target array.

Edit after:

And I completely forgot what I was going to write in the first place before I got sidetrack with these partial space optimizations. When partitioning A and B, there is no real need to create brand new smaller matrices and copying values out of them. It is possible to just remember the top left corner and size of a "child" matrix to be used by other operations. Yes the code maybe ugly, but since you're primary concern is efficiency, you'll save a lot of CPU time not allocation space copying stuff around and deallocating it.

I wanted to do what you said in your edit, namely using indices and the sizes, but I'm not sure how I would actually do it. I'm just going to try and work this out by hand, thanks for the input!

Page 1 of 1