On efficient finite element assembly in Matlab

Submitted by Alejandro Orti… on Mon, 02/08/2010 - 20:31

As we all know, finite element matrices are sparse and many memory can be saved if their sparsity is exploited. Due to the latter, we are often provoked to declare the nxn stiffness matrix as a sparse one just before the element loop. Be careful of this method since the larger the matrix the slower the assembly operations. There is a better and more efficient way to code the assembly procedure in Matlab. I will show how to accomplish this. For the sake of simplicity, the stiffness matrix of the two-dimensional heat conduction problem on the domain (0,1)x(0,1) is considered. A uniform mesh of mxm triangles is used, where m is the number of subdivisions per side. Three assembly methods are tested. The computations are done with increasing number of subdivisions. The complete Matlab code that I used for the tests can be downloaded from my webpage.

Method 1 (assembly1.m)

K(i,j) = K(i,j) + number,

with K a sparse matrix declared before the loop over finite elements. This method is the one that many of us would normally use to perform the assembly in a finite element code. However, it turns out that this method is the slowest but it can save significant amount of memory.

Method 2 (assembly2.m)

K(i,j) = K(i,j) + number,

with K a full matrix declared before the loop over finite elements. This scheme is faster than method 1. However, it will rapidly run out of memory. Because of the later, its applicability is limited to matrices of modest size.

Method 3 (assembly3.m)

On using three vector arrays I, J, X such that K( I(a) , J(a) ) = X(a), the sparse matrix K of size nxn is built as

K=sparse(I,J,X,n,n).

The X values of repeated pairs in I,J are added each other in Matlab sparse function, which represents the assembly procedure. This method can save as much memory as method 1 but it is significantly faster. The allocation (with Matlab zeros function) of I, J and X is done before the finite element loop. Note that the size of these arrays is known a priori. For instance, consider the case of the heat conduction problem. Each triangle contributes with 9 entries to the stiffness matrix. Hence, the size of I, J and X is 9 times the number of elements. If 100x100 subdivisions are specified, that is 100x100x2 triangle elements, the size of I, J or X is 180000 numbers. The total size that one needs to allocate for is then 540000 numbers, which represents about 1/192 of the amount needed to allocate K in method 2. After the construction of K with sparse(I,J,X,n,n), the arrays I, J and X can be deleted as they are not further needed, which results in additional memory saving.

The tests

In the tests, the following machines are used:

Machine 1: Intel Core2 Duo CPU T7500 @ 2.20 GHz, 789 MHz, Windows XP (32-bit).

Machine 2: Intel Xeon CPU X3363 @ 2.83 GHz, 2833.342 MHz, Red Hat (64-bit)

Test 1: Square domain (0,1)x(0,1) with 100x100 subdivisions. The results are the following:

Machine 1:

Method 1 = 13.7357 s

Method 2 = 8.3654 s

Method 3 = 3.3775 s

Machine 2:

Method 1 = 2.6798 s

Method 2 = 1.9539 s

Method 3 = 0.8609 s

Test 2: Square domain (0,1)x(0,1) with 200x200 subdivisions. The results are the following:

Machine 1:

Method 1 = 145.4049 s

Method 2 = out of memory

Method 3 = 13.1721 s

Machine 2:

Method 1 = 22.8242 s

Method 2 = 14.9338 s

Method 3 = 3.4414 s

Test 3: Square domain (0,1)x(0,1) with 500x500 subdivisions. The results are the following:

Machine 1:

Method 1 = not tested

Method 2 = not tested

Method 3 = not tested

Machine 2:

Method 1 = 603.6082 s

Method 2 = out of memory

Method 3 = 21.4391 s

Test 4: Square domain (0,1)x(0,1) with 800x800 subdivisions. The results are the following:

Machine 1:

Method 1 = not tested

Method 2 = not tested

Method 3 = not tested

Machine 2:

Method 1 = 3805.9 s

Method 2 = out of memory

Method 3 = 54.6115 s

Test 5: Square domain (0,1)x(0,1) with 1500x1500 subdivisions. The results are the following:

Machine 1:

Method 1 = not tested

Method 2 = not tested

Method 3 = not tested

Machine 2:

Method 1 = 186260 s

Method 2 = out of memory

Method 3 = 190.6444 s

Conclusions

The assembly computations can significantly be improved if Matlab sparse function is used appropriately. Method 1 is not suitable since in the finite element loop there are as many insertion operations as the number of K(i,j) evaluations. In other words, if a new non-zero entry is to be introduced in the sparse structure, then Matlab has to make room for it by inserting the value into the structure. An insertion operation into an array is proportional to the number of data in the array, in this case, the number of non-zero (nnz) entries in K. Therefore, the loop over K(i,j) takes time proportional to square(nnz). On the other hand, method 2 is preferable over method 1, but it has limited applicability since the system will run out of memory sooner than later. Finally, as we have seen from the tests, method 3 is by far the more efficient way to perform the assembly of finite element matrices with faster computations. I highly recommend using method 3 and avoid by all means method 1.

sunil.raiyani

Realy good

its very usefull in my project to reduce my time thanks a lot.....

Tue, 02/09/2010 - 18:25 Permalink

Alejandro Orti…

Test 5 has been added... have a look at the big time difference

Test 5 has been added... have a look at the big time difference.

Wed, 02/10/2010 - 05:48 Permalink

Ettore Barbieri

Avoiding "for" loops for the assembly of the stiffness matrix

Dear Alejandro,Hope you're okay, long time no see!

This is an excellent trick, since the command "sparse" automatically sums over repeated indices, therefore there is a great saving in time by calling "sparse" at the "end" of the for-loop and not "inside" the for-loop.

However, more time could be saved avoiding the "for-loops”, which in Matlab are really time-consuming.

In the recent releases of Matlab there are a bunch of new commands such as "bsxfun" or"arrayfun" or "cellfun" that are basically built-in for-loops that are executed faster.

These commands can be used whenever vectorization cannot be done.

For example, in your test, there are loops over the elements, over the Gaussian points and, nested, double loops over the nodes of the single element.

I know that your example is didactic, but, if you allow me, I'd like to further the exercise to show how loops can be avoided, maybe it could be beneficial to the imechanica friends.

Here is what I would do: first of all, Gaussian points are not really necessary for triangular linear elements, since the generic stiffness matrix K(i,j) can be explicitly calculated (in other words it would suffice one Gaussian point located in its centroid with weight equal to the area of the triangle).

In fact in your function "shapefunctionsT3" Gaussian points are needed only for the shape functions, not for their derivatives, which are the quantities needed for the stiffness matrix.

But I understand that one might want Gaussian points for the sake of generality, therefore I'll stick to the use of Gaussian points.

Say you have Gaussian points over each element (the external for-loop over the element, which I'll treat later), you can avoid loop over the Gaussian points and nodes by simply calling for each element

phiderx*diag(w)*phiderx + phidery*diag(w)*phidery

where phiderx and phidery are matrices (g x 3) where the rows contain the values in the Gaussian points and the columns span over the nodes of the element.

By doing so, you obtain the elemental stiffness matrix 3x3 with only one command and no loops over the Gaussian points and the nodes.

Anyway, there is no need of Gaussian points for the derivatives; therefore I would calculate the derivatives beforehand using the isoparametric mapping

function obj = Jacobian(obj)

F22 = obj.Nodes(obj.Surface(:,2),2) - obj.Nodes(obj.Surface(:,3),2); % y2-y3

F21 = obj.Nodes(obj.Surface(:,2),1) - obj.Nodes(obj.Surface(:,3),1); % x2-x3

F12 = obj.Nodes(obj.Surface(:,1),2) - obj.Nodes(obj.Surface(:,3),2); % y1-y3

F11 = obj.Nodes(obj.Surface(:,1),1) - obj.Nodes(obj.Surface(:,3),1); % x1-x3

obj.detF = F11.*F22-F12.*F21;

M11 = F22./obj.detF;

M12 = -F12./obj.detF;

M21 = -F21./obj.detF;

M22 = F11./obj.detF;

obj.DPHIx = M11*[1 0 -1] + M12*[0 1 -1]; % diff(PHI,xi) = [1;0;-1]

obj.DPHIy = M21*[1 0 -1] + M22*[0 1 -1]; % diff(PHI,eta) = [0;1;-1]

end

Where I have encapsulated nodes and surfaces in the structure (object) "obj".

In this way, there is no need of looping over the elements to construct the derivatives of the shape functions, since it can be done using vectorization.

DPHIx and DPHIy are in this case (ne x 3) matrices where "ne" is the number of the elements.

Once obtained the stiffness matrix 3x3, you can reshape it "column-wise" and construct the indices I and J by using the command

[I,J] = ind2sub([ns ns],(1:ns^2).');

where ns is the number of nodes for each elements, three in this case.

Moreover, you can define “function handles" that are called for each element

Kxx = @(e) 1/2*obj.DPHIx(e,:).'*obj.DPHIx(e,:).*obj.detF(e);

Kyy = @(e) 1/2*obj.DPHIy(e,:).'*obj.DPHIy(e,:).*obj.detF(e);

Ke = @(e) Kxx(e) + Kyy(e);

and, for example, the call

Kxx(3) gives the 3x3 elemental stiffness matrix for the third element.

Now, we are ready for the loop over the elements that will be operated by using “arrayfun”.

The indices from the elemental stiffness matrix to the global stiffness matrix can be constructed as

i11G = @(e) obj.Surface(e,I).';

j11G = @(e) obj.Surface(e,J).';

and

ig = arrayfun(i11G,(1:ne).','Uni',0);

js = arrayfun(j11G,(1:ne).','Uni',0);

s = arrayfun(Ke,(1:ne).','Uni',0);

ig = cell2mat(ig); js = cell2mat(js); s = cell2mat(s);

obj.K = sparse(ig,js,s,n,n);

where n is the total number of nodes.

There might be a bug somewhere, since in this exercise I've adapted my linear elastic FE code to the heat conduction problem! :) ; Nonetheless, the programming logic is the same.

Thu, 05/06/2010 - 15:18 Permalink