# Any tips/comments regarding the latest version of the C++ library: Eigen (v. 3.0)?

Hi all,

**1. A new version of Eigen (v 3.0 now) is out** (on March 23, 2011), and it seems promising. First, a few links:

The main page for the project is here: [^]. The page for v.3.0 is here: [^]. It seems to be *very* fast: [^].

**2. No compilation issues on the Win32 platform:** I just downloaded it today, and found that it compiles OK with VC++ 10.0 on Win32 (the BUILD_ALL project). I haven't had the time to try anything further, though.

**3. Please share your experience:** For whatever reasons, if anyone here eventually decides to use it, or not to use it, or has any comments or tips to offer, I would like to hear from him/them, esp. in respect of the following points:

**3.1** **For eigenvalue computations**, comparative advantages/disadvantages, including speed comparisons, with the existing libraries like ARPACK, Trilinios, etc.

Assume that the system to solve has origins in a typical structural/solid mechanics FEM. If possible, please share your experience for global [A] matrix orders of 3k, 10k, 100k, and bigger, the matrices being the typical FEM ones: sparse, symmetric, etc.

[This note is not really necessary, but just in the interest of clarity: If a matrix has 10 rows and 10 columns, then its order is taken to be 10, not 100.]

**3.2 For solving a linear FEM-generated system** using a direct solver, speed comparisons with the existing FORTRAN/C++ code as implemented in GOTO (latest version), Taucs, etc. Again, matrix orders go from 3k to 100k and bigger, and the matrices, of course, are sparse, symmetric, etc.

**4.** Any **other tips** you care to share would also be appreciated.

Thanks in advance.

--Ajit

[E&OE]

- Ajit R. Jadhav's blog
- Login or register to post comments
- 2808 reads

## Dear Ajit, To me the

Dear Ajit,

To me the added value of eigen is to allow automatic tuning of operations on dense matrices and vectors. So from that point of view it is a very promising and exciting code. As far as I know, the direct sparse solver of Eigen is not very state-of-the-art its a copy of the CSparse code of Tim Davis. I asked to the eigen team if they were considering to implement a left-looking or multifrontal direct solver that could take advantage of blas3 operation (that they optimize), buyt unfortunately they won't (at least in a near future). So to have an efficient direct solver you should still relies on other classical blas.

My blog on research on Hybrid Solvers: http://mechenjoy.blogspot.com/

## Eigen 3.0 and solver libraries---Reply to Thomas

Dear Thomas,

Thank you very much for the clarification. ... So, I gather that since Eigen 3.0 isn't going to be multifrontal for a while, other classical blas would be better... Hmmm...

I am pretty much a novice to these libraries, and very much in the process of learning about them. So, could you please clarify or point out links for what is meant by "left-looking"?

Another point. I would like to have a little private communication with you concerning our current requirements and the capabilities of various solver libraries (both from public and commercial domains). If you don't mind, could you please drop me an email at aj175tp [at] yahoo [do t] co [d o t] in ? Thanks in advance.

--Ajit

- - - - -

[E&OE]

## Direct solvers

Dear Ajit,

In fact concerning the (let say) cholesky decomposition there is 6 different manners to write the algorithm, depending if you start to work with rows or columns (basically there are 3 "for" loops). The right or left looking version are two of those. This naming comes from the fact in the right looking version (which is actually row by row) you update terms of your factorization that are after the diagonal (on the right), while the left looking (which is col by col and is the actual implementation of CSparse) you do it on the left. May be I' m wrong (I often confuse the dfferent factorizations) but the idea is close to that.

Depending on how you write the factorization algorithm (meaning: which update of the term you choose, or the order of the FOR loops) algorithms have different properties specially on how and when should one update the new computed values of the factorization. Some implementation (they are called supernodal) allow to gather some nodes and do the update work in a block fashion (from time to time, not on all the matrix though). This allows to use dense matrix algorithmic (such as dense matrices product etc) which are highly optimized by BLAS implementation. Even if those optimization impacts only a fraction of the complete factorization, the speedup may be impressive. Multifrontal solvers also allow to use the same trick, but with an other idea.

A good reference (but I find it require a lot of investment to be fully understood) is the book of Tim Davis http://www.amazon.com/Direct-Methods-Systems-Fundamentals-Algorithms/dp/...

As you probably know, the performance of blas, up to now, was dependent of the hardware architecture (cache property, pipelining etc) , since the libraries were using assembler to get full speedup. The automatic tuning of ATLAS changed this quite a bit, since it could get almost optimal performance on almost any machine. However the main part of eigen is devoted to create a comparable code but with SSE3 instructions which is 1) higher in the code hierarchy 2) more generic. So eigen is able to create optimized blas on any hardware supporting SSE3 instructions. that's great, but limited to dense algorihms.

On the other hand, I know they are developping some algorithmic for sparse matrices, but as far as I know those algorithm won't benefit much of the speedup of the dense algorithms. However I checked that one year ago, so I should check it once again before being fully affirmative on that point.

Best Regards,

Thomas

## Further on solvers

Dear Thomas,

0. Thanks, once again, for a wonderfully detailed reply.

1. About books:It was only after writing my request about "left-looking" etc. that it occured to me to look up in Numerical Recipes. (I have the 2/e, in C++.) I then realized how much I have to learn. ... I will complete NR and then try to go through Davis' book. Your comments will also come in handy in this learning process, I am sure.2. The absence of comparative data:However, there is another thing I wish to note, and it is: the absence of comparative performance data for large systems, and on the common (PC) platform. Allow me to explain.A while ago, we had a discussion at iMechanica concerning FORTRAN vs. C++, and someone mentioned that the default FORTRAN code would be faster than the default C++ one. (I tried searching for that thread, but in vain. The thread I have in mind is not this one [^] or this one [^].) With expression templates, even C++ code should be fast. But how fast(er)? One would like to have some definitive data for matrices of orders 10K+, but such data are surprisingly difficult to find.

Another thing.

Allsolvers runveryslowly as the system size grows large (matrix order > 10 k, say, per processor). The subtle differences in the algorithms will surely count. But by how much---for the large systems? By what percentage or factor? I have no idea, andnonebacks up the descriptions of algorithms with quantitative data....The fact of the matter is, if very standard algorithms (such as those given in, say, NR) are not going to be slower by more than 50%, then, frankly, one wouldn't care for any advanced library. (The emphasis, again, is for large systems.) Similarly, if using a dense matrix algorithm is going to be permissible (given the routine 2GB and 4GB RAMs in ordinary desktops today), then, again, one wouldn't care going specifically for the sparse matrix algorithms.

But there is no way to get such quantitative data easily. Typically, what one runs into is something like this: "Ours is the greatest library since Backus wrote the rule-book for FORTRAN; we even use MPI." Ok, but what about the simpler shared-memory support via, say, OpenMP? Blank-out. What about GPGPU support? Blank-out. What about eigenvalue computations in addition to the linear system solution? Blank-out. What about easy compilation on the Windows platform using VC++ (and not MinGW etc.)? Mostly blank-out.

Today, the fact is, in order to evaluate a library (say for a commercial application in my day-job), I have to myself download it, compile it, run test-data, and come to my own conclusions. Which leads me to the next point.

3. Internet Depository for Performace Data of Solver Libraries:Can't we have an Internet depository of sorts, on the lines of UFlorida's matrix collection, where people can go and report the sort of performance data they get with different hardware and software combinations?4. What I am going to do:Anyway, enough by way of a rant. I think over the next couple of months (or more), I will myself do something towards this idea. I will first write the dumbest (i.e. the most simple-minded) algorithms (e.g. for direct solution, the Numerical Analysis 101 type of an implementation of the Gaussian elimination), in both FORTRAN and C++, and consider their data as the common baseline for comparison. Then, I will compile the NR book algorithms, as the second baseline. Then, I will begin compiling and testing some 8 to 10 commonly available libraries. I will upload my programs and results at some place suitable (perhaps at iMechanica).... I will surely do that, and while doing that, esp. while implementing the NR book algos, I will also begin learning about this fascinating topic of numerical algorithmics. As you can see from the above quoted past two threads at iMechanica, I have been looking around for a long enough time, and it's high-time I began wrapping it up for at least some intermediate conclusions. ... I think a dedicated blog for the comparative data would be a good idea... I will announce its creation as soon as I do it.

--Ajit

- - - - -

[E&OE]