Release notes for TBCI NumLib 2.5.5
-----------------------------------

Version 2.5.5 has a number of fixes and enhancements over 2.5.4.
These are:

* Change the tarball to use .tar.bz2 instead of .tar.gz (tgz).
  Assign a version number there; also create tarballs/zip files with
  autogenerated guiding decls / instantiation lists.

* TBCI::Matrix defined value_type to be aligned on a 16-byte boundary.
  This could break (in svd_solver) on a 32-bit machine.
  Took this as an opportunity to clean up the typedefs:
  - value_type is now consistently just the base type of a compound
    class (element_type aliasing value_type)
  - aligned_value_type may have additional alignment imposed for
    increased performance

* libtbcilapack depended explicitly on libtbcidouble and implicitly
  on libtbcicplxdouble and libtbcistdcomplexdouble as well.
  These dependencies have been removed by adding a few conditional
  inline statements to BVector and F_TMatrix. libtbcilapack can now be
  used without the other tbci libraries.
  Note that if you previously relied on libtbcilapack pulling in
  libtbcidouble at link time, you might need to change your Makefiles.

* In Matrix Vector multiplication, the TSMatrix variant had a bug
  where the result Vector would not have the right size for non-square
  matrices. (Likewise for F_TSMatrix.)

* memalloc_cache was not thread-safe; TBCI threads were thus not
  supposed to do memory (de)allocations. TBCI's own threads didn't, but
  third parties may run into this trap.
  memalloc.h now makes sure that only the main thread uses caching
  memory allocations. (This requires your platform to support thread
  local variables; otherwise the use of threads will switch off
  memalloc_cache altogether.)
  The downside is that for SMP=1 compiles, malloc_cache now has a
  link time dependency on smp.
  You can opt for spinlocks instead by defining
  TBCI_MALLOC_FORCESPINLOCK. I had expected this to yield better
  performance, but to my surprise this is not the case in general.
* Linux kernel 2.6.16 and newer report the total process CPU time when
  calling clock() from a thread (as they should according to the POSIX
  threads definition). Use clock_gettime() to get per-thread CPU times
  if compiling with THREAD_STAT=1. This will need linking against
  librt.

* Minor fixes here and there (C_MEMALLOC, lapack's inv() routine ...)

* The specfile now invokes a few regression tests to ensure the
  build's integrity.

* The special functions that were converted with f2c a long time ago
  did not use prototypes consistently. This led to a mismatch between
  integer and long types on 64-bit platforms that went unnoticed by the
  compiler. Fixed that and made sure we use prototypes at least for the
  exported routines.

* The following fixes have been triggered by tests from Florian
  Hudelist, for which I'm very grateful.

* A new test program for calculating eigenvalues in
  lina/test/test_lapack_eig.cpp. It needs lapack, blas and the right
  fortran libs to be installed (libgfortran and libf2c or libg77 and
  libg2c).

* F_Matrix: This implementation had obviously not seen much testing
  before. A few test programs exposed this and resulted in a code
  review that turned up a number of things.
  There was confusion of rows vs. columns in a few places.
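
To illustrate the clock() vs. clock_gettime() item above: a minimal sketch
of reading per-thread and whole-process CPU time on a POSIX system. This is
not the TBCI THREAD_STAT code; the names thread_cpu_seconds,
process_cpu_seconds, burn and run_demo are made up for this example. Only
clock(), clock_gettime() and CLOCK_THREAD_CPUTIME_ID are standard.

```cpp
#include <cstdio>
#include <time.h>
#include <thread>

// CPU time used by the calling thread only, in seconds
// (negative if the per-thread clock is not available).
double thread_cpu_seconds()
{
    struct timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1.0;
    return double(ts.tv_sec) + 1e-9 * double(ts.tv_nsec);
}

// CPU time used by the whole process, summed over all threads, in seconds.
double process_cpu_seconds()
{
    return double(clock()) / CLOCKS_PER_SEC;
}

// Hypothetical worker: burns some CPU, then stores its own thread CPU time.
void burn(double *used)
{
    volatile double x = 0.0;
    for (long i = 0; i < 20000000L; ++i)
        x = x + 1e-9 * double(i);
    *used = thread_cpu_seconds();
}

// Runs one worker thread and prints the three readings side by side.
int run_demo()
{
    double worker = 0.0;
    std::thread t(burn, &worker);
    t.join();
    std::printf("worker thread: %.3f s\n", worker);
    std::printf("main thread:   %.3f s\n", thread_cpu_seconds());
    std::printf("whole process: %.3f s\n", process_cpu_seconds());
    return 0;
}
```

Calling run_demo() from main() should show the worker's time included in
the process total but not in the main thread's reading. Build with
-pthread; on older glibc versions clock_gettime() additionally needs -lrt,
which matches the librt requirement noted above.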