LinAlg: Basic Linear Algebra and Optimization classlib
LinAlg: Basic Linear Algebra and Optimization classlib
This C++ class library introduces
the real domain. The library contains efficient and fool-proof
implementations of level 1 and 2 BLAS (element-wise operations and
various multiplications), transposition, determinant evaluation and
matrix inverse. There are operations on a single row/col/diagonal of a
matrix. Distinct features of the package are Matrix views,
Matrix streams, and LazyMatrices. Lazy
construction allows us to write matrix expressions in a natural way
without introducing any hidden temporaries, deep copying, and any
LinAlg stresses Matrix streams, which provide a sequential
access to a matrix or its parts.
LABlockStreams may span over an arbitrary rectangular block of a matrix, including
the whole matrix, a single matrix element, and all other block sizes
in between. Matrix streams are
subrange-able. A stream or a substream are always created
in-line. They are safe and allocate no heap storage.
LinAlg demonstrates that many of Linear Algebra algorithms indeed require only a sequential access to a matrix or its sub-blocks. The sequential access is simpler and faster than a random one. Notable examples of such ``serial algorithms'' are matrix inverse and Householder bidiagonalizations. Every feature of the package is extensively tested by included validation code.
Ax=b, where the matrix
Adoes not have to be square, and
bdoes not have to be a vector.
Matrix A = inverse(D); C = A*B;
== 2.0 );
The README file [plain text file]
Complete source code, validation code, sample code, and test run outputs
LinAlg interface definition
LAStreams interface definition
VC 7 Project filewhich was very kindly contributed by Markus Bader
Functional Style in C++: Closures, Late Binding, and
borrows many code snippets from LinAlg
(lazy) SVD promising a matrix in extended LinAlg.shar [plain text file]
An old but still relevant article posted on comp.c++.moderated, comp.sys.mac.programmer.tools, comp.sys.mac.scitech, sci.math.num-analysis newsgroups on Tue Feb 27 11:31:08 CST 1996
LinAlg is also available from netlib
FreeBSD port and a package
This C++ class library performs radix-2 Discerete Fourier
transform of a real or complex sequence, and
cos, and complex Fourier integrals of an evenly tabulated
The input can be either
complex with/without zero padding; you can request either the full complex
transform, or only
abs part of it.
Commented C++ source code, verification code and test run outputs. The verification code demonstrates many possible usage scenarious
LinAlg: Basic Linear Algebra and Optimization classlib is required as a dependency
[The Abstract of the paper]
Recent work showed that staging and abstract interpretation can be used to derive correct families of combinatorial circuits, and illustrated this technique with an in-depth analysis of the Fast Fourier Transform (FFT) for sizes 2^n. While the quality of the generated code was promising, it used more floating-point operations than the well-known FFTW codelets and split-radix algorithm.
This paper shows that staging and abstract interpretation can in fact be used to produce circuits with the same number of floating-point operations as each of split-radix and FFTW. In addition, choosing between two standard implementations of complex multiplication produces results that match each of the two algorithms. Thus, we provide a constructive method for deriving the two distinct algorithms.
Generating radix-2 FFT kernels with the operation count
exactly matching that of FFTW is much harder than one may think.
It is emphatically not enough to specialize the textbook FFT code
to a particular sample size N, and apply a partial evaluator
or a smart compiler to eliminate trivial multiplications (by one and zero)
and trivial additions (to zero), and to lift common factors.
First of all, the specialized FFT code will contain factors
-2.44921270764475452e-16 (that is the computed
IEEE double value of
sin((pi/. 2.0) *. 4.0)). That factor
is not zero and the corresponding multiplication is not trivial,
although it should be.
A frequently occurring expression
(sin(pi/4)*x + cos(pi/4)*y)
presents a subtler problem: although the ideal
cos(pi/4) are equal, their computed values are not.
Therefore, the partial evaluator or a compiler, which see
the specialized code, hence, only the numeric values of the
FFT factors, are not able
to convert the expression to
sin(pi/4)*(x + y), saving one multiplication. Thus the necessary condition for
generating the optimal FFT kernels is the use of the exact
trigonometry. The roots of unity
exp(-I j*2pi/N) must be
manipulated symbolically rather than numerically. The condition is not
sufficient: an additional insight -- the use of symmetry -- was
required to match the operation count of FFTW (see the
add_sub_aa in the generator code below).
Relating FFTW and Split-Radix. Joint work with Walid Taha.
Proc. of ICESS'04, the First International Conference on Embedded Software and System, December 9-10 2004, Hangzhou (Zhejiang), China.
The self-contained commented MetaOCaml code
The code derives radix-2 FFT kernels with the number of additions and multiplications exactly matching those in FFTW kernels. A slight change in the generation algorithm produces mixed-radix FFT kernels.
FFTW: Fastest Fourier Transform in the West
Generating optimal code with confidence
The explanation of the general framework of staging and abstract interpretation
This site's top page is http://okmij.org/ftp/
Converted from SXML by SXML->HTML