# Numerical Math and Scientific Computing

## `LinAlg`: Basic Linear Algebra and Optimization classlib

This C++ class library introduces `Matrix`, `Vector`, `subMatrices`, and `LAStream`s over the real domain. The library contains efficient and fool-proof implementations of level 1 and 2 BLAS (element-wise operations and various multiplications), transposition, determinant evaluation and matrix inverse. There are operations on a single row/col/diagonal of a matrix. Distinct features of the package are Matrix views, Matrix streams, and LazyMatrices. Lazy construction allows us to write matrix expressions in a natural way without introducing any hidden temporaries, deep copying, and any reference counting.

LinAlg stresses Matrix streams, which provide a sequential access to a matrix or its parts. `LABlockStream`s may span over an arbitrary rectangular block of a matrix, including the whole matrix, a single matrix element, and all other block sizes in between. Matrix streams are `seek`-able and subrange-able. A stream or a substream are always created in-line. They are safe and allocate no heap storage.

LinAlg demonstrates that many of Linear Algebra algorithms indeed require only a sequential access to a matrix or its sub-blocks. The sequential access is simpler and faster than a random one. Notable examples of such ``serial algorithms'' are matrix inverse and Householder bidiagonalizations. Every feature of the package is extensively tested by included validation code.

Other features:

• Hooke-Jeeves multidimensional optimization (with the target function being specified as closure)
• Brent's univariate minimization and root finding
• Aitken-Lagrange interpolation
• Singular Value Decomposition and its application to solving `Ax=b`, where the matrix `A` does not have to be square, and `b` does not have to be a vector.
• Matrix and Vector promises: the code
`Matrix A = inverse(D); C = A*B;`
does what it says, as efficiently as possible
• A great variety of matrix streams, e.g.:
`assert(`
`of_every(ConstMatrixRow(m,i)).max_abs(of_every(vect))`
`== 2.0 );`
The README file tells far more about the features and how to use them. The file contains many commented code snippets.
Platforms
FreeBSD 4.6.2 (a port and a package) with gcc 3.2 and gcc 2.95.x, gcc 2.96 Linux 2.4.9 i686; gcc 2.8.1 (optimization off) on HP-PA, Solaris/Sparc; SGI's native compiler (optimization off); Visual C++ 5.0 WinNT 4.0; Metrowerk's CodeWarrior C++, BeOS Release 4.
Public Domain
Version
The current version is 4.4, November 29, 2002.
References

The README file [plain text file]

LinAlg.tar.gz [115K]
Complete source code, validation code, sample code, and test run outputs

LinAlg.h [58K]
LinAlg interface definition

LAStreams.h [28K]
LAStreams interface definition

linalg-4.4_vc7.zip [377K]
VC 7 Project filewhich was very kindly contributed by Markus Bader

Functional Style in C++: Closures, Late Binding, and Lambda Abstractions
borrows many code snippets from LinAlg

(lazy) SVD promising a matrix in extended LinAlg.shar [plain text file]
An old but still relevant article posted on comp.c++.moderated, comp.sys.mac.programmer.tools, comp.sys.mac.scitech, sci.math.num-analysis newsgroups on Tue Feb 27 11:31:08 CST 1996

LinAlg is also available from netlib
<http://netlib.bell-labs.com/netlib/c++/linalg.tgz>
FreeBSD port and a package

## Fourier integrals and Discrete/Fast Fourier transform

This C++ class library performs radix-2 Discerete Fourier transform of a real or complex sequence, and `sin`, `cos`, and complex Fourier integrals of an evenly tabulated function.

The input can be either `real` or `complex` with/without zero padding; you can request either the full complex transform, or only `real`, `im`, or `abs` part of it.

Public Domain
Version
The current version is 1.2, December 25, 1998.
References

fft.tar.gz [36K]
Commented C++ source code, verification code and test run outputs. The verification code demonstrates many possible usage scenarious

`LinAlg`: Basic Linear Algebra and Optimization classlib is required as a dependency

## Generating optimal FFT code and relating FFTW and Split-Radix

[The Abstract of the paper]
Recent work showed that staging and abstract interpretation can be used to derive correct families of combinatorial circuits, and illustrated this technique with an in-depth analysis of the Fast Fourier Transform (FFT) for sizes 2^n. While the quality of the generated code was promising, it used more floating-point operations than the well-known FFTW codelets and split-radix algorithm.

This paper shows that staging and abstract interpretation can in fact be used to produce circuits with the same number of floating-point operations as each of split-radix and FFTW. In addition, choosing between two standard implementations of complex multiplication produces results that match each of the two algorithms. Thus, we provide a constructive method for deriving the two distinct algorithms.

Generating radix-2 FFT kernels with the operation count exactly matching that of FFTW is much harder than one may think. It is emphatically not enough to specialize the textbook FFT code to a particular sample size N, and apply a partial evaluator or a smart compiler to eliminate trivial multiplications (by one and zero) and trivial additions (to zero), and to lift common factors. First of all, the specialized FFT code will contain factors like`-2.44921270764475452e-16` (that is the computed IEEE double value of `sin((pi/. 2.0) *. 4.0)`). That factor is not zero and the corresponding multiplication is not trivial, although it should be. A frequently occurring expression `(sin(pi/4)*x + cos(pi/4)*y)` presents a subtler problem: although the ideal `sin(pi/4)` and `cos(pi/4)` are equal, their computed values are not. Therefore, the partial evaluator or a compiler, which see the specialized code, hence, only the numeric values of the FFT factors, are not able to convert the expression to `sin(pi/4)*(x + y)`, saving one multiplication. Thus the necessary condition for generating the optimal FFT kernels is the use of the exact trigonometry. The roots of unity `exp(-I j*2pi/N)` must be manipulated symbolically rather than numerically. The condition is not sufficient: an additional insight -- the use of symmetry -- was required to match the operation count of FFTW (see the function `add_sub_aa` in the generator code below).

Version
The current version is December 2004.
References

fft.pdf [69K]
Relating FFTW and Split-Radix. Joint work with Walid Taha.
Proc. of ICESS'04, the First International Conference on Embedded Software and System, December 9-10 2004, Hangzhou (Zhejiang), China.

fft-neo.ml [58K]
The self-contained commented MetaOCaml code
The code derives radix-2 FFT kernels with the number of additions and multiplications exactly matching those in FFTW kernels. A slight change in the generation algorithm produces mixed-radix FFT kernels.

FFTW: Fastest Fourier Transform in the West
<http://www.fftw.org>

Generating optimal code with confidence
The explanation of the general framework of staging and abstract interpretation

### Last updated January 1, 2019

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org