https://github.com/linbox-team/fflas-ffpack
Tip revision: cf5d31eef5b19c781252fd674ccba2d9a2865f79 authored by Cyril Bouvier on 21 October 2020, 15:10:31 UTC
Add more precise type_string to SIMD structs
Add more precise type_string to SIMD structs
Tip revision: cf5d31e
ChangeLog
2019-06-07 v2.4.1
Minor updates:
* fix setnumthreads in pDet
* add README.md in distribution
* support for Hygon Dhyana
2019-05-10 v2.4.0
New features:
* fsytrf: a symmetric triangular factorization, revealign the RPM
* fsyrk, fsyr2k, ftrssyr2k, ftrstr: subroutines for symmetric operations
* support for AVX512 vectorization
* parallelization of fgemm-rns, fsytrf, echelon forms, det, rank, etc
* API for parallel routines outside of par-block (for e.g. SageMath)
Improvements:
* more examples
* more benchmarks: fgesv
* many bug fixes
* improved testsuite
* update to the Givaro's revamped modular fields
* improved igemm
* improved test coverage for SIMD
* improved charpoly
* improved freduce and consequently speed up most routines
2017-12-21 v2.3.2
Improvements:
* minor bug fixes in the build system and with GF2
* new specialization for fgemv over recint
2017-11-22 v2.3.1
Improvements:
* minor bug fixes in the build system
* improved cblas/fblas detection and use
2017-11-17 v2.3.0
Improvements:
* improved build system (instruction set detection, C++11 and clang compatibility, ...)
* improved fttrtri (triangular matrix inverse)
* increased test-suite coverage
* more autotuning
* clean-up and update all random matrix generator so they can be seeded.
* clean-up the test-suite and enable seeding parameter
* many bug fixes (and merging sage patches)
New features:
* new pfgemv routine (parallel matrix vector product)
* new fpotrf routine (Cholesky factorization) and symmetric rand generator
* new tutorials
* Gauss-Jordan inverse made to work
Changes in API
* change signature for CharPoly (now takes a polynomial domain as input)
* change the signature of ftrtrm
2016-07-30 v2.2.2
* many bug fixes ensuring a consistent support of clang, gcc-4.8 5.3 6.1
icpc on i386 x86_64, ubuntu and fedora, ppcle and osx
* new SIMD detection
* use pkgconfig
* new feature: checkers for Freivalds based verification
* improved performance of permutation application
2016-04-08 v2.2.1
* many fixes to the build system
* more consistent use of flags and dependency to precompiled code
* fixes all remaining issues for the integration in SageMath
* numerous minor fixes to the parallel code
2016-02-23 v2.2.0
* new precompiled interface
* improvements and API change for the parallel code
* new random matrix generators
* fix many bugs
2015-06-11 v2.1.0
Test suite and benchmark improvement :
* much larger coverage
* run most tests over a wide range of fields
* systematic interface and options
New features:
* parallel PLUQ
* computation of rank profiles and rank profile matrices
* echelon and reduced echelon forms form both LUdivine and PLUQ
* getters to the forms and the transformation matrices
* igemm routine for BLAS like gemm on 64bits ints
* support of Modular<int64_t> and ModularBalanced<int64_t> using igemm,
to support fields of bitsize between 25 and 31
* support of Modular<rint<K> > for Z/pZ with p of size > 32bits (based
on Givaro's RecInt multiprecision integers)
* support of RNS based gaussian elimination on multiprecision fields
* Paladin: DSL for parallel programming adressing OMP, TBB and Kaapi
Improvements:
* a lot of new sparse mat-vec product improvements
* faster parallel and sequential fgemm
* many bugs found and removed (no known bugs at release time)
* improved helper system, with mode of operations
2014-08-08 v2.0.0
code update :
* rank profile
* clean namespaces
* use field one, zero, etc
* fix clang warnings
* more blas wrappers (sger, sdot, copy, etc)
* simplification of fgemm
* simplify blas detection (+cflags)
* easier permutation handling
* improve testers
* use std::min, max
* many functions have API change to use last pointer argument for return
* some more doc
* and probably many more in 2+ years !
bugs :
* correct permutations
* fix fgemm, fgemv, ftrmm, ftrsm bugs
* mem leaks
* bugs for degenerate cases
* fix bounds
* and probably many more in 2+ years !
new features :
* new pluq 2x2 recursive alg
* leftlooking
* parallel OMP fgemm, ftrmm, ftrsm
* parallel KAAPI fgemm, ftrmm, ftrsm
* new testers for pluq, fgemm, etc
* new tester for Bini approximate formula
* fadd, fsub, finit, fscal, etc
* vectorisation using AVX(2)
* in place schedules
* new Echelon code
* helper design for fgemm, fgemv, etc
* template factorisation for modular/multiprecision fields
* helper traits
* automatic matrix field conversion (ie double -> float)
* add spmv kernels
* enable use of sparse MKL
* parallel.h, avx and simd files
* new DSL for parallelism
* RNS and multiprecision fields
* new const_cast, fflas_new etc functions
* element_ptr in fields
* use Givaro dependency (compulsory now)
* new test for regressions (with tickets)
* and probably many more in 2+ years !
2011-04-15 v1.4.0
* Convert project to autotools (à la LinBox et Givaro)
2008-06-05 v1.3.3
* fix the design of specializations to modular<double> modular<float>
* give a proper name to ModularBalanced
* fix the bugs in the bound computations (Winograd recursion over the
finite field was too deep)
* prepare the interface for integrating compressed representation for
small finite fields
2007-09-28 v1.3.2
* add routines fgetrs and fgesv (cf LAPACK), for system solving.
supports rectangular, over/underdetermined systems.
2007-08-29 v1.3.1
* add the benchmark directory, for automatic benchmarking against GOTO
and ATLAS BLAS. Adapted from Pascal Giorgi's benchmark system.
2007-08-28 v1.3.0
* new version of ftrmm ftrsm: ftrsm based on a multicascade algorithm
reducing the number of modular reductions). Automated generation of each
of the 48 specializations
* several bug fixes
* add regression tests: testeur_fgemm, testeur_lqup and testeur_ftrsm
2007-07-05 v1.2.2
* add a transposed version of the LQUP decomposition routine
LUdivine
* fix many bugs in LUdivine
* new schedules for Winograd algorithm for matrix multiplication:
2 cases depending whether beta = 0 or not, taken form [Huss
Ledermann & Al. 96]
* add rowEchelon and ReducedRowEchelon routines + associated tests
2007-06-21 v1.2.1
* add the use of float BLAS, if the field caradinality is small enough
* improve genericity: gemm can be use over any field domain (not
requiring any conversion to a integral representation)
* add a variant of Winograd's algorithm with less temporaries for
the operation C = AxB
* add ColumnEchelon and ReducedColumnEchelon routines, using an
inplace algorithm, based on the LQUP decompositon of LUdivine
* add routines ftrtri (replacing invL), ftrtrm.
* fix bunch of memory leaks in the tests (not yet finished)
2007-03-13 v1.1.2
* change the genericity system for trsm to detect Field
implementations over double (compatibility with LinBox)
2007-03-11 v1.1.1
* complete preconditioning phase for the new Charpoly algorithm
* new Charpoly algorithm renamed CharpolyArithProg
* add exception for failure of the LasVegas algrithm
* default charpoly is now: 2 attempts to CharpolyArithProg, then LUKrylov
2007-02-27 v1.1.0
* change some naming conventions in the directories
* add a LQUP routine for small dimension (LUdivine_small) and the
cascading with LUdivine
* put the bound computations in the same file
* add dense_generator.C for the generation of random dense
matrices in tests
* add the new algorithm for characteristic polynomial (temporarily
named frobenius)
2006-08-11 v1.0.1
* add the field implementation modular-positive.h, especially for
p=2
* add a the flag 'balanced' to the finite fields modular<double>,
to switch to the apropriate bound computation (fgemm and trsm)
* fix a bug in LUDivine LQUP elimination (initialisation of the
permutation P for N=1 in the terminal case)
* fix a bug in the determination of the number of recursive levels
of Winograd Algorithm.