OpenBLAS

Original author(s)	Kazushige Goto
Developer(s)	Zhang Xianyi, Wang Qian, Werner Saar
Initial release	22 March 2011
Stable release	0.3.27 / 4 April 2024
Repository	github.com/OpenMathLib/OpenBLAS
Written in	C, modern Fortran
Operating system	Linux; Microsoft Windows; macOS; FreeBSD
Platform	x86, x86-64; MIPS; ARM, AArch64; POWER, PPC64; IBM Z; SPARC; RISC-V
Type	Linear algebra library; implementation of BLAS
License	BSD License
Website	www.openblas.net

OpenBLAS is an open-source implementation of the BLAS (Basic Linear Algebra Subprograms) and LAPACK APIs with many hand-crafted optimizations for specific processor types. It is developed at the Lab of Parallel Software and Computational Science, ISCAS.

OpenBLAS adds optimized implementations of linear algebra kernels for several processor architectures, including Intel Sandy Bridge[3] and Loongson.[4] It claims to achieve performance comparable to Intel MKL; this mostly holds for the BLAS routines, while the LAPACK routines fall behind.[citation needed] On machines that support the AVX2 instruction set, OpenBLAS can achieve performance similar to MKL, but on CPUs with the AVX-512 instruction set there are currently almost no open-source libraries that match MKL.
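
For orientation, the central level-3 BLAS operation is GEMM, which computes C ← αAB + βC. The following is a minimal reference sketch of that operation in plain Python, not OpenBLAS code: the function name gemm and the list-of-lists matrix layout are assumptions made for illustration, and an optimized library replaces these loops with blocked, architecture-specific kernels.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: return alpha * (a @ b) + beta * c.

    Matrices are row-major lists of lists (an illustrative layout,
    not the storage format used by any BLAS library).
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape m x n")

    # Start from beta * c, then accumulate alpha * a @ b into it.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = alpha * a[i][p]  # hoist the scaled element of a
            for j in range(n):
                out[i][j] += a_ip * b[p][j]
    return out


# Usage: a plain matrix product is the special case alpha = 1, beta = 0.
product = gemm(1.0, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.0, [[0, 0], [0, 0]])
```

The loop order (i, p, j) walks rows of b contiguously, which hints at why memory access patterns matter for tuned kernels; the arithmetic itself is the same in any implementation.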

OpenBLAS is a fork of GotoBLAS2, which was created by Kazushige Goto at the Texas Advanced Computing Center.

History and present

OpenBLAS was developed by the parallel software group led by Professor Yunquan Zhang from the Chinese Academy of Sciences.

OpenBLAS initially targeted only the Loongson CPU platform, with Dr. Xianyi Zhang contributing much of the work. After GotoBLAS was abandoned, its successor OpenBLAS has been developed as an open-source BLAS library for multiple platforms, including x86, ARMv8, MIPS, and RISC-V, and it is well regarded for its portability.

The parallel software group is modernizing OpenBLAS to meet current computing needs. For example, OpenBLAS's level-3 routines were primarily optimized for large, square matrices (often called regular-shaped matrices). Irregular-shaped matrix multiplication is now also supported, such as tall-and-skinny matrix multiplication (TSMM),[5] one of the core computations in deep learning, which speeds up deep learning workloads on the CPU. OpenBLAS will also support compact functions and small GEMM.
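
To illustrate what "tall and skinny" means in practice, the sketch below multiplies an m × k matrix with m much larger than k by a small k × n matrix, walking the tall operand in row panels so that the small operand is reused for every panel. It is a shape illustration only, not the AutoTSMM method or OpenBLAS's kernel: the function name tsmm and the panel_rows parameter are hypothetical, and plain Python gains no speed from the panel structure.

```python
def tsmm(a, b, panel_rows=4):
    """Multiply a tall-and-skinny matrix a (m x k, m >> k) by b (k x n).

    The tall operand is processed in panels of panel_rows rows; the
    small operand b is reused unchanged for every panel.
    """
    if panel_rows < 1:
        raise ValueError("panel_rows must be positive")
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions of a and b do not match")
    n = len(b[0])

    out = []
    for start in range(0, m, panel_rows):
        panel = a[start:start + panel_rows]  # last panel may be shorter
        for row in panel:
            out.append(
                [sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            )
    return out


# Usage: a 10 x 2 operand times a 2 x 2 operand, in panels of 3 rows.
tall = [[i, i + 1] for i in range(10)]
result = tsmm(tall, [[1, 0], [0, 1]], panel_rows=3)
```

The result does not depend on panel_rows; in a tuned implementation the panel size would be chosen to fit the cache hierarchy of the target CPU.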

References

[1] "OpenBLAS 0.3.27 version". 4 April 2024. Retrieved 4 April 2024.
[2] "OpenBLAS". 25 October 2021.
[3] Wang Qian; Zhang Xianyi; Zhang Yunquan; Qing Yi (2013). AUGEM: Automatically Generate High Performance Dense Linear Algebra Kernels on x86 CPUs (PDF). Int'l Conf. on High Performance Computing, Networking, Storage and Analysis.
[4] Zhang Xianyi; Wang Qian; Zhang Yunquan (2012). Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor. IEEE 18th Int'l Conf. on Parallel and Distributed Systems (ICPADS).
[5] Chendi Li; Haipeng Jia; Hang Cao; Jianyu Yao; Boqian Shi; Chunyang Xiang; Jinbo Sun; Pengqi Lu; Yunquan Zhang (2021). AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs (PDF). IEEE International Symposium on Parallel and Distributed Processing with Applications.

External links

Official website

