The benchmark uses either the Intel® MKL or the BLIS* framework version of the GEMM kernel, in single precision or double precision (SGEMM/DGEMM). Here is a high-level overview of what the benchmark code does: it takes the problem size N as its only parameter, allocates matrices A, B, and C of size N x N, and initializes them with random data.
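The steps in that overview can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark's actual code: `make_matrices`, `naive_gemm`, and `time_gemm` are hypothetical names, the triple loop is a pure-Python stand-in for a real MKL or BLIS DGEMM call, and the timing and GFLOP/s steps (using the standard 2*N^3 operation count for an N x N matrix product) are assumed, since the overview above does not describe them.

```python
import random
import sys
import time


def make_matrices(n, seed=0):
    """Allocate N x N matrices A, B, and C and fill them with random data."""
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.random() for _ in range(n)] for _ in range(n)]

    return rand_matrix(), rand_matrix(), rand_matrix()


def naive_gemm(a, b, c, n):
    """Overwrite C with A * B. Stand-in for a BLAS DGEMM call (assumption)."""
    for i in range(n):
        row_a = a[i]
        row_c = c[i]
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += row_a[k] * b[k][j]
            row_c[j] = acc


def time_gemm(gemm, a, b, c, n):
    """Time one GEMM call; return (seconds, GFLOP/s) using 2*N^3 flops."""
    start = time.perf_counter()
    gemm(a, b, c, n)
    elapsed = time.perf_counter() - start
    flops = 2.0 * n ** 3
    gflops = flops / elapsed / 1e9 if elapsed > 0 else float("inf")
    return elapsed, gflops


if __name__ == "__main__":
    # The problem size N is the only parameter; 64 is an arbitrary default.
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 64
    a, b, c = make_matrices(n)
    elapsed, gflops = time_gemm(naive_gemm, a, b, c, n)
    print(f"N={n}: {elapsed:.4f} s, {gflops:.4f} GFLOP/s")
```

Passing the multiply routine as a parameter keeps the setup and timing code independent of the kernel, so a BLAS-backed routine could be swapped in for `naive_gemm` without touching the rest.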
Intel Xeon (2.8 GHz): 20 s. Intel Core 2 Q9400 (2.6 GHz): 48 s. AMD Opteron 6276 (2.3 GHz): 76 s. AMD Opteron 6378 (2.4 GHz): 100 s. There are many variables here. All of these runs use 4 cores. However, the AMD runs are launched on an HPC cluster with access to more cores. The Intels are much faster, but I was very surprised that the higher-clocked AMD (the Opteron 6378) was the slower of the two.
When a user installs NumPy from conda-forge, that BLAS package then gets installed together with the actual library. This defaults to OpenBLAS, but it can also be MKL (from the defaults channel), or even BLIS or reference BLAS. The MKL package is a lot larger than OpenBLAS: it is about 700 MB on disk, while OpenBLAS is about 30 MB.
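To put the four timings quoted above on a common footing, the short sketch below computes each run's time relative to the fastest machine and a rough "clock cycles elapsed" figure (nominal GHz times seconds). The helper name `summarize` is hypothetical, and the cycle figure is only a crude proxy: it assumes the nominal clock speed and ignores turbo modes, memory bandwidth, the BLAS library in use, and how the 4 cores were scheduled.

```python
# Nominal clock speed (GHz) and wall-clock time (seconds), as quoted in the text.
RUNS = {
    "Intel Xeon": (2.8, 20.0),
    "Intel Core 2 Q9400": (2.6, 48.0),
    "AMD Opteron 6276": (2.3, 76.0),
    "AMD Opteron 6378": (2.4, 100.0),
}


def summarize(runs):
    """Return (name, time relative to fastest, billions of cycles) per run."""
    fastest = min(seconds for _, seconds in runs.values())
    rows = []
    for name, (ghz, seconds) in runs.items():
        relative_time = seconds / fastest
        gigacycles = ghz * seconds  # nominal cycles elapsed per core, in billions
        rows.append((name, relative_time, gigacycles))
    return rows


for name, relative_time, gigacycles in summarize(RUNS):
    print(f"{name:<20} {relative_time:4.1f}x time  {gigacycles:6.1f} Gcycles")
```

Even after scaling by clock speed the machines stay far apart, which suggests that frequency alone does not account for the gap between them.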