Improved performance for small correlation matrices
This release improves performance when Threads.nthreads() is large but the correlation matrix being calculated has few rows and columns. The code no longer allocates scratch space that will never be used.
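
The idea can be illustrated with a minimal sketch. This is not the package's actual code: it assumes that work is split by column and that each running task needs its own scratch vector, and the helper name `scratch_buffers` and its `nthreads` keyword are hypothetical. Under those assumptions, capping the number of buffers at the number of columns avoids allocating scratch space for threads that would have nothing to do.

```julia
# Hypothetical helper (not part of the package): allocate one scratch vector
# per task that can actually do work.
# Assumption: work is split by column, so at most `ncols` tasks are ever busy.
function scratch_buffers(nrows::Integer, ncols::Integer;
                         nthreads::Integer = Threads.nthreads())
    # Never create more buffers than there are work items, but always at least one.
    ntasks = max(1, min(nthreads, ncols))
    return [Vector{Float64}(undef, nrows) for _ in 1:ntasks]
end

# With 64 threads available but only 3 columns, only 3 buffers are allocated.
bufs = scratch_buffers(100, 3; nthreads = 64)
println(length(bufs))  # prints 3
```

Without the cap, the same call would allocate 64 vectors of length 100, of which 61 would never be touched.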