-
Notifications
You must be signed in to change notification settings - Fork 216
feat(linalg): add generalized least squares solver #1105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat(linalg): add generalized least squares solver #1105
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a generalized least-squares (GLS) solver for the stdlib_linalg module, addressing issue #1047 (2 of 2). The GLS solver handles least-squares problems with correlated errors described by a covariance matrix, using LAPACK's GGGLM routine to minimize the Mahalanobis distance (Ax-b)^T W^{-1} (Ax-b).
Changes:
- Adds
generalized_lstsqfunction interface supporting real and complex types with correlated error handling via SPD/Hermitian covariance matrices - Implements memory-safe design that always copies the covariance matrix W internally and follows the
overwrite_apattern from existing solvers - Provides comprehensive error handling via
handle_ggglm_infoconsistent with existing LAPACK wrapper patterns
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/linalg/stdlib_linalg_least_squares.fypp | Core implementation of generalized_lstsq with Cholesky factorization and GGGLM solver integration |
| src/linalg/stdlib_linalg.fypp | Public interface declaration with documentation for the new generalized_lstsq function |
| src/lapack/stdlib_linalg_lapack_aux.fypp | New handle_ggglm_info error handler following established patterns for LAPACK error processing |
| test/linalg/test_linalg_lstsq.fypp | Test suite with basic GLS solver tests and identity covariance validation against OLS |
| example/linalg/example_generalized_lstsq.f90 | Example program demonstrating GLS usage with correlated errors |
| example/linalg/CMakeLists.txt | Build configuration update to include the new example |
| doc/specs/stdlib_linalg.md | Comprehensive API documentation for generalized_lstsq with syntax, arguments, and usage example |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1105 +/- ##
==========================================
- Coverage 68.55% 68.48% -0.07%
==========================================
Files 396 397 +1
Lines 12746 12757 +11
Branches 1376 1376
==========================================
- Hits 8738 8737 -1
- Misses 4008 4020 +12 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@jvdp1 Addressed all Copilot suggestions |
loiseaujc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I took a quick glance at your code and will try to go deeper into it by the end of the week. Looks pretty good so far.
|
@loiseaujc Addressed all comments |
| ! User provided pre-factored L: zero out upper triangle for GGGLM | ||
| do concurrent(i=1:m, j=1:m, i < j) | ||
| lmat(i, j) = zero | ||
| end do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you use cholesky(lmat, lower=.true., other_zeroed=.true.), you do not need to zero-out lmat again. cholesky does it for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@loiseaujc
the do concurrent zeroing is in the else branch and it only runs when prefactored_w=.true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went Through other stdlib functions , Should I remove the else branch and trust user provides proper lower triangular L?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two possibilities there:
- Dumb-proof the code and effectively lower the upper triangular part of the user provided matrix just to make sure.
or
- Specify clearly in the specs that, if already prefactored, the upper triangular part of
$W$ needs to be zero.
@perazz @jalvesz @jvdp1 : What would be your take on this and user-friendliness in general for such issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since lmat is an internal allocatable variable, the simplest solution would be:
Allocate lmat accordingly with source=zero, assign the lower symmetric values from w systematically. Just check if (.not. is_prefactored) to perform the cholesky factorization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@loiseaujc I agree: when w is a user-provided prefactored matrix, we should not modify that. It is the user's responsibility to provide a correct input matrix (perhaps we could state that in the documentation). It would be also worthwhile to check if DGGGLM actually does access the upper triangular part or not - perhaps this zeroing is not even necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xGGGLM solves the following constrained quadratic program
!> minimize || y ||_2 subject to d = A*x + B*y
!> x
where A and B are generic matrices. The generalized least-squares problem can be recast in this form where B is a square root of the symmetric positive definite weight matrix W. Since xGGGLM works for arbitrary matrices B, we indeed do need to zero-out the upper triangular part if B is a Cholesky factor.
But I actually agree that we should not touch the user-provided matrix. One simple reason is that the Cholesky factor is not the only way to express the matrix square-root ! Maybe for some reason, user has computed it with svd. It would be a valid input and the routine should still work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aamrindersingh : Following with previous comment, could you add a test where the matrix square root of W is computed using its svd followed by taking the non-negative square-root of the singular values ? If I'm correct, it should return the same solution as if the Cholesky factor of W was used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done,
Removed the upper triangle zeroing and added an eigendecomposition based sqrt test. Confirms both give identical solutions as expected. Used eigh since it is equivalent to SVD for SPD matrices.
Co-authored-by: Copilot <[email protected]>
a73a5df to
296727a
Compare
|
@loiseaujc Rebased onto master, CI should pass now. |
| case (-3) | ||
| err = linalg_state_type(this, LINALG_VALUE_ERROR, 'Invalid number of columns for B, p=', p) | ||
| case (-5) | ||
| err = linalg_state_type(this, LINALG_VALUE_ERROR, 'Invalid leading dimension for A, lda < m=', m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please pass lda, ldb to this routine so the
| err = linalg_state_type(this, LINALG_VALUE_ERROR, 'Invalid leading dimension for A, lda < m=', m) | |
| err = linalg_state_type(this, LINALG_VALUE_ERROR, 'Invalid leading dimension for A, lda=',lda,' is < m=', m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| ! Validate sizes | ||
| if (size(w, 1, kind=ilp) /= m .or. size(w, 2, kind=ilp) /= m) then | ||
| err0 = linalg_state_type(this, LINALG_VALUE_ERROR, & | ||
| 'Covariance matrix must be square m×m:', [size(w, 1, kind=ilp), size(w, 2, kind=ilp)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 'Covariance matrix must be square m×m:', [size(w, 1, kind=ilp), size(w, 2, kind=ilp)]) | |
| 'Covariance matrix must be square m×m:', shape(w, kind=ilp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| end if | ||
|
|
||
| ! Handle covariance/Cholesky factor | ||
| ! ALWAYS copy W because GGGLM modifies it (protects user's data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we provide an overwrite_w logical argument, the same way we do for overwrite_a? @loiseaujc that would allow to avoid internal allocation if the user can afford that.
We have strived to allow users to pass all necessary variables as optional arguments and avoid internal allocations wherever possible, here we have at least 4 separate allocations at every call. Perhaps this could be avoided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that would seem consistent with the rest of the stdlib_linalg stuff. As usual, overwrite_w would default to .false. just to make sure only users who know what they're doing are actually overwriting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@perazz @loiseaujc
Added overwrite_w (default .false.), matching overwrite_a pattern. Both main matrix allocations are now user controllable.
Are any other changes required?
perazz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR @aamrindersingh, I have added some comments.
Resolves #1047 (2 of 2)
This PR adds the
generalized_lstsqinterface tostdlib_linalgfor least-squares problems with correlated errors described by a symmetric positive definite covariance matrix. Usage:x = generalized_lstsq(W, A, b). This minimizes the Mahalanobis distance(Ax-b)^T W^{-1} (Ax-b)using LAPACK'sGGGLM.The key design decisions are:
generalized_lstsqalways copiesWinternally to protect user data fromGGGLM's destructive operations. This applies regardless of theprefactored_wflag, the inputWmatrix is never modified.overwrite_apattern fromsolve_luwherecopy_adefaults to.true.to preserveAunless the user explicitly opts into destruction for performance.handle_ggglm_infoerror handler was added to parse LAPACK return codes, consistent with existinghandle_gelsd_infoandhandle_gglse_infopatterns.Testing includes:
lstsqpatterns