
Conversation

@aamrindersingh

Resolves #1047

This PR adds two interfaces to stdlib_linalg for weighted least-squares problems. The weighted_lstsq interface handles diagonal weight matrices where each observation has a different importance, common in heteroscedastic regression and outlier downweighting. Usage: x = weighted_lstsq(w, A, b) where w is a vector of positive weights. It transforms to ordinary least-squares via row scaling and uses LAPACK's GELSD. The generalized_lstsq interface handles correlated errors described by a symmetric positive definite covariance matrix: x = generalized_lstsq(W, A, b). This minimizes the Mahalanobis distance (Ax-b)^T W^{-1} (Ax-b) using LAPACK's GGGLM.
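The row-scaling reduction behind `weighted_lstsq` can be sketched in a few lines against stdlib's existing `lstsq` interface (illustrative only; variable names are hypothetical and this is not the actual implementation):

```fortran
! Minimizing sum_i w(i) * (A(i,:)·x - b(i))**2 is the same as an ordinary
! least-squares solve on rows scaled by sqrt(w(i)).
use stdlib_linalg, only: lstsq
real(dp), allocatable :: a(:,:), b(:), w(:), x(:), sw(:)
integer :: j
! ... fill a (m x n), b (m), and w (m) with w > 0 ...
sw = sqrt(w)
do j = 1, size(a, 2)
    a(:, j) = sw * a(:, j)   ! column-wise sweep: contiguous in memory
end do
b = sw * b
x = lstsq(a, b)              ! ordinary least squares on the scaled system
```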

The key design decisions are:

(1) weighted_lstsq validates that all weights are positive since negative weights have no statistical meaning and would cause numerical issues with sqrt(w).
(2) generalized_lstsq always copies W internally to protect user data from GGGLM's destructive operations. This applies regardless of the prefactored_w flag; the input W matrix is never modified.
(3) Both interfaces follow the overwrite_a pattern from solve_lu where copy_a defaults to .true. to preserve A unless the user explicitly opts into destruction for performance.
(4) A handle_ggglm_info error handler was added to parse LAPACK return codes, consistent with existing handle_gelsd_info and handle_gglse_info patterns.
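Decision (1) amounts to a guard like the following before any sqrt(w) is taken (a sketch only; the real code would presumably report through stdlib's linalg_state_type error machinery rather than error stop):

```fortran
! Reject zero or negative weights up front: sqrt(w) would be zero or NaN,
! and the scaled problem would lose its statistical meaning.
if (any(w <= 0)) then
    error stop 'weighted_lstsq: weights must be strictly positive'
end if
```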

Testing includes:

  • Basic WLS solves with uniform weights
  • Verification that non-uniform weights change solutions
  • Error handling for negative weights
  • Basic GLS solves with correlated errors
  • Verification that GLS with identity covariance equals OLS
  • All real types (sp/dp/qp/xdp) covered following existing lstsq patterns
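The identity-covariance check from the list above reduces to a direct comparison, since with W = I the Mahalanobis objective is the ordinary 2-norm (sketch with illustrative names, not taken from the test suite):

```fortran
! With W = I, generalized_lstsq should agree with plain lstsq to tolerance.
w_cov = 0.0_dp
do j = 1, size(w_cov, 1)
    w_cov(j, j) = 1.0_dp
end do
x_gls = generalized_lstsq(w_cov, a, b)
x_ols = lstsq(a, b)
if (maxval(abs(x_gls - x_ols)) > tol) error stop 'GLS with identity /= OLS'
```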

- Add weighted_lstsq for diagonal weight matrices (row scaling)
- Add generalized_lstsq for SPD covariance matrices (GGGLM)
- Add handle_ggglm_info error handler
- Add comprehensive tests for both solvers
- Add API documentation and examples
@jalvesz
Contributor

jalvesz commented Jan 24, 2026

Hi @aamrindersingh I just skimmed through your implementation. For the moment I won't comment on the design, just one minor but very important detail:
I saw this kind of loop a few times in your code:

do i = 1, m
      a_scaled(i, :) = sqrt_w(i) * a(i, :)
end do

In Fortran, arrays are stored column-major: the leftmost index varies fastest in memory (as opposed to C-based languages, where the rightmost index varies fastest). This means the loop above strides across rows and forces a cache miss at every j for each row i. Please rewrite your loops either using do concurrent with the column index j in front, or a plain nested do loop with j at the outermost level.
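Something like this (sketch):

```fortran
! do concurrent with the column index j: each a(:, j) slice is contiguous.
do concurrent (j = 1:n)
    a_scaled(:, j) = sqrt_w * a(:, j)
end do

! Equivalent plain nested loop with j at the outermost level:
do j = 1, n
    do i = 1, m
        a_scaled(i, j) = sqrt_w(i) * a(i, j)
    end do
end do
```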

@aamrindersingh
Author

Thanks for the review @jalvesz!

Fixed all row-wise loops to use column-major access. It was a good learning moment: Fortran's leftmost-index-fastest storage means do j = 1, n; a(:, j) = ...; end do stays contiguous, while row-wise access forces cache misses.

Did a full dry run after fixing, using critical edge cases, and found a few more things: workspace sizing using int() instead of ceiling(), missing cleanup on error paths, and GGGLM needing the upper triangle zeroed since potrf only sets the lower part.
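The int()-versus-ceiling() point matters because LAPACK workspace queries return the optimal size in a real; truncating it can under-allocate (sketch, illustrative names):

```fortran
real(dp) :: work_query(1)
integer  :: lwork
! ... call the LAPACK routine with lwork = -1 so that work_query(1)
!     holds the optimal workspace size on return ...
lwork = int(work_query(1))      ! truncates toward zero: may fall one short
lwork = ceiling(work_query(1))  ! rounds up: always covers the optimum
```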

Eager to discuss design further

`c`: Shall be a rank-2 `real` or `complex` array of the same kind as `a` defining the linear equality constraints. It is an `intent(in)` argument.
`lwork`: Shall be an `integer` scalar returning the optimal size required for the workspace array to solve the constrained least-squares problem.

## `weighted_lstsq` - Computes the weighted least squares solution to a linear matrix equation {#weighted-lstsq}
Member


Thank you @aamrindersingh. Could you split this PR into two smaller ones (weighted_lstsq and generalized_lstsq), please? Or are these two implementations dependent on each other?

Author


No, they are not dependent on each other. I'll start splitting them into two PRs.
Thanks


@codecov

codecov bot commented Jan 24, 2026

Codecov Report

❌ Patch coverage is 0% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.44%. Comparing base (b32b42a) to head (42b28a4).
⚠️ Report is 31 commits behind head on master.

Files with missing lines Patch % Lines
example/linalg/example_generalized_lstsq.f90 0.00% 11 Missing ⚠️
example/linalg/example_weighted_lstsq.f90 0.00% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1096      +/-   ##
==========================================
- Coverage   68.55%   68.44%   -0.11%     
==========================================
  Files         396      398       +2     
  Lines       12746    12766      +20     
  Branches     1376     1376              
==========================================
  Hits         8738     8738              
- Misses       4008     4028      +20     

☔ View full report in Codecov by Sentry.

@aamrindersingh aamrindersingh requested a review from jvdp1 January 25, 2026 16:11
@loiseaujc
Contributor

Closing this PR as it has been split between #1104 and #1105.

@loiseaujc loiseaujc closed this Jan 26, 2026


Development

Successfully merging this pull request may close these issues.

More least-squares variants

4 participants