Reduce memory footprint for large datasets #90
@willwerscheid If the data matrix is sparse, but you also have other matrices of the same dimension that are dense, there really isn't much benefit to allowing the data matrix to be sparse, and it may complicate your code (e.g., you will have to convert the results of matrix-vector products to dense). The key is to reduce the number of times you modify large matrices within nested functions, e.g., if I am doing … and both …
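The example in the comment above is cut off, so here is a hypothetical sketch of the general pattern it warns against (`update_residuals` is a made-up name, not flashier's actual code):

```r
# Every call that updates a large matrix allocates a fresh n x p copy, so
# performing such updates inside nested functions multiplies transient
# memory use.
update_residuals <- function(R, l, f) {
  R - tcrossprod(l, f)  # allocates a new n x p matrix on every call
}

n <- 5000
p <- 2000
R <- matrix(rnorm(n * p), n, p)
R <- update_residuals(R, rnorm(n), rnorm(p))  # prefer one top-level update
```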
@pcarbo Exactly. I think that the main advantage of point 3 is actually that datasets are often downloadable as …
@willwerscheid Up to you. An alternative is to only accept matrices of class …
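A minimal sketch of that class-restriction alternative, assuming the truncated class name is `dgCMatrix` (the sparse class mentioned in the ideas below); `flash_set_data` is a hypothetical name, not flashier's actual API:

```r
library(Matrix)

# Reject anything that is not already a column-compressed sparse matrix,
# rather than silently coercing (and possibly densifying) the input.
flash_set_data <- function(Y) {
  if (!inherits(Y, "dgCMatrix"))
    stop("Y must be a dgCMatrix; convert with as(Y, 'CsparseMatrix')")
  list(Y = Y)
}

Y <- rsparsematrix(100, 50, density = 0.1)  # already a dgCMatrix
dat <- flash_set_data(Y)
```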
After looking more closely, I don't think any of these are worth doing, at least for now. 1. This would indeed save memory equal to the size of …
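For context on the memory-saving claim in point 1, a hypothetical illustration (using the `lobstr` package, which measures object sizes with shared memory accounted for):

```r
library(lobstr)

Y <- matrix(rnorm(1e6), 1000, 1000)  # ~8 MB of doubles
dat <- list(Y = Y, Yorig = Y)
obj_size(dat)               # ~8 MB: Y and Yorig still share the same memory
dat$Y <- dat$Y - mean(dat$Y)
obj_size(dat)               # ~16 MB: modifying Y materialized a second copy
```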
Some ideas:

1. Don't store both `Y` and `Yorig` in the flash data object.
2. Store `tau` as a vector when `var_type = by_column` or `by_row`. This could be tricky, but it's probably worth it since flash fit objects are frequently copied.
3. Allow `Y` to be a `dgCMatrix`, and likewise for `S`. (Ideas 2 and 3 are sketched below.)

If we do the above, then the only large dense matrices will be the matrices of residuals and squared residuals. (Or rather, `R2`, `Rk`, and `R2k` for the greedy step.) So, optimistically, we might be able to shoot for a memory requirement of 5x the size of the original data (measured as a dense matrix) when `Y` is sparse and 6-8x otherwise.
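A sketch of what ideas 2 and 3 might look like (hypothetical helper names, not flashier's actual internals):

```r
library(Matrix)

# Idea 2: with var_type = by_column, tau is constant within each column,
# so an n x p precision matrix can be stored as a length-p vector.
n <- 10000
p <- 1000
tau <- runif(p)                       # one precision per column

# Expand only where an elementwise operation truly needs it:
scale_cols_by_tau <- function(R, tau) {
  sweep(R, 2, tau, `*`)               # multiplies column j of R by tau[j]
}

# Idea 3: store Y as a dgCMatrix so zeros are not held explicitly.
Y <- rsparsematrix(n, p, density = 0.01)  # class "dgCMatrix"
f <- rnorm(p)
Yf <- Y %*% f                         # sparse-aware product; Y stays sparse
```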
is sparse and 6-8x otherwise.The text was updated successfully, but these errors were encountered: