Conversation

@janniklinde
Contributor

This patch adds an out-of-core CSV reblock instruction. It supports reading a single CSV file or multiple row-partitioned CSV files into dense matrix blocks. Reads are currently performed by a single thread, so performance is comparable to (slightly slower than) non-parallel dense CSV reads, provided all blocks per row can be held in cache. MAX_BLOCKS_IN_CACHE specifies the maximum number of blen x blen matrix blocks that are constructed in memory simultaneously.
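
To illustrate the idea described above, here is a minimal sketch of streaming row-partitioned CSV input into dense blen x blen blocks. It is not the code from this patch: the class `CsvReblockSketch`, the `Block` type, and the `reblock` signature are hypothetical names chosen for the example, and the sketch simply assumes that one full row of blocks fits within the cache limit (it throws otherwise).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names are SystemDS classes.
final class CsvReblockSketch {

    /** A dense block together with its 0-based block row and block column index. */
    static final class Block {
        final int rowIndex;
        final int colIndex;
        final double[][] data;

        Block(int rowIndex, int colIndex, double[][] data) {
            this.rowIndex = rowIndex;
            this.colIndex = colIndex;
            this.data = data;
        }
    }

    /**
     * Reads the row-partitioned CSV parts in order with a single thread and
     * emits dense blocks of at most blen x blen cells to the sink.
     * Only one row of blocks is buffered in memory at a time.
     */
    static void reblock(List<BufferedReader> parts, int ncols, int blen,
                        int maxBlocksInCache, Consumer<Block> sink) throws IOException {
        int blocksPerRow = (ncols + blen - 1) / blen;
        // Assumption of this sketch: a full row of blocks must fit in the cache.
        if (blocksPerRow > maxBlocksInCache)
            throw new IllegalArgumentException("one row of blocks exceeds the cache limit");

        List<double[]> buffer = new ArrayList<>(blen); // CSV rows of the current block row
        int blockRow = 0;
        for (BufferedReader in : parts) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty())
                    continue; // skip blank lines
                String[] cells = line.split(",");
                double[] row = new double[ncols];
                for (int j = 0; j < ncols; j++)
                    row[j] = Double.parseDouble(cells[j].trim());
                buffer.add(row);
                if (buffer.size() == blen)
                    flush(buffer, blockRow++, ncols, blen, sink);
            }
        }
        if (!buffer.isEmpty()) // last block row may have fewer than blen rows
            flush(buffer, blockRow, ncols, blen, sink);
    }

    /** Cuts the buffered rows into column blocks, emits them, and clears the buffer. */
    private static void flush(List<double[]> buffer, int blockRow, int ncols,
                              int blen, Consumer<Block> sink) {
        int rows = buffer.size();
        for (int bc = 0; bc * blen < ncols; bc++) {
            int c0 = bc * blen;
            int cols = Math.min(blen, ncols - c0);
            double[][] data = new double[rows][cols];
            for (int i = 0; i < rows; i++)
                System.arraycopy(buffer.get(i), c0, data[i], 0, cols);
            sink.accept(new Block(blockRow, bc, data));
        }
        buffer.clear();
    }
}
```

In this sketch a block row continues across file boundaries, so the parts only need to be supplied in row order. Passing the blocks to a `Consumer` rather than returning a list keeps the memory footprint bounded by one row of blocks, which mirrors the role the description assigns to MAX_BLOCKS_IN_CACHE.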

@codecov

codecov bot commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 80.06135% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.32%. Comparing base (c7300f3) to head (c0e87a4).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ime/instructions/ooc/CSVReblockOOCInstruction.java | 80.18% | 39 Missing and 25 partials ⚠️ |
| ...rc/main/java/org/apache/sysds/lops/CSVReBlock.java | 50.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2352      +/-   ##
============================================
+ Coverage     72.29%   72.32%   +0.02%     
- Complexity    46829    46880      +51     
============================================
  Files          1508     1509       +1     
  Lines        177638   177962     +324     
  Branches      34880    34938      +58     
============================================
+ Hits         128430   128708     +278     
- Misses        39511    39535      +24     
- Partials       9697     9719      +22     

@mboehm7
Contributor

mboehm7 commented Nov 16, 2025

LGTM - thanks for the patch and the additional simplification @janniklinde. For now we focus on lean code and optimizing the common case in terms of data characteristics.

@mboehm7 mboehm7 closed this in 6f3cdb3 Nov 16, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in SystemDS PR Queue Nov 16, 2025
aperov9 pushed a commit to aperov9/systemds that referenced this pull request Nov 17, 2025