Pull requests: datajuicer/data-juicer

docs: add operator documentation
#930 opened Mar 5, 2026 by cmgzn Draft
perf: optimize TokenNumFilter with batch tokenization
#929 opened Mar 3, 2026 by JohnGiorgi
3 tasks done
perf: cache redundant sum() calls in repetition filters
#924 opened Feb 26, 2026 by dubin555
2 tasks done
[WIP] arXiv/PDF to Markdown mappers + dj-op one-shot runner (labels: dj:op)
#917 opened Feb 14, 2026 by yxdyc
[WIP] Multi-branch executor (labels: dj:core, enhancement)
#916 opened Feb 13, 2026 by yxdyc
[WIP] feat: Add combined_logical_filter operator with AND/OR support (labels: dj:op)
#914 opened Feb 13, 2026 by yxdyc
[WIP] Feat: Add RayImageBTSMinhashDeduplicator
#897 opened Jan 29, 2026 by Dludora
Depth seg new op (labels: dj:op)
#862 opened Dec 22, 2025 by archernsy
Add Operator-Level Parallel Data Processing with Ray Actors (labels: dj:dist, dj:efficiency, enhancement)
#761 opened Aug 19, 2025 by Cccccc0630
[NewOp] Add group_diversity_filter op
#745 opened Jul 22, 2025 by lingzhq
Add lidar object segmentation op
#736 opened Jul 14, 2025 by Qirui-jiao
[WIP] add lidar object detection op
#721 opened Jun 26, 2025 by Cathy0908
Optimization framework (labels: dj:core, dj:efficiency)
#702 opened Jun 13, 2025 by cyruszhang Draft
[WIP] deduping benchmark suite
#607 opened Mar 4, 2025 by cyruszhang