Pull requests: datajuicer/data-juicer
perf: optimize TokenNumFilter with batch tokenization
#929
opened Mar 3, 2026 by
JohnGiorgi
3 tasks done
fix(ops): fix GeneralFusedOP discarding Mapper results in fused pipeline
#928
opened Mar 3, 2026 by
dubin555
fix(ops): fix NlpaugEnMapper only augmenting first sample in batch
#927
opened Mar 3, 2026 by
dubin555
fix(ops): prevent shared mutable _default_kwargs pollution across operator instances
#926
opened Mar 3, 2026 by
dubin555
6 tasks done
feat(mapper): add custom tokenizer support to RemoveRepeatSentencesMapper
#925
opened Feb 27, 2026 by
JohnGiorgi
perf: cache redundant sum() calls in repetition filters
#924
opened Feb 26, 2026 by
dubin555
2 tasks done
feat: add latex_figure_context_extractor_mapper operator
#923
opened Feb 26, 2026 by
liyuyi-2001
Add support for json[l].gz, and make ray dataset support reading json…
#919
opened Feb 25, 2026 by
HunterLine
[WIP] arXiv/PDF to Markdown mappers + dj-op one-shot runner
dj:op
issues/PRs about some specific OPs
#917
opened Feb 14, 2026 by
yxdyc
[WIP] Multi-branch executor
dj:core
issues/PRs about the core functions of Data-Juicer
enhancement
New feature or request
#916
opened Feb 13, 2026 by
yxdyc
[WIP] feat: Add combined_logical_filter operator with AND/OR support
dj:op
issues/PRs about some specific OPs
#914
opened Feb 13, 2026 by
yxdyc
[WIP] Feat: Add video_calibration_mapper and video_split_by_frame_mapper
#902
opened Feb 1, 2026 by
1van2ha0
3 tasks
[WIP] feat: Pr 839 s3 download checkpoint resume and unittest for s3 download
#870
opened Dec 25, 2025 by
Dludora
Depth seg new op
dj:op
issues/PRs about some specific OPs
#862
opened Dec 22, 2025 by
archernsy
Add Operator-Level Parallel Data Processing with Ray Actors
dj:dist
issues/PRs about distributed data processing
dj:efficiency
regarding to efficiency issues and enhancements
enhancement
New feature or request
#761
opened Aug 19, 2025 by
Cccccc0630
[NewOp] Add generate_challenging_qa_mapper based on MindGYM principles
#703
opened Jun 14, 2025 by
Bat-Reality
Optimization framework
dj:core
issues/PRs about the core functions of Data-Juicer
dj:efficiency
regarding to efficiency issues and enhancements
#702
opened Jun 13, 2025 by
cyruszhang
Draft
[NewOp] Add domain_diversity_selector based on DaaR principles
#699
opened Jun 12, 2025 by
lingzhq