
Pull requests: allenai/dolma



Tokenization sanitizer script
#242 opened Feb 19, 2025 by soldni
first
#240 opened Feb 14, 2025 by Whattabatt Draft
Code-prose-composition tagger
#234 opened Feb 13, 2025 by no0p
[WIP DO NOT MERGE] Learn2Code Feature Branch
#233 opened Feb 13, 2025 by cmwilhelm
simpler logic for calculating code taggers
#229 opened Feb 12, 2025 by kyleclo
Bump openssl from 0.10.66 to 0.10.70 in the cargo group (labels: dependencies, rust)
#228 opened Feb 3, 2025 by dependabot
New language ID
#223 opened Dec 30, 2024 by soldni
Update reference to Phishing.Database.
#222 opened Dec 15, 2024 by phishing-database-bot
DCLM Style Deduplications
#214 opened Sep 30, 2024 by revbucket
Mattj/requirements
#212 opened Sep 26, 2024 by revbucket
DNM: Patch FT Tagger
#210 opened Sep 25, 2024 by undfined Draft
New Progress Bar, Backoff, Batching
#165 opened May 23, 2024 by soldni
Warc Backoff
#160 opened May 10, 2024 by soldni
Baseline data
#61 opened Oct 20, 2023 by IanMagnusson Draft
Text modification config
#60 opened Oct 19, 2023 by rodneykinney