Augmenty is an augmentation library based on spaCy for augmenting texts.
Updated May 24, 2024 - Python
MLOne Powered by AIEdX. Machine Learning Course for Everyone. Tier1 Basic
The data set contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements
Repository for the LREC-COLING 2024 Paper: Persona-Based Corpus in the Diabetes Mellitus Domain – Applying a Human-Centered Approach to a Low-Resource Context
A tool for fixing a BibTeX reference list using DBLP API
Grapheme-to-phoneme rule-based converter for Polish in Go.
Python library for feature selection on text features. It provides a filter method, a genetic algorithm, and TextFeatureSelectionEnsemble for improving text classification models.
OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)
An always-a-work-in-progress combination of documentation and demo notebooks for working with the LatinCy models
The dataset contains editions of the South African government magazine Vuk'uzenzele. Data was scraped from PDFs that have been placed in the data/raw folder. The PDFs were obtained from the Vuk'uzenzele website.
A RoBERTa-based language model designed specifically for Setswana, using the new PuoData dataset.
Pythonic wrappers for the CIDEr/CIDEr-D evaluation metrics. Provides CIDEr as well as CIDEr-D (CIDEr Defended), which is more robust to gaming effects. We also add the option to replace the original PTBTokenizer with the spaCy tokenizer (no Java dependency, but slower).
DSFSI South African Terminology Lists and Lexicon Project
Embedding Evaluation Data for South African Languages
This repository is an initial pipeline for reading, processing, labelling and classifying unstructured annual reports of South African (SA) banks with the aim of identifying financial risk. It leveraged work by the Corporate Financial Information Environment-Final Report Structure Extractor (CFIE–FRSE) of El-Haj et al. which created a corpus of …
Easier Automatic Sentence Simplification Evaluation
🚪✊Knock Knock: Get notified when your training ends with only two additional lines of code