data-cleaning

Here are 2,824 public repositories matching this topic...

aai-institute / pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation

machine-learning game-theory data-cleaning data-quality banzhaf-index influence-functions robust-machine-learning shapley-value data-valuation data-centric-ai transferlab least-core data-pruning

Updated Jun 6, 2024
Python

catalyst / moodle-local_datacleaner

Reduce, filter, and anonymize moodle data for non-prod environments

plugin php moodle data-cleaning anonymize datacleaner

Updated Jun 6, 2024
PHP

data-catering / data-caterer

Test data management tool for any data source, batch or real-time

data-validation data-generation data-cleaning data-quality test-data-generator test-environment automated-test-generation data-testing testing-automation data-contracts data-test test-data-management

Updated Jun 6, 2024
Scala

johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Updated Jun 6, 2024
Go

scribe-org / Scribe-Data

Wikidata and Wikipedia language data extraction

Updated Jun 6, 2024
Python

Gokulakkrizhna / industrial-copper-modelling

We leverage machine learning and data analysis to address real-world challenges in the copper industry. Our documentation encompasses data preprocessing, feature engineering, classification, regression, and model selection. Explore how we've enhanced predictive capabilities to optimize manufacturing solutions.

python machine-learning numpy plotly cross-validation pandas statistical-analysis outlier-detection data-preprocessing dataframe data-cleaning feature-scaling smote classification-algorithm regression-algorithms category-encoder eda-analysis

Updated Jun 5, 2024
Python

Stevmorris93 / SQL-Projects

MySQL projects utilizing basic to intermediate SQL concepts

mysql events automation queries sql joins indexes data-cleaning stored-procedures triggers

Updated Jun 5, 2024

cleanlab / cleanlab-studio

Client interface for all things Cleanlab Studio

Updated Jun 6, 2024
Python

bhammy27 / Fantasy_Football_database_SQL

A desire to win my Fantasy Football leagues led me to realize that I have a passion for data analytics. I will create my own database using PostgreSQL and pgAdmin.

sql fantasy-football postgresql-database pgadmin data-cleaning ddl-scripts etl-process dml-commands

Updated Jun 5, 2024

ccb-hms / ontology-mapper

A tool for mapping free-text descriptions of entities to ontology terms

owl ontology data-cleaning metadata-cleaning ontology-mapping

Updated Jun 5, 2024
Python

voxel51 / fiftyone

The open-source tool for building high-quality datasets and computer vision models

visualization python data-science machine-learning computer-vision deep-learning artificial-intelligence developer-tools image-classification object-detection data-cleaning active-learning data-quality data-curation unstructured-data vector-search data-centric-ai

Updated Jun 6, 2024
Python

Desbordante / desbordante-core

Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets you run data-cleaning scenarios built on these algorithms. Desbordante is available both as a console application and as an easy-to-use web application.

Updated Jun 5, 2024
C++

skrub-data / skrub

Star

Prepping tables for machine learning

data-science data machine-learning data-analysis data-wrangling data-preprocessing data-preparation data-cleaning dirty-data

Updated Jun 5, 2024
Python

jkminder / data2neo

Data2Neo is a library that simplifies the conversion of data in relational format into a graph knowledge database.

neo4j graphs data-engineering database-migrations relational-databases data-cleaning data-conversion remodeling data2neo

Updated Jun 5, 2024
Python

GuilleDiaz7 / Automatic-Web-Scraping-of-Spanish-TDT-Films

This repository automates the scraping of every film shown on Spanish public TV, using rvest and GitHub Actions.

r web-scraper cinema rmarkdown web-scraping data-collection data-cleaning rmarkdown-document github-actions parametrized-report

Updated Jun 5, 2024
R

DanielOladipupo / Titanic-Analytics-Project

This project predicts whether a passenger on the Titanic survived or not. The dataset typically used for this project contains information about individual passengers, such as their age, gender, ticket class, fare, cabin, and whether or not they survived.

data-visualization data-analysis data-cleaning data-interpretation

Updated Jun 5, 2024

Manu-Abuya / Food-Delivery-Cost-and-Profitability-Analysis

Analyze and optimize the cost and profitability of a food delivery service using Python. This project includes data cleaning, cost breakdown, revenue calculation, and strategic recommendations to enhance profitability.

python data-visualization data-analysis financial-analysis data-cleaning cost-optimization food-delivery profitability-analysis revenue-calculation