pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
Updated Jun 6, 2024 - Python
Reduce, filter, and anonymize Moodle data for non-prod environments
Test data management tool for any data source, batch or real-time
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Wikidata and Wikipedia language data extraction
We leverage machine learning and data analysis to address real-world challenges in the copper industry. Our documentation encompasses data preprocessing, feature engineering, classification, regression, and model selection. Explore how we've enhanced predictive capabilities to optimize manufacturing solutions.
MySQL projects utilizing basic to intermediate SQL concepts
Client interface for all things Cleanlab Studio
A desire to win my Fantasy Football leagues led to a realization that I have a passion for Data Analytics. I will create my own database using PostgreSQL and pgAdmin.
A tool for mapping free-text descriptions of entities to ontology terms
The open-source tool for building high-quality datasets and computer vision models
Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
Prepping tables for machine learning
Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database.
This is a repository to automate the scraping of every film shown on Spanish public TV, using rvest and GitHub Actions.
This project predicts whether a passenger on the Titanic survived or not. The dataset typically used for this project contains information about individual passengers, such as their age, gender, ticket class, fare, cabin, and whether or not they survived.
Analyze and optimize the cost and profitability of a food delivery service using Python. This project includes data cleaning, cost breakdown, revenue calculation, and strategic recommendations to enhance profitability.
Data was downloaded through Kaggle
Contains the code and data files for a data science project using individual player statistics data from the 2023-2024 NCAA basketball season.