Data wrangling in Python
Updated Apr 21, 2017 - Jupyter Notebook
Converting and integrating data from multiple sources is often tricky business. Luckily, there are some great tools available that make this a breeze. I use a genetic annotation file (Brachypodium) and incorporate gene ontology definitions. The data wrangling is done with dplyr and tidyr.
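The project itself uses dplyr and tidyr in R. As a rough Python analogue, the sketch below shows the two steps such a merge usually involves: splitting a comma-separated GO column into one row per term (similar in spirit to `tidyr::separate_rows`) and then left-joining each term to its definition. The function names, the `(gene_id, go_field)` record layout and the sample gene IDs are hypothetical, not taken from the repository.

```python
from typing import Dict, List, Optional, Tuple


def expand_go_terms(annotations: List[Tuple[str, str]]) -> List[Tuple[str, Optional[str]]]:
    """Split a comma-separated GO field into one (gene_id, go_id) row per term.

    Genes with no GO terms are kept as (gene_id, None) so that the later
    join behaves like a left join rather than silently dropping them.
    """
    rows: List[Tuple[str, Optional[str]]] = []
    for gene_id, go_field in annotations:
        terms = [t.strip() for t in go_field.split(",") if t.strip()]
        if not terms:
            rows.append((gene_id, None))
        for go_id in terms:
            rows.append((gene_id, go_id))
    return rows


def attach_definitions(
    rows: List[Tuple[str, Optional[str]]], go_definitions: Dict[str, str]
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Left join: every row is kept; the definition is None when the GO ID is unknown or missing."""
    return [
        (gene_id, go_id, go_definitions.get(go_id) if go_id is not None else None)
        for gene_id, go_id in rows
    ]


# Hypothetical usage with two real GO root terms:
annotations = [("geneA", "GO:0008150, GO:0003674"), ("geneB", "")]
go_defs = {"GO:0008150": "biological_process", "GO:0003674": "molecular_function"}
merged = attach_definitions(expand_go_terms(annotations), go_defs)
```

Keeping unannotated genes as explicit `None` rows makes it easy to count how many genes lack GO terms after the merge.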
You can find the dataset on Kaggle.
The package is aimed at scientists who seek to estimate MOI and lineage frequencies at molecular markers using the maximum-likelihood framework described in https://doi.org/10.1371/journal.pone.0261889. Users can import data from Excel files in various formats and perform maximum-likeli
Aggregate data in R using simple SQL commands
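That project works in R; the same idea of aggregating tabular data with a plain SQL `GROUP BY` can be sketched in Python with the standard-library `sqlite3` module and an in-memory database. The `sales` table, its columns and the sample rows are hypothetical and only illustrate the pattern.

```python
import sqlite3
from typing import Iterable, List, Tuple


def total_sales_by_region(rows: Iterable[Tuple[str, float]]) -> List[Tuple[str, float, int]]:
    """Load (region, amount) rows into an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT region, SUM(amount) AS total, COUNT(*) AS n "
            "FROM sales GROUP BY region ORDER BY region"
        )
        return cur.fetchall()
    finally:
        conn.close()


summary = total_sales_by_region([("north", 10.0), ("south", 5.0), ("north", 2.5)])
```

Using `?` placeholders with `executemany` lets SQLite handle quoting, which is safer than formatting values into the SQL string.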
This is an exercise on the use of Python for data wrangling based on the book "Python for Data Analysis" by Wes McKinney.
I finished the Woz U's Data Science program in March 2022. This is the code and the projects that I turned in during my student experience.
This repo contains projects completed as part of a Udemy boot camp on data science.
This repository provides an overview of the data wrangling process used for the WeRateDogs Twitter account dataset. The process included data gathering, assessment, and cleaning to ensure the dataset was free of quality and tidiness issues.
Analysis of Walmart sales data to understand the factors that affect sales across its different branches.
SQL & Tableau - Portfolio Project
An analysis of gender pay data across Scottish companies
This project utilizes R to preprocess Spotify's "Unpopular Songs" and "Genre of Artists" datasets from Kaggle. Following tidy data principles, it handles duplicates, transforms variables, scans for outliers, and normalizes data. The resulting clean dataset is ready for statistical analysis, ensuring accurate and ethical data practices.
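The project does this preprocessing in R. The Python sketch below illustrates three of the steps it lists (dropping duplicates, scanning for outliers and normalizing), using only the standard library. The 1.5 × IQR outlier rule and min-max scaling are common choices assumed here for illustration; the repository may use different methods.

```python
from statistics import quantiles
from typing import Hashable, List, Sequence


def drop_duplicates(records: Sequence[Hashable]) -> list:
    """Remove exact duplicate records, keeping the first occurrence and the original order."""
    seen = set()
    out = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            out.append(rec)
    return out


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Flagging outliers rather than deleting them leaves the decision about how to treat them to the later analysis step.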
Data Wrangling with Python for e-Commerce
Gather data from various sources (CSV, web scraping, JSON) and wrangle it, then analyze and visualize the Twitter dog ratings.
Data Wrangling with MongoDB class code
This repo contains code to download data, extract it if needed, and store it in a pickle file.
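A download-extract-pickle pipeline like the one described can be sketched with the Python standard library alone. This is not the repository's code: the function names are hypothetical, zip is assumed to be the only archive format, and `download` skips the fetch when the file is already on disk so repeated runs act as a simple cache.

```python
import os
import pickle
import urllib.request
import zipfile
from typing import Any, List


def download(url: str, dest: str) -> str:
    """Fetch url to dest unless the file already exists locally."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


def extract_if_needed(path: str, dest_dir: str) -> List[str]:
    """If path is a zip archive, extract it into dest_dir and return the member paths;
    otherwise return [path] unchanged."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest_dir)
            return [os.path.join(dest_dir, name) for name in zf.namelist()]
    return [path]


def save_pickle(obj: Any, path: str) -> None:
    """Serialize obj to path with pickle (binary mode)."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_pickle(path: str) -> Any:
    """Load an object previously written by save_pickle."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.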
This analysis examines a dataset of 10,000 movies from a movie database, revealing insights and trends in the industry. Notably, drama is the most popular genre, and factors like budget and popularity impact revenue. However, limited data, replaced null values, outliers, and correlation-causation considerations call for cautious interpretation.