The same data cleaning and wrangling task is executed in both Python and R to demonstrate the equivalent code structures (Data: public-use PIAAC)

Gulsah-G/dataprep-py-r

Data pre-processing: Python vs R dplyr

This repository contains a Python notebook and an R script that perform the same data cleaning and wrangling task, to demonstrate the equivalent code structures in the two languages. The pre-processing task includes, but is not limited to:

  • Reading in .sav data files
  • Dealing with labelled data and value labels
  • Basic frequency tables
  • Filtering by group
  • Removing missing data
  • Creating new variables or recoding existing ones in place
  • Calculating group-centered/scaled variables
  • Removing outliers based on within-group quartiles
  • Replacing missing values with group means
  • Exporting data to CSV
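
As a rough illustration of how several of these steps might look on the Python side, here is a minimal pandas sketch. It is not taken from the notebook: the toy data frame, the column names (`group`, `score`, `hours`), the value labels, the cut-off of 300, and the 1.5 × IQR outlier rule are all assumptions made for the example. Reading the .sav file is left out, so the sketch starts from an in-memory data frame.

```python
import io

import numpy as np
import pandas as pd

# Toy stand-in for a survey extract. All column names, codes, and values
# are hypothetical; none are taken from the PIAAC files.
raw = pd.DataFrame({
    "group": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3],
    "score": [250.0, 260.0, 255.0, 400.0, np.nan,
              300.0, 310.0, 305.0, 295.0, np.nan, 280.0],
    "hours": [40.0, np.nan, 38.0, 42.0, 36.0,
              20.0, 25.0, np.nan, 30.0, 22.0, 35.0],
})

# Value labels: map numeric codes to readable labels (hypothetical codebook).
value_labels = {1: "Employed", 2: "Unemployed", 3: "Out of labour force"}
raw["group_label"] = raw["group"].map(value_labels)

# Basic frequency table, counting missing labels as well.
freq = raw["group_label"].value_counts(dropna=False)

# Filter by group, then drop rows where the outcome is missing.
df = raw[raw["group"].isin([1, 2])].dropna(subset=["score"]).copy()

# New variable derived from an existing one (the cut-off is arbitrary).
df["high_score"] = (df["score"] >= 300).astype(int)

# Group-centered and group-scaled versions of the score.
by_group = df.groupby("group")["score"]
df["score_centered"] = df["score"] - by_group.transform("mean")
df["score_scaled"] = df["score_centered"] / by_group.transform("std")

# Outliers from within-group quartiles. The 1.5 * IQR fence is an assumed rule.
q1 = by_group.transform(lambda s: s.quantile(0.25))
q3 = by_group.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1
inside = (df["score"] >= q1 - 1.5 * iqr) & (df["score"] <= q3 + 1.5 * iqr)
clean = df[inside].copy()

# Replace remaining missing values with the mean of the row's own group.
group_mean_hours = clean.groupby("group")["hours"].transform("mean")
clean["hours"] = clean["hours"].fillna(group_mean_hours)

# Export to CSV. An in-memory buffer is used here; pass a file path
# to to_csv instead to write to disk.
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

`groupby(...).transform(...)` returns a result aligned to the original rows, which plays roughly the role that `group_by()` followed by `mutate()` plays in dplyr. The steps follow the order of the list above, so the centered and scaled columns are computed before outliers are dropped; reorder them if the group statistics should exclude outliers.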

Data used: The U.S. public-use PIAAC data (2012-2014) (https://nces.ed.gov/surveys/piaac/datafiles.asp)
