Rocio Ng edited this page Jul 27, 2016 · 24 revisions

Data Science Tools

R

IDEs

  • RStudio Just don't use R without this. Just don't.

Packages

  • Hadley Wickham Anything that this guy has made. Over the past 10 years, he's made a bunch of tools that have made R a much less clunky language.
    • ggplot2 The best plotting.
    • ggvis An upcoming alternative to ggplot2; offers some nice features at the moment for web displays (including interactivity).
    • dplyr An incredibly useful data.frame manipulation package. Supports aggregation and grouping, and even lets you lazily evaluate manipulations on connections to SQL databases (or BigQuery!).
    • tidyr For making your data tidy. A successor to reshape2.
    • httr Simple tools for working with HTTP requests.
    • rvest Simple web scraping.
  • bigrquery A decent interface to BigQuery.
  • magrittr Provides the pipe operator (%>%). Understand this as soon as possible. It will make your life much easier.
  • pipeR A competing version of the magrittr package. Do the tutorial.
  • rlist Like dplyr but for lists.
  • data.table Offers an alternative to data.frames, is very fast and incorporates some of the features of dplyr in its DF manipulation syntax. Do the tutorial.
  • purrr Functional programming additions for R. Lets you do a lot of useful function composition/application easily.
  • sparkTable Makes Tufte-style spark-* charts or tables. Compatible with shiny.
  • shinyjs Great for incorporating interactive JavaScript into Shiny apps and R Markdown documents via R code.

Python

IDEs

  • Jupyter notebooks Interactive workbooks
  • PyCharm Python IDE with integrated terminal and neat features such as smart autocomplete and SQL database interfaces

Packages

  • pandas Data wrangling/manipulation
  • numpy and scipy Data analysis and statistics tools
  • matplotlib Most commonly used library for data visualization and plotting
  • seaborn For creating 'prettier' data visualizations
  • scikit-learn Commonly used machine learning library
  • psycopg PostgreSQL adapter for Python. Easy to use and reliable
  • nltk Extensive library for natural language processing (NLP)
  • itertools Standard-library module of fast, memory-efficient iterator building blocks for looping. Not the easiest to use, but read this and give it a shot
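
Since itertools is flagged above as not the easiest to pick up, here is a minimal sketch of three of its most commonly used functions. The `events` data and all variable names are made up for illustration; only `groupby`, `chain.from_iterable`, and `islice` come from the standard library.

```python
from itertools import chain, groupby, islice

# Hypothetical sample records: (user, event) pairs, already sorted by user.
events = [
    ("ana", "login"),
    ("ana", "query"),
    ("bo", "login"),
    ("bo", "export"),
    ("bo", "logout"),
]

# groupby yields (key, group-iterator) pairs for runs of consecutive equal
# keys, so the input has to be sorted by that key beforehand.
events_per_user = {
    user: [event for _, event in group]
    for user, group in groupby(events, key=lambda pair: pair[0])
}

# chain.from_iterable lazily flattens an iterable of iterables.
all_events = list(chain.from_iterable(events_per_user.values()))

# islice takes a slice of any iterable without needing indexing support.
first_three = list(islice(all_events, 3))

print(events_per_user)
print(first_three)
```

The one gotcha worth remembering is that `groupby` only groups adjacent items, so unsorted input silently produces repeated keys rather than an error.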

Spark