Tools
Rocio Ng edited this page Jul 27, 2016 · 24 revisions
- RStudio Just don't use R without this. Just don't.
- Hadley Wickham Anything that this guy has made. Over the past 10 years, he's made a bunch of tools that have made R a much less clunky language.
- ggplot2 The best plotting.
- ggvis An upcoming alternative to ggplot2; offers some nice features at the moment for web displays (including interactivity).
- dplyr An incredibly useful data.frame manipulation package. Supports all sorts of things like aggregation and grouping, and even lets you lazily evaluate manipulations on tables in SQL databases (or BigQuery!).
- tidyr For making your data tidy. A successor to reshape2.
- httr Simple manipulation of HTTP.
- rvest Simple web scraping.
- bigrquery A decent interface to BigQuery.
- magrittr Provides the %>% pipe operator. Understand this as soon as possible. It will make your life much easier.
- pipeR A competing version of the magrittr package. Do the tutorial.
- rlist Like dplyr but for lists.
- data.table Offers an alternative to data.frames, is very fast and incorporates some of the features of dplyr in its DF manipulation syntax. Do the tutorial.
- purrr Functional programming additions for R. Lets you do a lot of useful function composition/application easily.
- sparkTable Makes Tufte-style spark-* charts or tables. Compatible with shiny.
- shinyjs Great for incorporating interactive JavaScript into Shiny apps and R Markdown documents via R code.
- Jupyter notebooks Interactive workbooks
- PyCharm Python IDE with integrated terminal and neat features such as smart autocomplete and SQL database interfaces
- matplotlib Most commonly used library for data visualization and plotting
- seaborn For creating 'prettier' data visualizations
- scikit-learn Commonly used machine learning library
- psycopg PostgreSQL adapter for Python. Easy to use and reliable
- nltk Extensive library for doing natural language processing (NLP)
- itertools Extremely useful library for faster/efficient looping in Python. Not the easiest to use but read this and give it a shot
- Databricks (https://databricks.com/try-databricks) Free Community Edition that lets you spin up a small Spark cluster from any browser!
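
Since itertools is described above as useful but not the easiest to pick up, here is a minimal sketch of three of its functions (groupby, chain.from_iterable, islice) using only the standard library. The sample data and the helper names tools_by_language and first_n are made up for illustration and are not part of itertools or of this page.

```python
from itertools import chain, groupby, islice

# Hypothetical sample data: (language, tool) pairs, invented for illustration.
tools = [
    ("R", "dplyr"),
    ("R", "ggplot2"),
    ("Python", "seaborn"),
    ("Python", "nltk"),
]


def tools_by_language(pairs):
    """Group tool names by language.

    groupby only merges adjacent items with equal keys, so the input
    is sorted by the same key first.
    """
    ordered = sorted(pairs, key=lambda pair: pair[0])
    return {
        language: [name for _, name in group]
        for language, group in groupby(ordered, key=lambda pair: pair[0])
    }


def first_n(iterable, n):
    """Return the first n items, consuming no more of the iterable than needed."""
    return list(islice(iterable, n))


grouped = tools_by_language(tools)

# chain.from_iterable flattens the per-language lists into one lazy stream.
flattened = chain.from_iterable(grouped.values())

print(grouped)
print(first_n(flattened, 3))
```

The usual stumbling block is that groupby does not sort for you: without the sorted call, equal keys that are not adjacent end up in separate groups.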