Tools
Tyler Field edited this page Jul 30, 2016 · 24 revisions
- RStudio Just don't use R without this. Just don't.
- Hadley Wickham Anything that this guy has made. Over the past 10 years, he's made a bunch of tools that have made R a much less clunky language.
- ggplot2 The best plotting.
- ggvis An upcoming alternative to `ggplot2`; offers some nice features at the moment for web displays (including interactivity).
- dplyr An incredibly useful `data.frame` manipulation package. Supports all sorts of things like aggregation and grouping, and even lets you lazily evaluate manipulations of connections to SQL databases (or BigQuery!).
- tidyr For making your data tidy. An extension of reshape2.
- httr Simple manipulation of HTTP.
- rvest Simple web scraping.
- bigrquery A decent interface to BigQuery.
- magrittr Understand this as soon as possible. It will make your life much easier.
- pipeR A competing version of the `magrittr` package. Do the tutorial.
- rlist Like `dplyr` but for lists.
- data.table Offers an alternative to data.frames, is very fast, and incorporates some of the features of `dplyr` in its data frame manipulation syntax. Do the tutorial.
- purrr Functional programming additions for R. Lets you do a lot of useful function composition/application easily.
- sparkTable Makes Tufte-style spark-* charts or tables. Compatible with shiny.
- ShinyJS Great for incorporating interactive JavaScript into Shiny apps and R Markdown documents via R code.
- caret Functional and easy package for prototyping and comparing different machine learning models. Streamlines pre-processing, cross-validation, hyper-parameter tuning, etc. with minimal code.
- Jupyter notebooks Interactive workbooks that mix code, output, and notes.
- PyCharm Python IDE with integrated terminal and neat features such as smart autocomplete and SQL database interfaces
- matplotlib Most commonly used library for data visualization and plotting.
- seaborn For creating 'prettier' data visualizations.
- scikit-learn Commonly used machine learning library.
- psycopg PostgreSQL adapter for Python. Easy to use and reliable.
- nltk Extensive library for natural language processing (NLP). Its corpora are not bundled with the package, so download the ones you need separately using `nltk.download()`.
- itertools Extremely useful standard-library module for fast, memory-efficient looping in Python. Not the easiest to use, but read this and give it a shot.
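
To give a feel for what `itertools` looping looks like, here is a minimal sketch using three of its functions: `groupby`, `chain`, and `islice`. The sample records and variable names are made up for illustration and are not from any real data set.

```python
from itertools import chain, count, groupby, islice

# Hypothetical sample data: (neighborhood, street) pairs, already sorted
# by neighborhood. groupby() only merges *adjacent* items with the same
# key, so unsorted input would produce repeated groups.
records = [
    ("Mission", "Valencia St"),
    ("Mission", "24th St"),
    ("Sunset", "Irving St"),
]

# Group streets by neighborhood in a single pass.
streets_by_hood = {
    hood: [street for _, street in group]
    for hood, group in groupby(records, key=lambda r: r[0])
}

# chain() walks several iterables back to back without building a
# combined list first.
all_ids = list(chain(range(3), range(10, 12)))

# islice() takes a slice of any iterator, including an infinite one:
# count() never stops, but only the first five squares are computed.
first_five_squares = list(islice((n * n for n in count()), 5))

print(streets_by_hood)
print(all_ids)
print(first_five_squares)
```

Everything here is lazy until `list()` or the dict comprehension consumes it, which is where the memory savings come from on large inputs.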
- [Databricks powered by Spark](https://databricks.com/try-databricks) Free Community Edition that lets you spin up a small Spark cluster from any browser!
- GeoNames REST API that returns ZIP codes and other properties for lat/lon points.
- Data Science Toolkit Variety of tools including Street Address to Coordinates, Coordinates to Political Areas, Coordinates to Statistics, and several Text parsing/sentiment tools.
- FCC Census Block Conversions API for getting the census tract from a Lat/Lon
- SoQL Socrata, which hosts SF Open Data, offers API access for every data set, plus a variety of SQL-like functions that make queries powerful.
- CitySDK The Census Bureau has an SDK package and API. You'll have to sign up for a key, but it's free.
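
As a rough illustration of the SQL-like queries mentioned in the SoQL entry, the sketch below builds a query URL using the `$select`, `$where`, and `$limit` parameters. The helper name `build_soql_url`, the dataset ID `abcd-1234`, and the column names are all hypothetical placeholders; the `data.sfgov.org` domain and the `/resource/<id>.json` path are assumptions about how SF Open Data is exposed, so check the data set's own API page before relying on them. The code only builds the URL and makes no network request.

```python
from urllib.parse import urlencode


def build_soql_url(domain, dataset_id, select, where=None, limit=10):
    """Build a SoQL query URL (hypothetical helper, for illustration only)."""
    params = {"$select": select, "$limit": limit}
    if where:
        params["$where"] = where
    # safe="$" keeps the leading "$" of each parameter name readable;
    # everything else is percent-encoded as usual.
    query = urlencode(params, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# "abcd-1234", "name", "address", and "zip" are placeholders, not a real
# data set or real columns.
url = build_soql_url(
    "data.sfgov.org",
    "abcd-1234",
    select="name, address",
    where="zip = '94103'",
    limit=5,
)
print(url)
```

The resulting URL can then be fetched with any HTTP client once a real dataset ID and real column names are substituted.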