Hi, this is the installation guide for GT Big Data Club. It contains instructions on how to install everything that you need to start hacking with us. Check out the section that you're interested in, grab a soda, and start installing!
Bootstrapping scripts are provided if you would rather not read through this documentation.
- Download and run the appropriate script for your operating system:
Windows: Run scripts/windows.cmd
Linux: Run scripts/linux.sh
Mac: Run scripts/mac.sh
NOTE: The above scripts will download a package manager to your computer to simplify downloading and updating packages in the future.
Then, install the packages required by running this command:
conda env create -f environment.yml
Now, anytime you want to run any Big Data Club stuff, simply run
source activate big-data-club
and you will have the required packages in that shell!
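To confirm that the environment is active, a quick check along these lines can help. This is only a sketch: the module names below are assumptions based on the libraries described in this guide, not a list read from environment.yml, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the libraries described in this guide;
# the authoritative list is whatever environment.yml declares.
EXPECTED_MODULES = ["numpy", "scipy", "nltk", "sklearn", "bs4", "requests", "flask"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

If anything is reported as missing, the environment is probably not activated in that shell, or the package is not part of the environment.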
There are certain tools and technologies that all parts of the team interact with. These are:
Package managers make it easier to download programs, handle updates, and set your PATH variable. Different operating systems have different package managers.
Windows: Chocolatey
Linux: apt-get
Mac: Homebrew
Using a package manager will make the following steps quite trivial, as there will be no need to open up your browser at all!
While others exist, the bootstrapping scripts use the package managers listed above. Feel free to use your favorite package manager!
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
You can download Git by going to this link.
If you would like a visual client for Git, GitHub offers a cross-platform app here.
MongoDB is an extremely popular NoSQL database that organises data as documents of key-value pairs, instead of using tables and rows.
The installer for MongoDB can also be obtained from the MongoDB website.
Also, optionally install RoboMongo, an admin tool for MongoDB.
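To make the "documents of key-value pairs" idea concrete, here is a small sketch using only Python's standard library. It does not talk to MongoDB at all, and every field name in it is hypothetical; it only contrasts a table-style row with a document-style record.

```python
import json

# A relational row: the values only make sense alongside a fixed column list.
columns = ("name", "year", "languages")
row = ("Ada", 2, "Python,JavaScript")

# A document: self-describing key-value pairs that can nest lists and
# other documents. All field names here are hypothetical.
member = {
    "name": "Ada",
    "year": 2,
    "languages": ["Python", "JavaScript"],
    "contact": {"email": "ada@example.com"},
}

# Documents serialise naturally to JSON, which is close to how MongoDB
# presents them (MongoDB itself stores a binary form called BSON).
member_json = json.dumps(member, sort_keys=True)
print(member_json)
```

Note how the document can hold a list and a nested record directly, where the row has to squeeze the languages into a single string.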
Python is a popular, high-level programming language with a multitude of uses. Pip is an installation tool that makes installing Python libraries relatively painless.
You can download the latest version of Python here.
Conda makes it easier to change between different versions of Python, and pre-bundles several scientific computing packages for Python.
Flask is a Python microframework for writing web servers.
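Once Flask is installed, a minimal app can serve as a smoke test. The sketch below is illustrative only: the route and message are placeholders, and it uses Flask's built-in test client so that no server has to be started.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # The route and message are placeholders for illustration.
    return "Hello from Big Data Club!"


# Exercise the route with Flask's test client instead of starting a server.
# Calling app.run() would start the development server instead.
with app.test_client() as client:
    response = client.get("/")
    print(response.status_code, response.get_data(as_text=True))
```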
Node.js is an open source, cross-platform runtime environment for server-side and networking applications. npm is a package manager that comes bundled with Node.js.
The installer for Node.js can be found here.
NumPy is the fundamental package for scientific computing with Python.
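A short sketch of what working with NumPy looks like, assuming it is installed in the active environment. The array values are made up for illustration.

```python
import numpy as np

# A 2x3 array; operations apply element-wise without explicit loops.
samples = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])

doubled = samples * 2                # element-wise multiplication
column_means = samples.mean(axis=0)  # mean of each column

print(doubled)
print(column_means)
```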
SciPy extends NumPy to have more functionality.
NLTK is a leading platform for building Python programs to work with human language data.
scikit-learn is a leading Python library for data mining and data analysis.
Beautiful Soup is a Python library for parsing HTML.
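As a rough illustration of parsing HTML with Beautiful Soup, the sketch below pulls a heading and some links out of a small page. The HTML is a made-up snippet inlined in the code, so no network access is needed.

```python
from bs4 import BeautifulSoup

# A small hypothetical page, inlined so no network access is needed.
html = """
<html><body>
  <h1>Meetings</h1>
  <a href="/intro">Intro night</a>
  <a href="/hack">Hack session</a>
</body></html>
"""

# "html.parser" is the parser bundled with Python's standard library.
soup = BeautifulSoup(html, "html.parser")

title = soup.h1.get_text()
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]

print(title)
print(links)
```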
Requests is a Python library for making HTTP requests.
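The sketch below shows how Requests assembles an HTTP request. To keep it runnable offline, it prepares the request without sending it; the URL and query parameters are placeholders.

```python
import requests

# Build a GET request without sending it, so the example needs no network.
# The URL and parameters are placeholders.
request = requests.Request(
    "GET",
    "https://example.com/search",
    params={"q": "big data", "page": 1},
)
prepared = request.prepare()

print(prepared.method)
print(prepared.url)

# To actually send it, one would use a session, for example:
#   with requests.Session() as session:
#       response = session.send(prepared, timeout=10)
```

For everyday use, `requests.get(url, params=...)` does the preparing and sending in one call.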