To create a virtual environment for this repo and download the initial package requirements, run the following:
Next, initialize the database. Start by opening a SQL shell:
psql postgres
Once you are in the SQL shell, run
CREATE DATABASE airflow_works;
Run \l
to check that the database is created.
Next, run this whenever you are in this repo to reset the Airflow constants:
As a validation step, run echo $AIRFLOW_WORKS_DBURL in the terminal window
to see if the Airflow config is linked to the correct database.
The name of the database should match the name after postgres://localhost:port_number/ in the URL.
Then check that you can psql into the database. If you can, \q
out of the SQL shell and you are ready to roll.
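
The database-name check described above can also be scripted. The following is a minimal sketch, not part of this repo: the helper names `database_name_from_url` and `check_db_url` are hypothetical, the expected name `airflow_works` is taken from the `CREATE DATABASE` step, and port `5432` in the fallback URL is only an assumed placeholder.

```python
import os
from urllib.parse import urlparse


def database_name_from_url(db_url: str) -> str:
    """Return the database name, i.e. the part after host:port/ in the URL."""
    return urlparse(db_url).path.lstrip("/")


def check_db_url(db_url: str, expected_name: str = "airflow_works") -> bool:
    """Return True if the URL points at the expected database."""
    return database_name_from_url(db_url) == expected_name


if __name__ == "__main__":
    # AIRFLOW_WORKS_DBURL is the variable echoed in the validation step;
    # the fallback URL (including port 5432) is an assumption for illustration.
    url = os.environ.get(
        "AIRFLOW_WORKS_DBURL", "postgres://localhost:5432/airflow_works"
    )
    print(database_name_from_url(url), check_db_url(url))
```

If the printed name differs from the database you created, the environment variable is probably stale or was set in a different shell.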
- Follow through on the installation requirements set above.
- After the database is correctly set up and linked to Airflow via the config,
open a terminal window with the correct pyenv and run
airflow webserver -p 8080
- Open another terminal window with the correct pyenv (will be automated) and run
airflow scheduler
- Follow the instructions here:
- Creating a setup for a sample DAG (follow through all the instructions to install pre-req packages)
- Create a PostgreSQL database (follow steps here) (in my case, I called it )
- Create a new user with a new password
- Test run the DAG using localhost:8080, and create a new connection for the database created
- Create tasks in a sample DAG
- Created a game (rock-paper-scissors) that spits out the results
- Run the game on demand and store the results with timestamp in the database using Airflow
- (WIP) Create unit tests for tasks in a sample DAG
- Unit-testing the functionality of the rock-paper-scissors game
- Validation test framework for result logs coming out from the game
- Create API connection for external data crawler
- For starters, read up on the Socrata API.
- Set up API for data from (relevant to
- Read up on the definition of ETL
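
To make the rock-paper-scissors items in the list above more concrete, here is one possible sketch of the game logic together with a validator for the timestamped result records. It is not the repo's actual implementation: the function names (`judge`, `play_round`, `is_valid_record`) and the record fields (`played_at`, `player_one`, `player_two`, `winner`) are assumptions made for illustration.

```python
import random
from datetime import datetime, timezone

MOVES = ("rock", "paper", "scissors")
# Each key beats its value.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def judge(player_one: str, player_two: str) -> str:
    """Return 'player_one', 'player_two', or 'draw' for a pair of moves."""
    if player_one not in MOVES or player_two not in MOVES:
        raise ValueError(f"moves must be one of {MOVES}")
    if player_one == player_two:
        return "draw"
    return "player_one" if BEATS[player_one] == player_two else "player_two"


def play_round(rng=None) -> dict:
    """Play one random round and return a timestamped result record."""
    if rng is None:
        rng = random.Random()
    move_one = rng.choice(MOVES)
    move_two = rng.choice(MOVES)
    return {
        "played_at": datetime.now(timezone.utc).isoformat(),
        "player_one": move_one,
        "player_two": move_two,
        "winner": judge(move_one, move_two),
    }


def is_valid_record(record: dict) -> bool:
    """Check that a record has a parseable timestamp and a consistent winner."""
    try:
        datetime.fromisoformat(record["played_at"])
        expected = judge(record["player_one"], record["player_two"])
        return record["winner"] == expected
    except (KeyError, ValueError, TypeError):
        return False


if __name__ == "__main__":
    result = play_round()
    print(result, is_valid_record(result))
```

Keeping the game as plain functions means the same code can be unit-tested directly and also passed to an Airflow `PythonOperator` as its `python_callable`. The validator can be run against rows read back from the database to catch malformed results.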
Vineet Goel (Robinhood): Why Robinhood Uses Airflow
Useful Quora for .bashrc/.bash_profile