Skip to content
/ data-pipeline Public template

Build a data pipeline using Google BigQuery, dbt, Google Sheets, and Supermetrics. It helps you create a monthly reporting toolkit that pulls in data from a variety of marketing channels.

Notifications You must be signed in to change notification settings

noahlearner/data-pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

dbt-data-pipeline

Setup Steps

Google Sheets Setup

  1. Copy this Google Sheet
  • Set up Supermetrics queries to fill in each channel
  • Don't touch anything that is dark grey
  1. Copy the url of the Sheet as you'll need it in the bigQuery setup
  2. Make a copy of the the Data Pipeline Reporting Client List; This will serve as the file where you configure your clients.
  • Client - Add the Client Name
  • DMAM - is the account Manager. Could also be your SEO or any other role. You can change the heading but you'll have to chane the field names downstream in your SQL queries.
  • PPC - name of the PPc person on your account
  • MPA - name of the digital marketing assistant on the account
  • Content - Set the name of the content person on the account
  • Logo URL - please enter
  • Client Summary Sheet URL - add the URL of each client's Sheet
  • Client Dashboard URL - add url of their data studio report.
  1. Make a copy of this Google sheet
  • GDS Date - Enter date for each month on each row for each Client in yyyy-mm-dd format.
  • Client - Enter Client Name (must match the name in client list sheet. You should create data validation in this column to refer to a unique list of names from the client list file you created before to limit opportuinities of data entry error.
  • budget - enter how much the client pays you to manage each month. in this format: 1111.00 (don't use curreny symbols).

**** THis is a work in progress and more on the sheets setup to come

Google Cloud Account

  1. Sign Up for a Google Cloud Free Trial Account
  2. Enable the BigQuery API here

Create your BigQuery Project courtesy of dbt's documentation

  1. Go to the BigQuery Console — if you don't have a Google Cloud Platform account you will be asked to create one.
  2. Create a new project for this tutorial — if you've just created a BigQuery account, you'll be prompted to create a new project straight away. If you already have an existing you can select the project drop down in the header bar, and create a new project from there.

BigQuery Credentials for dbt

  1. Go to the BigQuery credential wizard. Ensure that your new project is selected in the header bar.
  2. Generate credentials with the following options:
  • Which API are you using? BigQuery API
  • Are you planning to use this API with App Engine or Compute Engine? No
  • Service account name: dbt-user
  • Role: BigQuery User
  • Key type: JSON
  • Download the JSON file and save it in an easy-to-remember spot, with a clear filename (e.g. dbt-user-creds.json)

BigQuery Setup

  1. Buid new Dataset in BigQuery to match the client name with multiple words separated by underscores.
  2. Build external tables for each of the marketing channels with following process:
  • Click on Client's Dataset
  • Choose Create a Table
  • Table Settings:
    • Create table from Drive
    • Put Sheet URL in Select Drive URI
    • Set File Format to Google Sheets
    • Confirm Project name and Dataset name are correct
    • Set table name
    • Check autodetect schema and input parameters
    • Set "Header rows to skip" to 1.
    • Click Create Table
  1. Tables include:
  • GA_in
  • GA_Goals_in
  • GMB_in
  • GoogleAds_in
  • Mailchimp_in
  • Management_in
  • Microsoft_in
  • Reporting_Client_List_in

BigQuery Gotchas

  • If the created table doesn't have the correct header names from your Google Sheet, no big deal. Just rebuild the table.
  • Remember to check autodetect schema and input parameters
  • Remember to Set "Header rows to skip" to 1.

Github Setup

  1. Clone this repo.
  2. Name the new Repo "data-pipeline- to match the Client name
  3. Edit the dbt_project.yml.
    • lines 6 (change name to client Name),
    • Line 33 Change from CAM_Solar to the client name with underscores inbetween words.
    • Line 37 Change CAM_Solar to to match the Dataset name found in BigQuery
    • Line 38 Change from CAM Solar to match the real Client Name
  4. Use this repo as the source for new dbt Project.

Set Up dbt Account + Project courtesy of dbt

  1. Create a dbt Cloud account here. If your organization already has a dbt Cloud account, ask an admin to add you as a Developer.
  2. Click the hamburger menu, then Account Settings, then Projects.
  3. Click "New Project".
  4. CLick "Begin" on next screen.
  5. Name the project "pipeline", and click "Continue".
  6. Click "BigQuery" on Next Screen.
  7. Click "Upload a Service Account JSON file" and paste in the credentials file from the BigQuery Credentials for dbt instructions above.
  8. scroll down to the "Development Credentials and rename the DATASET to match what was entered in line 37 of the Github Setup Section.
  9. Scroll Up and Click "Test".
  10. After you see "Connection Test Succeeded" Click "Continue".
  11. On Next Screen Click "Github" Button.
  12. You'll then be prompted to log into Github or you'll see your repos listed below in a table.
  13. Click on the Repo you built in the Github Setup.
  14. After the success message, Click the "Continue" buttonin top right of screen.
  15. On next screen Click "Start Developing".

Learn all about dbt Projects here

Running dbt

Follow dbt's documentation here

Scheduling dbt Runs

dbt's docuumentation Here

About

Build a data pipeline using Google BigQuery, dbt, Google Sheets, and Supermetrics. It helps you create a monthly reporting toolkit that pulls in data from a variety of marketing channels.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages