Lake Demos

VantageCloud Lake Demos Public Repository.

This repository stores all public VantageCloud Lake demos in a single project where the community can collaborate.

Available Demos List

1. Environment Setup Automation (0_Demo_Environment_Setup.ipynb)

Python Notebook

Three files required:

Environment variables file vars.json
0_Demo_Environment_Setup.ipynb
1_Load_Base_Demo_Data.ipynb

Alternative: use Apache Airflow

Upload Demo_Setup_Airflow_Python.py to Airflow.
Edit vars.json and upload it to Airflow as Variables.
Execute the DAG.
Run 1_Load_Base_Demo_Data.ipynb.

Environment Setup Checklist

To configure the environment used for these demos, perform the steps in "Environment Setup Automation" by running either the Jupyter notebook or the Airflow DAG. Before running either one, complete the following:

Edit vars.json to reflect the target environment
Validate other environment and hierarchy settings in vars.json
Clusters are set up to be active during nominal US business hours. Adjust as necessary in the notebook or DAG.
If using Airflow, upload the updated vars.json to Variables in the Airflow Admin screen.
When the setup is complete, use the Admin notebook to check cluster status and suspend or resume clusters as needed.
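
Before running the notebook or the DAG, it can help to confirm that vars.json parses and contains the settings you expect. The following is a minimal sketch only: the function name `load_vars` and the key names shown in the usage comment are hypothetical, since the actual keys are defined by the vars.json shipped in this repository.

```python
import json
from pathlib import Path


def load_vars(path, required_keys=()):
    """Load a JSON variables file and fail early if expected keys are absent.

    `required_keys` is supplied by the caller; this sketch does not know
    which keys the demo notebooks actually read.
    """
    with Path(path).open(encoding="utf-8") as fh:
        settings = json.load(fh)
    # The file is expected to hold a single JSON object of settings.
    if not isinstance(settings, dict):
        raise ValueError(f"{path} must contain a JSON object at the top level")
    missing = [key for key in required_keys if key not in settings]
    if missing:
        raise KeyError(f"{path} is missing keys: {', '.join(missing)}")
    return settings


# Example usage (key names are placeholders, not the real vars.json schema):
# settings = load_vars("vars.json", required_keys=("host", "users"))
```

Checking the file up front surfaces a malformed or incomplete vars.json before any environment objects are created.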

This notebook creates the Lake environment hierarchy design:

Takes some environment declarations (users, databases, etc.) from the JSON file.
Uses US business hours for cluster active time. Adjust if needed.
GRANTs to retail_sample_data for the DEMO_AUTH_NOS to all objects
Creates a Repositories.PubAuth Authorization Object for accessing open object stores.
Creates two databases, "demo" and "demo_ofs", with default BFS and OFS storage respectively.

Per the design, SYSDBA is the account DBA, CGADMIN is the Compute Group Administrator, and users are in the Business Users profile.

2. Base Data Loading (1_Load_Base_Demo_Data.ipynb)

Python Notebook
Loads the minimal data the base demo notebooks need into the local Lake system.

Log in as SYSDBA
Loads two dimension tables to BFS storage from S3
- demo.Customer_BFS
- demo.Accounts_Mapping_BFS
Loads one fact table to OFS Storage from S3
- demo_OFS.Txn_History

3. Environment Administration (Demo_Admin.ipynb)

Vantage SQL Kernel

Log in as CGADMIN/password
Compute Group Status
RESUME/SUSPEND/DROP
DBC login in case one needs DBC

4. Data Engineering (Data_Engineering_Exploration.ipynb)

Vantage SQL Kernel

Create OFS Table from S3 "CashApp" transactions
Create Foreign Table from S3 "Banking History"
Review Tables - Dimensions in BFS, CashApp in OFS, Banking History in S3
Execute Joins and Analytics:
- Identify Customers who have experienced Fraud
- Show the victim's full behavioral path through their Banking relationship
Execute Joins across the Query Fabric (QueryGrid)

5. Open Analytics Framework (Data_Science_OAF.ipynb)

Python Notebook (Python 3.8)

Credentials and UES URI inherited from vars.json
Create custom container - install libraries and versions
Upload model and scoring script
Execute feature engineering and pass the result to scoring.
Evaluate Model

6. Data Science Process - Python (Data_Science_OAF.ipynb)

Appendix Section - Create the model

OneHotEncode
Test/Train Split
Train Model
Test Model
Confusion Matrix
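
The appendix steps above can be illustrated with a small standard-library sketch of the data handling around the model: one-hot encoding, a train/test split, and a confusion matrix. This is not the notebook's code. The function names, the 25% test fraction, and the fixed seed are assumptions, and the training step itself is omitted because it depends on the model library used in the notebook.

```python
import random
from collections import Counter


def one_hot_encode(values):
    """Return one 0/1 indicator row per value, plus the category order used."""
    categories = sorted(set(values))
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return rows, categories


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of `rows` reproducibly and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(actual, predicted, labels=(0, 1)):
    """Count (actual, predicted) pairs; rows are actual labels, columns predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]
```

With labels `(0, 1)`, the matrix reads as `[[true negatives, false positives], [false negatives, true positives]]`.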

VantageCloud Lake Fundamentals

Notebooks illustrating the feature/function basics

See README for more details

1. Native Object Store

Fundamentals/Native-Object-Store/NOS_Fundamentals_SQL.ipynb

Demos in UseCases Folder

Each use case has its own data loading notebook. Typically, the data is loaded from an S3 bucket; the bucket name and any credentials are read from vars.json.

See README for more details

1. Native KMeans Clustering

2. Native GLM Numeric Regression

3. Sentiment Analysis using Native functions

4. Churn Prediction using Native Data Prep, VAL, XGBoost model training, and scoring with BYOM or OAF

UseCases/Churn-Prediction-OAF/Churn-Prediction-OAF.ipynb

5. System Scaling and Monitoring

UseCases/Scaling/Demo 1 - Generate Workload.ipynb
UseCases/Scaling/Demo 2 - Real-Time Monitoring.ipynb
UseCases/Scaling/Demo 3 - System Monitoring Queries.ipynb

6. Proximity to Climate Risk/Geospatial Analysis

UseCases/Proximity-To-Climate-Risk/Proximity_To_Climate_Risk.ipynb

7. Vector Embeddings for Customer Segmentation

UseCases/Vector-Embeddings-Segmentation/Segmentation_With_Vector_Embedding.ipynb
