Technologies used: Apache Kafka, Spark Structured Streaming, Confluent Cloud, Databricks, Delta Lake, Spark NLP
All details of the project are described HERE.
The aim of the Starbucks Twitter Sentiment Analysis project is to build an end-to-end Twitter data streaming pipeline for brand sentiment analysis.
- Set up the Virtual Environment
pip install virtualenv
virtualenv --version # test your installation
virtualenv ccloud-venv
source ccloud-venv/bin/activate # activate the environment
As in the previous post, we need Twitter API credentials. After obtaining them, save the credentials in a .env file. Make sure to add the .env file to .gitignore so it is never committed.
# .env
CONSUMER_KEY = "<api key>"
CONSUMER_SECRET = "<api secret>"
ACCESS_TOKEN_KEY = "<access key>"
ACCESS_TOKEN_SECRET = "<access secret>"
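As a quick check that the credentials are picked up correctly, the producer can load them with python-dotenv and build a Twitter client. This is a minimal sketch, assuming the python-dotenv and tweepy (v4) packages are installed; it is not the project's actual code.

# check_credentials.py (illustrative)
import os
import tweepy
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory

client = tweepy.Client(
    consumer_key=os.getenv("CONSUMER_KEY"),
    consumer_secret=os.getenv("CONSUMER_SECRET"),
    access_token=os.getenv("ACCESS_TOKEN_KEY"),
    access_token_secret=os.getenv("ACCESS_TOKEN_SECRET"),
)
print(client.get_me())  # should return your own Twitter account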
Confluent Cloud is a resilient, scalable streaming data service based on Apache Kafka®, delivered as a fully managed service. It lets users manage cluster resources easily.
First, create a free Confluent Cloud account and create a Kafka cluster in Confluent Cloud. I created a Basic cluster with single-zone availability on the AWS cloud provider.
From the navigation menu, click Topics, and on the Topics page, click Create topic. I set the topic name to tweet_data with 2 partitions; once created on the Kafka cluster, the topic is available to producers and consumers.
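If you prefer to script this step instead of using the UI, the confluent-kafka Python client's AdminClient can create the same topic. This is only a sketch; it assumes you already have the bootstrap server and API key/secret collected in the next two steps.

# create_topic.py (illustrative alternative to the UI)
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({
    "bootstrap.servers": "<HOST>:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<API_KEY>",
    "sasl.password": "<API_SECRET>",
})

# 2 partitions; replication factor 3 is the Confluent Cloud default
futures = admin.create_topics([NewTopic("tweet_data", num_partitions=2, replication_factor=3)])
for topic, future in futures.items():
    future.result()  # raises if the topic could not be created
    print(f"Created topic {topic}")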
From the navigation menu, click API keys under Data Integration. If no API key is available, click Add key to create a new one (API_KEY, API_SECRET) and make sure to save it somewhere safe.
From the navigation menu, click Cluster settings under Cluster Overview. The Identification block contains the Bootstrap server information. Make sure to save it somewhere safe. It should look similar to pkc-w12qj.ap-southeast-1.aws.confluent.cloud:9092
HOST = pkc-w12qj.ap-southeast-1.aws.confluent.cloud
vi $HOME/.confluent/python.config
Press i and copy and paste the contents below:
#kafka
bootstrap.servers={HOST}:9092
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username={API_KEY}
sasl.password={API_SECRET}
Then, replace HOST, API_KEY, and API_SECRET with the values from Step 3. Press Esc, then type :wq to save the file and quit.
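The producer later reads this file into a dictionary that can be passed straight to confluent_kafka.Producer. A minimal parsing helper in the spirit of Confluent's examples might look like the following; the name read_ccloud_config is not part of any library, just an illustration.

# config_helper.py (illustrative)
def read_ccloud_config(config_file):
    """Parse a librdkafka-style key=value file into a config dict."""
    conf = {}
    with open(config_file) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                key, value = line.split("=", 1)
                conf[key] = value.strip()
    return conf

# Usage:
# from confluent_kafka import Producer
# producer = Producer(read_ccloud_config("/root/.confluent/librdkafka.config"))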
Check HERE for the procedure for creating a Databricks cluster.
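On the Databricks side, the pipeline reads the tweet_data topic with Spark Structured Streaming and lands it in Delta Lake before Spark NLP scores the sentiment. The exact notebook code is in the linked post; this is only a sketch of the read side, assuming the same bootstrap server and API key/secret as above.

# Databricks notebook sketch (illustrative)
raw_tweets = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "<HOST>:9092")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config",
            'org.apache.kafka.common.security.plain.PlainLoginModule required '
            'username="<API_KEY>" password="<API_SECRET>";')
    .option("subscribe", "tweet_data")
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka values arrive as bytes; cast to string and persist to a Delta table
(raw_tweets.selectExpr("CAST(value AS STRING) AS tweet_json", "timestamp")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/tweet_data")
    .toTable("bronze_tweets"))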
# Dockerfile
FROM python:3.7-slim
# Install the producer's Python dependencies
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -U -r /tmp/requirements.txt
# Copy the producer code and run it against the mounted Confluent config
COPY producer/ /producer
CMD [ "python3", "producer/producer.py", \
      "-f", "/root/.confluent/librdkafka.config", \
      "-t", "<your-kafka-topic-name>" ]
# cd <your-project_folder>
# source ./ccloud-venv/bin/activate
bash run.sh
Click here to check the presentation file