OpenAI Text Embeddings for User Classification in Social Networks
Create and/or activate a virtual environment:
conda create -n openai-env python=3.10
conda activate openai-env
Install package dependencies:
pip install -r requirements.txt
Obtain an OpenAI API Key (i.e. OPENAI_API_KEY). We initially fetched embeddings from the OpenAI API via the notebooks, but the service code has since been re-implemented here, in case you want to experiment with obtaining your own embeddings.
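If you do want to fetch your own embeddings, the request looks roughly like the following (a minimal, hypothetical sketch using the openai Python package's v1+ client; the actual app.openai_service module may differ):

```python
# Hypothetical sketch, not the actual app.openai_service code.
# Assumes the openai package (v1+ client) and OPENAI_API_KEY in the environment.
import os
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def fetch_embeddings(texts, model="text-embedding-ada-002"):
    """Return one embedding vector (a list of floats) per input text."""
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]

if __name__ == "__main__":
    vectors = fetch_embeddings(["example tweet text", "another example tweet"])
    print(len(vectors), "embeddings of length", len(vectors[0]))
```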
Obtain a copy of the "botometer_sample_openai_tweet_embeddings_20230724.csv.gz" CSV file, and store it in the "data/text-embedding-ada-002" directory in this repo. This file was generated by the notebooks, and is ignored from version control because it contains user identifiers.
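To spot-check the file once it is in place, something like the following should work (a minimal sketch; the app.dataset module is the supported loader):

```python
# Hypothetical sketch for inspecting the pre-computed embeddings file.
import pandas as pd

CSV_FILEPATH = "data/text-embedding-ada-002/botometer_sample_openai_tweet_embeddings_20230724.csv.gz"

df = pd.read_csv(CSV_FILEPATH)  # pandas infers gzip compression from the ".gz" extension
print(df.shape)
print(df.columns.tolist()[:10])
```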
We are saving trained models to Google Cloud Storage. You will need to create a project on Google Cloud, and enable the Cloud Storage API as necessary. Then create a service account, download its JSON credentials file, and store it in the root directory of this repo as "google-credentials.json". This file is ignored from version control.
From the Cloud Storage console, create a new bucket, and note its name (i.e. BUCKET_NAME).
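For reference, uploading a saved model to the bucket looks roughly like this (a minimal, hypothetical sketch using the google-cloud-storage package; the repo's own storage code may differ, and the object path and filename below are made up):

```python
# Hypothetical sketch of uploading a trained model file to Cloud Storage.
# Assumes GOOGLE_APPLICATION_CREDENTIALS and BUCKET_NAME are set in the environment.
import os
from google.cloud import storage

client = storage.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS automatically
bucket = client.bucket(os.getenv("BUCKET_NAME"))

blob = bucket.blob("models/example/model.joblib")  # hypothetical object path
blob.upload_from_filename("model.joblib")          # hypothetical local file
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```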
Create a local ".env" file and add contents like the following:
# this is the ".env" file...
OPENAI_API_KEY="sk__________"
GOOGLE_APPLICATION_CREDENTIALS="/path/to/openai-embeddings-2023/google-credentials.json"
BUCKET_NAME="my-bucket"
DATASET_ADDRESS="my_project.my_dataset"
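The app reads these settings from the environment; loading them in Python looks roughly like this (a minimal sketch assuming the python-dotenv package; the repo's actual config code may differ):

```python
# Hypothetical sketch of loading the ".env" settings at runtime.
import os
from dotenv import load_dotenv

load_dotenv()  # reads the local ".env" file into the process environment

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
GOOGLE_APPLICATION_CREDENTIALS = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
BUCKET_NAME = os.getenv("BUCKET_NAME")
DATASET_ADDRESS = os.getenv("DATASET_ADDRESS")
```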
Fetch some example embeddings from the OpenAI API:
python -m app.openai_service
Demonstrate the ability to load the dataset:
python -m app.dataset
Perform machine learning and other analyses on the data, using either the OpenAI embeddings or the Word2Vec embeddings (an illustrative sketch is shown below).
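As a rough illustration of what the embedding-based user classification looks like, here is a minimal, hypothetical sketch (the "is_bot" label column, the "embedding_" column prefix, and the model choice are all assumptions, not taken from the repo's actual analysis code):

```python
# Hypothetical sketch: train a classifier on the pre-computed embedding features.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

CSV_FILEPATH = "data/text-embedding-ada-002/botometer_sample_openai_tweet_embeddings_20230724.csv.gz"
df = pd.read_csv(CSV_FILEPATH)

feature_cols = [col for col in df.columns if col.startswith("embedding_")]  # assumed naming
X, y = df[feature_cols], df["is_bot"]  # assumed label column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=99)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```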
Run the tests:
pytest --disable-warnings