simulatrex/llm-subpopulation-research

LLM Subpopulation Research Project

This project studies subpopulations within large language models (LLMs). The goal is to understand how model responses vary when the models are prompted with inputs that may represent different subpopulations.

Using Google BigQuery

To analyze the data generated by the LLMs, we use Google BigQuery, a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data.

Getting Started with BigQuery

  1. Set up a Google Cloud account: If you do not have one, sign up for a new account at https://cloud.google.com/.

  2. Create a new project: Once you have a Google Cloud account, create a new project for your LLM research.

  3. Enable the BigQuery API: Navigate to the APIs & Services dashboard and enable the BigQuery API for your project.

  4. Set up authentication: Create a service account and download the JSON key file. This will be used to authenticate your requests to BigQuery.

  5. Set up your environment: Place your service account JSON key file in a secure directory, then specify the path to this file in the GOOGLE_APPLICATION_CREDENTIALS variable within your .env file.
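
To check steps 4 and 5 before running any analysis, a small script along the following lines may help. It is a minimal sketch, not part of this repository: the helper names (`load_env_file`, `credentials_path`, `run_smoke_query`) are hypothetical, the `.env` parser handles only plain `KEY=VALUE` lines, and `run_smoke_query` assumes the `google-cloud-bigquery` package is installed (it may or may not be listed in requirements.txt).

```python
import os
from pathlib import Path


def load_env_file(env_path=".env"):
    """Read KEY=VALUE pairs from a .env file into os.environ.

    Values already present in the environment are left untouched.
    Returns the pairs that were found in the file.
    """
    loaded = {}
    for raw in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def credentials_path():
    """Return the key file path from GOOGLE_APPLICATION_CREDENTIALS, or raise."""
    value = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"Key file not found: {path}")
    return path


def run_smoke_query():
    """Run SELECT 1 against BigQuery to confirm that authentication works."""
    # Assumption: the google-cloud-bigquery package is installed.
    from google.cloud import bigquery

    credentials_path()  # fail early with a clear message if the key is missing
    client = bigquery.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS
    rows = client.query("SELECT 1 AS ok").result()
    return [row["ok"] for row in rows]
```

Calling `load_env_file()` followed by `run_smoke_query()` from the project root should either return a single-row result or fail with a message that points at the missing piece (unset variable, missing key file, or a permissions error from BigQuery).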

Setting up a Virtual Environment

To replicate the test environment, please follow these steps:

  1. Set up a virtual environment using either venv or conda.

    • For venv, run:
      python3 -m venv venv
      source venv/bin/activate
      
    • For conda, run:
      conda create --name myenv python=3.11
      conda activate myenv
      
  2. Install the required dependencies by running:

    pip install -r requirements.txt
    

Ensure you activate the virtual environment before running any code within this workspace.
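
One way to confirm that an environment is active is to ask the interpreter itself. The sketch below is illustrative only; the function names are hypothetical. It relies on two documented behaviors: inside a venv, `sys.prefix` differs from `sys.base_prefix`, and `conda activate` sets the `CONDA_DEFAULT_ENV` environment variable.

```python
import os
import sys


def detect_environment(prefix, base_prefix, environ):
    """Classify the interpreter as 'venv', 'conda', or None.

    prefix / base_prefix are the values of sys.prefix and sys.base_prefix;
    environ is a mapping such as os.environ.
    """
    if prefix != base_prefix:
        return "venv"
    if environ.get("CONDA_DEFAULT_ENV"):
        return "conda"
    return None


def current_environment():
    """Apply detect_environment to the running interpreter."""
    return detect_environment(sys.prefix, sys.base_prefix, os.environ)
```

If `current_environment()` returns `None`, the code is running on the system interpreter and the activation step above was probably skipped. Note that an active conda `base` environment is also reported as `conda`.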
