Can LLMs create & understand user beliefs and biases? Can they use them against the user to manipulate them?
Nudge GenAI - Synthetic Conversation Generator

A robust system for generating synthetic psychological conversations using Large Language Models (LLMs) via LM Studio. This project creates realistic multi-turn conversations based on demographic profiles, beliefs, and cognitive biases.

🎯 Overview

This project generates synthetic conversations between personas and AI assistants, where each persona is defined by:

  • Geographic location
  • Demographics (age, gender, education, etc.)
  • Personal beliefs and values
  • Cognitive biases

The system processes CSV data containing persona profiles and uses LLMs to generate contextually appropriate conversations that reflect the persona's characteristics.

🚀 Features

  • Parallel Processing: Configurable concurrent request handling with rate limiting
  • Crash Recovery: Automatic resume from last checkpoint after interruptions
  • Progress Tracking: Detailed logging and real-time progress monitoring
  • Retry Logic: Exponential backoff for failed requests
  • Flexible Output: Individual JSON files per persona or consolidated output
  • Data Validation: Robust JSON parsing with error handling
  • Rate Limiting: Configurable requests per minute to prevent API overload

📋 Prerequisites

  • Python 3.8+
  • LM Studio installed and running
  • A compatible LLM loaded in LM Studio (e.g., GPT-OSS-20B)

πŸ› οΈ Installation

  1. Clone the repository:
git clone https://github.com/jithinAB/nudge-genai.git
cd nudge-genai
  2. Set up a virtual environment:
python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
  3. Install dependencies:
pip install aiohttp aiofiles
  4. Set up LM Studio:
    • Download and install LM Studio
    • Load your preferred model (e.g., openai/gpt-oss-20b)
    • Start the local server (default: http://localhost:1234)

πŸ“ Project Structure

nudge-genai/
├── data/
│   └── data/
│       └── scenario.csv          # Input CSV with persona profiles
├── scripts/
│   ├── lm_studio_processor.py    # Main processing script
│   ├── test_lm_studio.py         # LM Studio connection test
│   ├── test_simple.py            # Simple API test
│   └── synthetic_data_output/    # Generated conversations
│       ├── individual_results/   # Per-persona JSON files
│       ├── consolidated_results.json
│       ├── synthetic_conversations_final.json
│       ├── processing_summary.json
│       └── failed_rows.json
├── .gitignore
└── README.md

🔧 Configuration

Edit the configuration section in scripts/lm_studio_processor.py:

# API Configuration
LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"
MODEL_NAME = "openai/gpt-oss-20b"

# Processing configuration
MAX_CONCURRENT_REQUESTS = 1  # Number of parallel requests
REQUESTS_PER_MINUTE = 10     # Rate limit
MAX_RETRIES = 3              # Maximum retry attempts
REQUEST_TIMEOUT = 180        # Timeout in seconds
SAVE_INTERVAL = 5            # Save checkpoint every N rows

# Model parameters
TEMPERATURE = 0.7
MAX_TOKENS = 2000
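The rate-limiting and exponential-backoff settings above can be sketched roughly as follows. This is a minimal illustration of how such settings are typically applied, not the script's actual implementation; `backoff_delay` and `RateLimiter` are hypothetical names:

```python
import asyncio
import time

REQUESTS_PER_MINUTE = 10
MAX_RETRIES = 3

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    # Exponential backoff: 1s, 2s, 4s, ... capped so a stuck server
    # never pushes the wait beyond `cap` seconds.
    return min(cap, base * (2 ** attempt))

class RateLimiter:
    """Sliding-window limiter: at most `rpm` calls per 60 seconds."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.calls: list = []

    async def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the 60-second window.
        self.calls = [t for t in self.calls if now - t < 60]
        if len(self.calls) >= self.rpm:
            # Sleep until the oldest call leaves the window.
            await asyncio.sleep(60 - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```

A failed request would then wait `backoff_delay(attempt)` seconds before each of its `MAX_RETRIES` retries, while every request first awaits `RateLimiter.acquire()`.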

📊 Input Data Format

The CSV file should contain the following columns:

  • place: Geographic location
  • demographics: Age, gender, education, occupation, etc.
  • beliefs: Personal beliefs and values
  • bias: Cognitive biases

Example:

place,demographics,beliefs,bias
New York,"35, Male, MBA, Marketing Manager","Values work-life balance, Believes in sustainable living","Confirmation bias, Anchoring bias"
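Note that the quoted fields keep their internal commas when parsed with Python's standard csv module. A quick sanity check, independent of the main script:

```python
import csv
import io

sample = (
    "place,demographics,beliefs,bias\n"
    'New York,"35, Male, MBA, Marketing Manager",'
    '"Values work-life balance, Believes in sustainable living",'
    '"Confirmation bias, Anchoring bias"\n'
)

# DictReader maps each row onto the four expected columns; the quotes
# protect the commas inside demographics, beliefs, and bias.
personas = list(csv.DictReader(io.StringIO(sample)))
row = personas[0]
```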

🚀 Usage

1. Test LM Studio Connection

cd scripts
python test_lm_studio.py

2. Run the Main Processor

Start fresh processing:

python lm_studio_processor.py

Resume from checkpoint (after interruption):

python lm_studio_processor.py --resume
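Resume support can be built by scanning per-persona result files and skipping rows already marked successful. A minimal sketch of that idea, assuming the result format shown below; the checkpoint logic in the actual script may differ, and `successful_row_numbers` / `load_results` are hypothetical helpers:

```python
import json
from pathlib import Path

def successful_row_numbers(records) -> set:
    # Keep only rows whose result record was marked "success";
    # everything else gets reprocessed on the next run.
    return {r["row_number"] for r in records if r.get("status") == "success"}

def load_results(results_dir: Path):
    # Each file in individual_results/ holds one JSON record per persona.
    return [json.loads(p.read_text()) for p in results_dir.glob("*.json")]
```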

3. Monitor Progress

The script provides real-time progress updates:

[INFO] Processing row 10/100 (10.0%) | Row ID: User_10
[INFO] Successfully processed row 10 in 3.45s
[INFO] Progress: 10/100 (10.0%) | Success rate: 90.0%

📤 Output Format

Individual Result Files

Each persona generates a file in synthetic_data_output/individual_results/:

{
  "row_number": 1,
  "row_id": "User_01",
  "status": "success",
  "processing_time": 3.45,
  "timestamp": "2025-01-16T10:30:00",
  "input_data": {
    "place": "New York",
    "demographics": "35, Male, MBA",
    "beliefs": "Values work-life balance",
    "bias": "Confirmation bias"
  },
  "output_data": {
    "Conversations": {
      "career_advice": [
        {"role": "person", "message": "..."},
        {"role": "AI", "message": "..."}
      ]
    }
  }
}
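A downstream consumer can sanity-check the conversation structure in a result file, for instance that turns alternate between "person" and "AI". A hedged sketch using the field names from the example above; `alternates` is an illustrative helper, not part of the project:

```python
def alternates(turns) -> bool:
    # Expect strictly alternating person/AI turns, starting with the person.
    roles = [t["role"] for t in turns]
    return len(roles) % 2 == 0 and roles == ["person", "AI"] * (len(roles) // 2)

result = {
    "output_data": {
        "Conversations": {
            "career_advice": [
                {"role": "person", "message": "How do I ask for a raise?"},
                {"role": "AI", "message": "Start by documenting your impact."},
            ]
        }
    }
}

# True only if every conversation in the record alternates correctly.
ok = all(
    alternates(turns)
    for turns in result["output_data"]["Conversations"].values()
)
```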

Consolidated Output

synthetic_conversations_final.json contains all successful conversations in a single file.

Processing Summary

processing_summary.json provides statistics:

{
  "total_rows": 100,
  "successful": 95,
  "failed": 5,
  "success_rate": 95.0,
  "total_time": 450.5,
  "average_time_per_row": 4.5
}

πŸ” Troubleshooting

LM Studio Connection Issues

  • Ensure LM Studio is running and the server is started
  • Check the URL matches your LM Studio settings (default: http://localhost:1234)
  • Verify the model name matches the loaded model

Memory Issues

  • Reduce MAX_CONCURRENT_REQUESTS to 1
  • Decrease MAX_TOKENS if responses are too large
  • Process data in smaller batches

JSON Parsing Errors

  • Check debug files in synthetic_data_output/debug_*.txt
  • Review the prompt template to ensure it requests valid JSON
  • Increase MAX_TOKENS if responses are being truncated
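One common cause of parse failures is the model wrapping its JSON in prose or markdown fences. A minimal extraction fallback for that case, shown as an illustration rather than code from the repository:

```python
import json
import re

def extract_json(text: str) -> dict:
    # Grab everything from the first '{' to the last '}' and parse it.
    # Raises ValueError if no object-shaped span is found; json.loads
    # still raises if the span itself is truncated or malformed.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model response")
    return json.loads(match.group(0))
```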

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Built with LM Studio for local LLM inference
  • Uses OpenAI-compatible API for maximum flexibility
  • Inspired by research in synthetic data generation for AI training

📧 Contact

For questions or support, please open an issue on GitHub or contact the maintainers.


Note: This tool is designed for research and development purposes. Ensure you comply with all applicable data protection and privacy regulations when generating synthetic data based on real demographic profiles.
