AI Device Template

Now supports gpt-4o and gemini-1.5-flash-latest for Vision Inference

YouTube Tutorial

This project is an AI-powered voice assistant utilizing various AI models and services to provide intelligent responses to user queries. It supports voice input, transcription, text-to-speech, image processing, and function calling with conditionally rendered UI components. This was inspired by the recent trend of AI Devices such as the Humane AI Pin and the Rabbit R1.

Features

  • Voice input and transcription: Using Whisper models from Groq or OpenAI
  • Text-to-speech output: Using OpenAI's TTS models
  • Image processing: Using OpenAI's GPT-4 Vision or Fal.ai's Llava-Next models
  • Function calling and conditionally rendered UI components: Using OpenAI's GPT-3.5-Turbo model
  • Customizable UI settings: Includes response times, settings toggle, text-to-speech toggle, internet results toggle, and photo upload toggle
  • (Optional) Rate limiting: Using Upstash
  • (Optional) Tracing: With Langchain's LangSmith for function execution

Setup

1. Clone the repository

git clone https://github.com/developersdigest/ai-devices.git

2. Install dependencies

npm install 
# or
bun install

3. Add API Keys

To use this AI-powered voice assistant, you need to provide the necessary API keys for the selected AI models and services.

Required for core functionality

  • Groq API Key for Llama + Whisper
  • OpenAI API Key for TTS and Vision + Whisper
  • Serper API Key for Internet Results

Optional for advanced configuration

  • Langchain Tracing for function execution tracing
  • Upstash Redis for IP-based rate limiting
  • Spotify for Spotify API interactions
  • Fal.AI (LLaVA image model) as an alternative vision model to GPT-4-Vision

Replace 'API_KEY_GOES_HERE' with your actual API keys for each service.
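
As a minimal sketch of how you might catch unset keys before starting the app, the helper below reports which keys are missing or still set to the placeholder. The variable names (`GROQ_API_KEY`, `OPENAI_API_KEY`, `SERPER_API_KEY`) and the `findMissingKeys` helper are assumptions for illustration and are not taken from the repository; adjust them to the names your environment actually uses.

```typescript
// Hypothetical helper: not part of the repository.
const PLACEHOLDER = "API_KEY_GOES_HERE";

// Assumed variable names for the three required services; adjust to match your setup.
const REQUIRED_KEYS = ["GROQ_API_KEY", "OPENAI_API_KEY", "SERPER_API_KEY"];

type Env = Record<string, string | undefined>;

// Returns the names of keys that are unset, empty, or still the placeholder.
function findMissingKeys(env: Env, required: string[] = REQUIRED_KEYS): string[] {
  return required.filter((name) => {
    const value = env[name]?.trim();
    return !value || value === PLACEHOLDER;
  });
}

// Example with a hand-written environment object.
const exampleEnv: Env = {
  GROQ_API_KEY: "gsk_example",
  OPENAI_API_KEY: PLACEHOLDER,
};
const missing = findMissingKeys(exampleEnv);
console.log(missing); // ["OPENAI_API_KEY", "SERPER_API_KEY"]
```

In a Node.js process you could pass `process.env` in place of `exampleEnv` and stop with a clear message if the returned list is not empty.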

4. Start the development server

npm run dev
# or
bun dev

Access the application at http://localhost:3000, or at the URL printed in the terminal if it differs.

5. Deployment

Deploy with Vercel

Configuration

Modify app/config.tsx to adjust settings and configurations for the AI-powered voice assistant. Here’s an overview of the available options:

export const config = {
    // Inference settings
    inferenceModelProvider: 'groq', // 'groq' or 'openai'
    inferenceModel: 'llama3-8b-8192', // Groq: 'llama3-70b-8192' or 'llama3-8b-8192'; OpenAI: 'gpt-4-turbo', etc.

    // OPTIONAL: the settings below are additional options for the app
    
    // Whisper settings
    whisperModelProvider: 'openai', // 'groq' or 'openai'
    whisperModel: 'whisper-1', // Groq: 'whisper-large-v3' OpenAI: 'whisper-1'

    // TTS settings
    ttsModelProvider: 'openai', // only openai supported for now...
    ttsModel: 'tts-1', // only openai supported for now...
    ttsvoice: 'alloy', // only openai supported for now... [alloy, echo, fable, onyx, nova, and shimmer]

    // OPTIONAL:Vision settings 
    visionModelProvider: 'google', // 'openai' or 'fal.ai' or 'google'
    visionModel: 'gemini-1.5-flash-latest', // OpenAI: 'gpt-4o' or  Fal.ai: 'llava-next' or  Google: 'gemini-1.5-flash-latest'

    // Function calling + conditionally rendered UI 
    functionCallingModelProvider: 'openai', // 'openai' current only
    functionCallingModel: 'gpt-3.5-turbo', // OpenAI: 'gpt-3.5-turbo'

    // UI settings 
    enableResponseTimes: false, // Display response times for each message
    enableSettingsUIToggle: true, // Display the settings UI toggle
    enableTextToSpeechUIToggle: true, // Display the text to speech UI toggle
    enableInternetResultsUIToggle: true, // Display the internet results UI toggle
    enableUsePhotUIToggle: true, // Display the use photo UI toggle
    enabledRabbitMode: true, // Enable the rabbit mode UI toggle
    enabledLudicrousMode: true, // Enable the ludicrous mode UI toggle
    useAttributionComponent: true, // Use the attribution component to display the attribution of the AI models/services used

    // Rate limiting settings
    useRateLimiting: false, // Use Upstash rate limiting to limit the number of requests per user

    // Tracing with Langchain
    useLangSmith: true, // Set to true to trace function execution with LangSmith by Langchain
};
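
Because a mistyped provider name is easy to miss, a small check like the one below can help. This is only a sketch: `ALLOWED_PROVIDERS` and `validateProviders` are hypothetical names that do not exist in the repository, and the allowed values are copied from the comments in the config above.

```typescript
// Hypothetical validation helper: not part of the repository.
// Allowed values are taken from the comments in the config above.
const ALLOWED_PROVIDERS = {
  inferenceModelProvider: ["groq", "openai"],
  whisperModelProvider: ["groq", "openai"],
  ttsModelProvider: ["openai"],
  visionModelProvider: ["openai", "fal.ai", "google"],
  functionCallingModelProvider: ["openai"],
} as const;

type ProviderField = keyof typeof ALLOWED_PROVIDERS;

// Returns one message per provider field whose value is missing or not allowed.
function validateProviders(cfg: Partial<Record<ProviderField, string>>): string[] {
  const problems: string[] = [];
  for (const field of Object.keys(ALLOWED_PROVIDERS) as ProviderField[]) {
    const allowed: readonly string[] = ALLOWED_PROVIDERS[field];
    const value = cfg[field];
    if (value === undefined || !allowed.includes(value)) {
      problems.push(`${field}: expected one of ${allowed.join(", ")}, got ${String(value)}`);
    }
  }
  return problems;
}

const providerErrors = validateProviders({
  inferenceModelProvider: "groq",
  whisperModelProvider: "openai",
  ttsModelProvider: "elevenlabs", // not supported according to the config comments
  visionModelProvider: "google",
  functionCallingModelProvider: "openai",
});
console.log(providerErrors); // ["ttsModelProvider: expected one of openai, got elevenlabs"]
```

Passing the exported `config` object to such a check at startup would surface an unsupported provider early, before a request reaches a model that cannot be called.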

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

I'm the developer behind Developers Digest. If you find my work helpful or enjoy what I do, consider supporting me. Here are a few ways you can do that: