## Overview
I was looking through the OpenAI LLM code (`bolna/llms/openai_llm.py`) and noticed that the models supported for `json_format` are only older ones. That led me to dig further into the file, and I found that the Bolna codebase currently contains numerous hardcoded string literals and magic values that make the code harder to maintain, more error-prone, and difficult to extend. This proposal outlines a comprehensive refactoring to replace them with proper enums and constants.
## Why This Matters
- ✅ **Type Safety:** Enums enable static type checking and IDE autocompletion
- ✅ **Maintainability:** Centralized constants make changes easier
- ✅ **Documentation:** Enums serve as living documentation of valid values
- ✅ **Error Prevention:** Reduces typos and invalid value usage
- ✅ **Extensibility:** Easy to add new providers/formats/types (see the sketch below)
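For context on the mechanism: inheriting from both `str` and `Enum` yields members that compare, hash, and serialize as plain strings, which is what keeps this refactor largely backward compatible. A minimal illustration (the names mirror the proposals below):

```python
from enum import Enum

class SynthesizerProvider(str, Enum):
    POLLY = "polly"
    ELEVENLABS = "elevenlabs"

# Members behave as plain strings, so existing call sites keep working:
assert SynthesizerProvider.POLLY == "polly"
assert SynthesizerProvider("polly") is SynthesizerProvider.POLLY

# Unknown values now fail loudly instead of silently propagating:
try:
    SynthesizerProvider("pollly")  # typo
except ValueError:
    print("rejected invalid provider name")
```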
## Identified Hardcoded Values

### 🔴 HIGH PRIORITY - Core Infrastructure

#### 1. Provider Names

**Current Issues:**

File: `bolna/providers.py` (Lines 7-50)

```python
SUPPORTED_SYNTHESIZER_MODELS = {
'polly': PollySynthesizer,
'elevenlabs': ElevenlabsSynthesizer,
'openai': OPENAISynthesizer,
'deepgram': DeepgramSynthesizer,
'azuretts': AzureSynthesizer,
'cartesia': CartesiaSynthesizer,
'smallest': SmallestSynthesizer,
'sarvam': SarvamSynthesizer,
'rime': RimeSynthesizer
}
SUPPORTED_LLM_PROVIDERS = {
'openai': OpenAiLLM,
'cohere': LiteLLM,
'ollama': LiteLLM,
'deepinfra': LiteLLM,
'together': LiteLLM,
'fireworks': LiteLLM,
'azure-openai': LiteLLM,
'perplexity': LiteLLM,
'vllm': LiteLLM,
'anyscale': LiteLLM,
'custom': OpenAiLLM,
'ola': OpenAiLLM,
'groq': LiteLLM,
'anthropic': LiteLLM,
'deepseek': LiteLLM,
'openrouter': LiteLLM,
'azure': LiteLLM
}
```

File: `bolna/models.py` (Line 102)

```python
@field_validator("provider")
def validate_model(cls, value):
return validate_attribute(value, ["polly", "elevenlabs", "openai", "deepgram", "azuretts", "cartesia", "smallest", "sarvam", "rime"])File: bolna/models.py (Line 89)
@field_validator("provider")
def validate_model(cls, value):
    return validate_attribute(value, list(SUPPORTED_TRANSCRIBER_PROVIDERS.keys()))
```

**Proposed Solution:**

```python
from enum import Enum
class SynthesizerProvider(str, Enum):
    POLLY = "polly"
    ELEVENLABS = "elevenlabs"
    OPENAI = "openai"
    DEEPGRAM = "deepgram"
    AZURE_TTS = "azuretts"
    CARTESIA = "cartesia"
    SMALLEST = "smallest"
    SARVAM = "sarvam"
    RIME = "rime"

class TranscriberProvider(str, Enum):
    DEEPGRAM = "deepgram"
    WHISPER = "whisper"
    AZURE = "azure"
    ASSEMBLY_AI = "assemblyai"

class LLMProvider(str, Enum):
    OPENAI = "openai"
    COHERE = "cohere"
    OLLAMA = "ollama"
    ANTHROPIC = "anthropic"
    GROQ = "groq"
    # ... etc
```

**Additional Files Affected:**
- `bolna/synthesizer/polly_synthesizer.py`
- `bolna/synthesizer/elevenlabs_synthesizer.py`
- `bolna/synthesizer/openai_synthesizer.py`
- `bolna/synthesizer/deepgram_synthesizer.py`
- `bolna/synthesizer/azure_synthesizer.py`
- `bolna/synthesizer/cartesia_synthesizer.py`
- `bolna/synthesizer/smallest_synthesizer.py`
- `bolna/synthesizer/sarvam_synthesizer.py`
- `bolna/synthesizer/rime_synthesizer.py`
- `bolna/transcriber/deepgram_transcriber.py`
- `bolna/transcriber/whisper_transcriber.py`
- `bolna/transcriber/azure_transcriber.py`
- `bolna/transcriber/assemblyai_transcriber.py`
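As a usage sketch (not a final API), `providers.py` and the pydantic validators could consume these enums like so; because the enums subclass `str`, existing lookups such as `SUPPORTED_SYNTHESIZER_MODELS["polly"]` keep working:

```python
# providers.py keyed by enum members instead of bare strings
SUPPORTED_SYNTHESIZER_MODELS = {
    SynthesizerProvider.POLLY: PollySynthesizer,
    SynthesizerProvider.ELEVENLABS: ElevenlabsSynthesizer,
    # ... remaining providers
}

# models.py derives the allowed values from the enum instead of a
# hand-maintained list
@field_validator("provider")
def validate_model(cls, value):
    return validate_attribute(value, [p.value for p in SynthesizerProvider])
```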
#### 2. Audio/Media Formats

**Current Issues:**

File: `bolna/models.py` (Line 79)

```python
class Transcriber(BaseModel):
    provider: str
    encoding: Optional[str] = "linear16"
    language: Optional[str] = "en"
    model: Optional[str] = None
    stream: bool = True
```

File: `bolna/models.py` (Line 97)

```python
class Synthesizer(BaseModel):
    provider: str
    provider_config: Union[PollyConfig, ElevenLabsConfig, AzureConfig, RimeConfig, SmallestConfig, SarvamConfig, CartesiaConfig, DeepgramConfig, OpenAIConfig] = Field(union_mode='smart')
    stream: bool = False
    buffer_size: Optional[int] = 40  # 40 characters in a buffer
    audio_format: Optional[str] = "pcm"
    caching: Optional[bool] = True
```

File: `bolna/models.py` (Line 108)

```python
class IOModel(BaseModel):
    provider: str
    format: Optional[str] = "wav"
```

File: `bolna/assistant.py` (Lines 17, 22)

```python
tools_config_args['input'] = {
"format": "wav",
"provider": "default"
}
tools_config_args['output'] = {
"format": "wav",
"provider": "default"
}
```

File: `bolna/synthesizer/openai_synthesizer.py` (Line 17)

```python
def get_format(self, format):
    return "mp3"
```

**Proposed Solution:**

```python
class AudioFormat(str, Enum):
    WAV = "wav"
    PCM = "pcm"
    MP3 = "mp3"
    FLAC = "flac"

class AudioEncoding(str, Enum):
    LINEAR16 = "linear16"
    MULAW = "mulaw"
    ALAW = "alaw"

class ResponseFormat(str, Enum):
    TEXT = "text"
    JSON_OBJECT = "json_object"
```

**Additional Files Affected:**
- `bolna/synthesizer/base_synthesizer.py`
- `bolna/transcriber/base_transcriber.py`
- `bolna/output_handlers/telephony_providers/twilio.py`
- `bolna/output_handlers/telephony.py`
- `bolna/helpers/utils.py`
- `bolna/agent_manager/task_manager.py`
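Alternatively, since these are pydantic models, the fields could be typed with the enums directly, which makes manual validators unnecessary. A minimal sketch, assuming the enum definitions above:

```python
from typing import Optional
from pydantic import BaseModel

class Transcriber(BaseModel):
    provider: TranscriberProvider
    encoding: AudioEncoding = AudioEncoding.LINEAR16
    language: Optional[str] = "en"
    model: Optional[str] = None
    stream: bool = True

# Plain strings from existing configs coerce to enum members automatically:
t = Transcriber(provider="deepgram")
assert t.provider is TranscriberProvider.DEEPGRAM
```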
#### 3. Task Types

**Current Issues:**

File: `bolna/models.py` (Line 334)

```python
class Task(BaseModel):
    tools_config: ToolsConfig
    toolchain: ToolsChainModel
    task_type: Optional[str] = "conversation"  # extraction, summarization, notification
    task_config: ConversationConfig = dict()
```

File: `bolna/agent_manager/task_manager.py` (Lines 726-737)

```python
def __setup_tasks(self, llm=None, agent_type=None, assistant_config=None):
    if self.task_config["task_type"] == "conversation" and not self.__is_multiagent():
        self.tools["llm_agent"] = self.__get_agent_object(llm, agent_type, assistant_config)
    elif self.__is_multiagent():
        return self.__get_agent_object(llm, agent_type, assistant_config)
    elif self.task_config["task_type"] == "extraction":
        logger.info("Setting up extraction agent")
        self.tools["llm_agent"] = ExtractionContextualAgent(llm, prompt=self.system_prompt)
        self.extracted_data = None
    elif self.task_config["task_type"] == "summarization":
        logger.info("Setting up summarization agent")
        self.tools["llm_agent"] = SummarizationContextualAgent(llm, prompt=self.system_prompt)
        self.summarized_data = None
```

File: `bolna/agent_manager/task_manager.py` (Line 755)

```python
async def load_prompt(self, assistant_name, task_id, local, **kwargs):
    if self.task_config["task_type"] == "webhook":
        return
```

File: `local_setup/quickstart_server.py` (Line 66)

```python
if task['task_type'] == "extraction":
    extraction_prompt_llm = os.getenv("EXTRACTION_PROMPT_GENERATION_MODEL")
    extraction_prompt_generation_llm = LiteLLM(model=extraction_prompt_llm, max_tokens=2000)
```

**Proposed Solution:**

```python
class TaskType(str, Enum):
    CONVERSATION = "conversation"
    EXTRACTION = "extraction"
    SUMMARIZATION = "summarization"
    NOTIFICATION = "notification"
    WEBHOOK = "webhook"
```

**Files Affected:**
- `bolna/models.py` (Line 334)
- `bolna/agent_manager/task_manager.py` (Lines 726, 730, 734, 755)
- `local_setup/quickstart_server.py` (Line 66)
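The dispatch in `task_manager.py` would then parse the task type once and compare against enum members; a sketch:

```python
# Parsing at the boundary rejects unknown task types early (ValueError)
task_type = TaskType(self.task_config["task_type"])

if task_type == TaskType.CONVERSATION and not self.__is_multiagent():
    self.tools["llm_agent"] = self.__get_agent_object(llm, agent_type, assistant_config)
elif task_type == TaskType.EXTRACTION:
    logger.info("Setting up extraction agent")
    self.tools["llm_agent"] = ExtractionContextualAgent(llm, prompt=self.system_prompt)
```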
### 🟡 MEDIUM PRIORITY - Feature Enhancement

#### 4. Pipeline Components

**Current Issues:**

File: `bolna/assistant.py` (Lines 26-36)

```python
if transcriber is None:
    pipelines.append(["llm"])
tools_config_args['transcriber'] = transcriber
pipeline = ["transcriber", "llm"]
if synthesizer is not None:
    pipeline.append("synthesizer")
    tools_config_args["synthesizer"] = synthesizer
pipelines.append(pipeline)
if enable_textual_input:
    pipelines.append(["llm"])
```

File: `bolna/models.py` (Line 302)

```python
class ToolsChainModel(BaseModel):
    execution: str = "parallel"
    pipelines: List[List[str]]
```

**Proposed Solution:**

```python
class PipelineComponent(str, Enum):
    TRANSCRIBER = "transcriber"
    LLM = "llm"
    SYNTHESIZER = "synthesizer"
    INPUT = "input"
    OUTPUT = "output"
```
#### 5. Agent Types

**Current Issues:**

File: `bolna/models.py` (Line 340)

```python
class AgentModel(BaseModel):
    agent_name: str
    agent_type: str = "other"
```

File: `API.md` (Lines 45-52)

```json
"llm_agent": {
"agent_type": "simple_llm_agent",
"agent_flow_type": "streaming",
"routes": null,
"llm_config": {
"agent_flow_type": "streaming",
"provider": "openai",
"request_json": true,
"model": "gpt-4o-mini"
}
}
```

**Proposed Solution:**

```python
class AgentType(str, Enum):
    SIMPLE_LLM = "simple_llm_agent"
    CONVERSATIONAL = "conversational_agent"
    EXTRACTION = "extraction_agent"
    GRAPH_BASED = "graph_based_agent"
    OTHER = "other"
```

#### 6. OpenAI Models (Fix for Compatibility Issue)
**Current Issues:**

File: `bolna/llms/openai_llm.py` (Line 199)

```python
def get_response_format(self, is_json_format: bool):
    if is_json_format and self.model in ('gpt-4-1106-preview', 'gpt-3.5-turbo-1106', 'gpt-4o-mini'):
        return {"type": "json_object"}
    else:
        return {"type": "text"}
```

File: `bolna/llms/openai_llm.py` (Line 17)

```python
def __init__(self, max_tokens=100, buffer_size=40, model="gpt-3.5-turbo-16k", temperature=0.1, language=DEFAULT_LANGUAGE_CODE, **kwargs):
```

File: `bolna/models.py` (Line 135)

```python
class MongoDBProviderConfig(BaseModel):
    connection_string: Optional[str] = None
    db_name: Optional[str] = None
    collection_name: Optional[str] = None
    index_name: Optional[str] = None
    llm_model: Optional[str] = "gpt-3.5-turbo"
    embedding_model: Optional[str] = "text-embedding-3-small"
    embedding_dimensions: Optional[int] = 256
```

**Proposed Solution:**

```python
class OpenAIModel(str, Enum):
    GPT_35_TURBO = "gpt-3.5-turbo"
    GPT_35_TURBO_1106 = "gpt-3.5-turbo-1106"
    GPT_4_1106_PREVIEW = "gpt-4-1106-preview"
    GPT_4O_MINI = "gpt-4o-mini"
    GPT_4O = "gpt-4o"
    GPT_4_TURBO = "gpt-4-turbo"
    GPT_41_NANO = "gpt-4.1-nano"  # New model support

class OpenAICapability(str, Enum):
    JSON_MODE = "json_mode"
    FUNCTION_CALLING = "function_calling"
    STREAMING = "streaming"

# Model capabilities mapping
OPENAI_MODEL_CAPABILITIES = {
    OpenAIModel.GPT_35_TURBO_1106: [OpenAICapability.JSON_MODE, OpenAICapability.STREAMING],
    OpenAIModel.GPT_4_1106_PREVIEW: [OpenAICapability.JSON_MODE, OpenAICapability.FUNCTION_CALLING],
    OpenAIModel.GPT_4O_MINI: [OpenAICapability.JSON_MODE, OpenAICapability.STREAMING],
    OpenAIModel.GPT_4O: [OpenAICapability.JSON_MODE, OpenAICapability.FUNCTION_CALLING],
    OpenAIModel.GPT_41_NANO: [OpenAICapability.JSON_MODE, OpenAICapability.STREAMING],
}
```
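With the capability map in place, `get_response_format` no longer needs a hardcoded model tuple; a sketch of the refactored method:

```python
def get_response_format(self, is_json_format: bool):
    # Unknown models get no capabilities and fall back to plain text.
    capabilities = OPENAI_MODEL_CAPABILITIES.get(self.model, [])
    if is_json_format and OpenAICapability.JSON_MODE in capabilities:
        return {"type": "json_object"}
    return {"type": "text"}
```

Adding a new model then only touches the enum and the capability map, not the request logic.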
### 🟢 LOWER PRIORITY - Nice to Have

#### 7. Status/State Values

**Current Issues:**

File: `local_setup/quickstart_server.py` (Line 59)

```python
data_for_db["assistant_status"] = "seeding"
```

File: `local_setup/quickstart_server.py` (Line 82)

```python
return {"agent_id": agent_uuid, "state": "created"}File: API.md (Line 101)
{
"agent_id": "uuid-string",
"state": "created"
}
```

**Proposed Solution:**

```python
class AgentStatus(str, Enum):
    CREATED = "created"
    SEEDING = "seeding"
    ACTIVE = "active"
    COMPLETED = "completed"
    FAILED = "failed"
    DELETED = "deleted"
```
#### 8. Message/Data Types

**Current Issues:**

File: `bolna/output_handlers/default.py` (Lines 59-82)

```python
if packet["meta_info"]['type'] in ('audio', 'text'):
if packet["meta_info"]['type'] == 'audio':
logger.info(f"Sending audio")
data = base64.b64encode(packet['data']).decode("utf-8")
elif packet["meta_info"]['type'] == 'text':
logger.info(f"Sending text response {packet['data']}")
data = packet['data']
# sending of pre-mark message
if packet["meta_info"]['type'] == 'audio':
pre_mark_event_meta_data = {
"type": "pre_mark_message",
}
mark_id = str(uuid.uuid4())
self.mark_event_meta_data.update_data(mark_id, pre_mark_event_meta_data)
mark_message = {
"type": "mark",
"name": mark_id
}
response = {"data": data, "type": packet["meta_info"]['type']}Proposed Solution:
class MessageType(str, Enum):
    AUDIO = "audio"
    TEXT = "text"
    MARK = "mark"
    PRE_MARK = "pre_mark_message"
```

#### 9. Language Codes
**Current Issues:**

File: `bolna/constants.py` (Lines 35-66)

```python
PRE_FUNCTION_CALL_MESSAGE = {
"en": "Just give me a moment, I'll be back with you.",
"ge": "Geben Sie mir einen Moment Zeit, ich bin gleich wieder bei Ihnen."
}
TRANSFERING_CALL_FILLER = {
"en": "Sure, I'll transfer the call for you. Please wait a moment...",
"fr": "D'accord, je transfère l'appel. Un instant, s'il vous plaît."
}
DEFAULT_LANGUAGE_CODE = 'en'
```

File: `bolna/models.py` (Line 79)

```python
class Transcriber(BaseModel):
    provider: str
    encoding: Optional[str] = "linear16"
    language: Optional[str] = "en"
    model: Optional[str] = None
    stream: bool = True
```

File: `API.md` (Line 72)

```json
"transcriber": {
"encoding": "linear16",
"language": "en",
"provider": "deepgram",
"stream": true
}
```

**Proposed Solution:**

```python
class LanguageCode(str, Enum):
    ENGLISH = "en"
    GERMAN = "ge"  # note: the codebase currently uses "ge" rather than ISO 639-1 "de"
    FRENCH = "fr"
    SPANISH = "es"
```

## Implementation Strategy
### Phase 1: Core Infrastructure (High Priority)
- Create a `bolna/enums/` package with separate files for each enum category
- Replace provider mappings in `providers.py`
- Update model validations in `models.py`
- Update audio format handling across synthesizers/transcribers
### Phase 2: Feature Enhancement (Medium Priority)
- Replace task type strings throughout the codebase
- Update pipeline component references
- Fix OpenAI model compatibility issues with enum-based approach
### Phase 3: Polish (Lower Priority)
- Replace status/state strings
- Centralize message types
- Standardize language codes
## Proposed File Structure

```
bolna/
├── enums/
│ ├── __init__.py
│ ├── providers.py # SynthesizerProvider, TranscriberProvider, LLMProvider
│ ├── formats.py # AudioFormat, AudioEncoding, ResponseFormat
│ ├── tasks.py # TaskType, PipelineComponent, AgentType
│ ├── models.py # OpenAIModel, OpenAICapability + capabilities mapping
│ ├── states.py # AgentStatus, MessageType
│ └── localization.py # LanguageCode
└── ...
```
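The package `__init__.py` could re-export every enum so call sites keep a single flat import path (`from bolna.enums import TaskType`); a sketch:

```python
# bolna/enums/__init__.py
from .providers import SynthesizerProvider, TranscriberProvider, LLMProvider
from .formats import AudioFormat, AudioEncoding, ResponseFormat
from .tasks import TaskType, PipelineComponent, AgentType
from .models import OpenAIModel, OpenAICapability, OPENAI_MODEL_CAPABILITIES
from .states import AgentStatus, MessageType
from .localization import LanguageCode
```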
## Breaking Changes
- **Minimal:** Most changes will be backward compatible thanks to `str`/`Enum` inheritance
- **Migration:** Existing string values will continue to work during the transition
- **Documentation:** Update API documentation to reference enum values
## Benefits After Implementation
- **Better IDE Support:** Autocompletion for all provider/format/type values
- **Static Safety:** Catch invalid values before runtime with type checkers and linters
- **Easier Extension:** Adding new providers/formats becomes trivial
- **Centralized Documentation:** All valid values documented in one place
- **Future-proof:** Easy to add new OpenAI models or other providers
## Estimated Impact
- **Files to modify:** ~25-30 files
- **Lines of code:** ~200-300 changes
- **Risk level:** Low (backward compatible with proper migration)
- **Developer experience:** Significantly improved
Open to discussing this and setting proper priorities to kick off the tasks needed here.
*Note: I have used Claude to format this message to convey it better.*