Bug Description
LOTUS cost calculation differs significantly from standard token-based pricing calculations for certain language models and data modalities (notably images). I tested semantic filtering on both text and images with several LLMs.
- For gpt-5, the LOTUS cost and the calculated cost always match.
- For gemini-2.5-flash, they match for the text modality but differ for the image modality.
- For gpt-5-mini and gemini-2.0-flash, they always differ.
Expected Behavior
LOTUS cost calculations should match standard token-based pricing calculations using the published API rates for each model. The cost should be calculated as:
(prompt_tokens / 1_000_000) * input_price + (completion_tokens / 1_000_000) * output_price
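For reference, the expected calculation can be sketched as follows (the rates in the example call are gpt-5-mini's published $0.25/$2.00 per 1M tokens; the function name is just for illustration):

```python
def expected_cost(prompt_tokens: int, completion_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Standard token-based pricing; prices are USD per 1M tokens."""
    return ((prompt_tokens / 1_000_000) * input_price
            + (completion_tokens / 1_000_000) * output_price)

# e.g. 3,000 prompt tokens + 100 completion tokens at gpt-5-mini rates:
print(f"{expected_cost(3_000, 100, 0.25, 2.0):.6f}")  # 0.000950
```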
Steps to Reproduce
I have attached all the data and code to run the analysis in the zip file. Alternatively, a simpler way to reproduce:
- Set up LOTUS with any of the affected models (gpt-5-mini, gemini-2.0-flash, gemini-2.5-flash)
- Load a dataset with both text descriptions and image paths
- Apply semantic filtering using either text or image filters
- Compare lotus.settings.lm.stats.physical_usage.total_cost with a manual calculation based on token usage
- Observe significant cost discrepancies
Environment Information
Operating System:
- Linux
Python Version:
Python 3.12
Package Versions:
- lotus 1.1.3
Error Messages and Logs
No error messages - this is a calculation discrepancy issue.
Screenshots
See the summary table below
Minimal Reproduction Example
import lotus
from lotus.dtype_extensions import ImageArray
import pandas as pd
from lotus.models import LM
# Set up model
lotus.settings.configure(lm=LM("gpt-5-mini"))
# Create test data
df = pd.DataFrame({
"ImagePath": ["./test_image.png"],
"TextDescription": ["This is a dog"]
})
# Test with image filter
df.loc[:, "Image"] = ImageArray(df["ImagePath"])
filtered_df = df.sem_filter("The image {Image} contains a dog")
# Get costs
lotus_cost = lotus.settings.lm.stats.physical_usage.total_cost
prompt_tokens = lotus.settings.lm.stats.physical_usage.prompt_tokens
completion_tokens = lotus.settings.lm.stats.physical_usage.completion_tokens
# Manual calculation (gpt-5-mini rates: $0.25/$2.0 per 1M tokens)
calculated_cost = (prompt_tokens / 1_000_000) * 0.25 + (completion_tokens / 1_000_000) * 2.0
print(f"LOTUS cost: ${lotus_cost:.6f}")
print(f"Calculated cost: ${calculated_cost:.6f}")
print(f"Difference: ${abs(lotus_cost - calculated_cost):.6f}")
# Shows ~80% difference for gpt-5-mini with image filter
Additional Context
Detailed Analysis Results:
| Model | Filter Type | LOTUS Cost | Calculated Cost | Difference | % Difference |
|---|---|---|---|---|---|
| gpt-5-mini | text | $0.000174 | $0.000869 | $0.000695 | 80.00% |
| gpt-5-mini | image | $0.000141 | $0.000708 | $0.000566 | 80.00% |
| gpt-5 | text | $0.001509 | $0.001509 | $0.000000 | 0.00% |
| gpt-5 | image | $0.001008 | $0.001008 | $0.000000 | 0.00% |
| gemini-2.0-flash | text | $0.000010 | $0.000015 | $0.000005 | 33.33% |
| gemini-2.0-flash | image | $0.000009 | $0.000207 | $0.000198 | 95.74% |
| gemini-2.5-flash | text | $0.000195 | $0.000195 | $0.000000 | 0.00% |
| gemini-2.5-flash | image | $0.000243 | $0.000630 | $0.000387 | 61.45% |
Observations:
- For gpt-5, the LOTUS cost and the calculated cost always match.
- For gemini-2.5-flash, they match for the text modality but differ for the image modality.
- For gpt-5-mini and gemini-2.0-flash, they always differ.
LOTUS uses completion_cost from LiteLLM, which other users have reported to have unresolved issues.
Would it be safer for LOTUS to calculate the monetary cost directly from token consumption?
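A minimal sketch of that suggestion, assuming a locally maintained price table (the PRICES entries and the manual_cost helper are illustrative, not part of the LOTUS API; real rates should come from each provider's published pricing):

```python
# USD per 1M tokens: (input_price, output_price). Illustrative values only.
PRICES = {
    "gpt-5-mini": (0.25, 2.00),
}

def manual_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Compute cost directly from token counts instead of LiteLLM's completion_cost."""
    input_price, output_price = PRICES[model]
    return ((prompt_tokens / 1_000_000) * input_price
            + (completion_tokens / 1_000_000) * output_price)

# Usage against LOTUS's own token stats would look like:
# usage = lotus.settings.lm.stats.physical_usage
# print(manual_cost("gpt-5-mini", usage.prompt_tokens, usage.completion_tokens))
```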
Checklist
- I have searched existing issues to avoid duplicates
- I have provided all required information
- I have tested with the latest version of the package
- I have included a minimal reproduction example (if applicable)