
Incorrect Monetary Cost Calculation #215

@SolidLao

Description

lotus_bug.zip

Bug Description

LOTUS's cost calculation differs significantly from the standard token-based pricing calculation for certain language models and for the image modality. I tested the semantic filter on both text and image inputs with several LLMs.

  • For gpt-5, the two costs always match.
  • For gemini-2.5-flash, they match for the text modality but differ for the image modality.
  • For gpt-5-mini and gemini-2.0-flash, they always differ.

Expected Behavior

LOTUS cost calculations should match the standard token-based pricing calculation using the published API rates for each model. The cost should be computed as:

cost = (prompt_tokens / 1_000_000) * input_price + (completion_tokens / 1_000_000) * output_price
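
For illustration, a minimal sketch of this calculation for gpt-5-mini (published rates: $0.25 input / $2.00 output per 1M tokens); the token counts here are hypothetical:

# Hypothetical token counts for a single call
prompt_tokens = 3_000
completion_tokens = 100

# gpt-5-mini rates, $ per 1M tokens
input_price, output_price = 0.25, 2.00

expected_cost = (prompt_tokens / 1_000_000) * input_price \
    + (completion_tokens / 1_000_000) * output_price
print(f"${expected_cost:.6f}")  # $0.000950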

Steps to Reproduce

I have attached all the data and code needed to run the analysis in the zip file. Alternatively, use these steps to reproduce it:

  1. Set up LOTUS with any of the affected models (gpt-5-mini, gemini-2.0-flash, gemini-2.5-flash)
  2. Load a dataset with both text descriptions and image paths
  3. Apply semantic filtering using either text or image filters
  4. Compare lotus.settings.lm.stats.physical_usage.total_cost with a manual calculation based on token usage
  5. Observe significant cost discrepancies

Environment Information

Operating System:

  • Linux

Python Version:

  • Python 3.12

Package Versions:

  • lotus 1.1.3

Error Messages and Logs

No error messages - this is a calculation discrepancy issue.

Screenshots

See the summary table under Additional Context below.

Minimal Reproduction Example

import pandas as pd

import lotus
from lotus.dtype_extensions import ImageArray
from lotus.models import LM

# Set up model
lotus.settings.configure(lm=LM("gpt-5-mini"))

# Create test data
df = pd.DataFrame({
    "ImagePath": ["./test_image.png"],
    "TextDescription": ["This is a dog"]
})

# Test with image filter
df.loc[:, "Image"] = ImageArray(df["ImagePath"])
filtered_df = df.sem_filter("The image {Image} contains a dog")

# Get costs
lotus_cost = lotus.settings.lm.stats.physical_usage.total_cost
prompt_tokens = lotus.settings.lm.stats.physical_usage.prompt_tokens
completion_tokens = lotus.settings.lm.stats.physical_usage.completion_tokens

# Manual calculation (gpt-5-mini rates: $0.25 input / $2.00 output per 1M tokens)
calculated_cost = (prompt_tokens / 1_000_000) * 0.25 + (completion_tokens / 1_000_000) * 2.0

print(f"LOTUS cost: ${lotus_cost:.6f}")
print(f"Calculated cost: ${calculated_cost:.6f}")
print(f"Difference: ${abs(lotus_cost - calculated_cost):.6f}")
# Shows ~80% difference for gpt-5-mini with image filter

Additional Context

Detailed Analysis Results:

Model             Filter Type  LOTUS Cost  Calculated Cost  Difference  % Difference
gpt-5-mini        text         $0.000174   $0.000869        $0.000695   80.00%
gpt-5-mini        image        $0.000141   $0.000708        $0.000566   80.00%
gpt-5             text         $0.001509   $0.001509        $0.000000   0.00%
gpt-5             image        $0.001008   $0.001008        $0.000000   0.00%
gemini-2.0-flash  text         $0.000010   $0.000015        $0.000005   33.33%
gemini-2.0-flash  image        $0.000009   $0.000207        $0.000198   95.74%
gemini-2.5-flash  text         $0.000195   $0.000195        $0.000000   0.00%
gemini-2.5-flash  image        $0.000243   $0.000630        $0.000387   61.45%

Observations:

  • For gpt-5, the two costs always match.
  • For gemini-2.5-flash, they match for the text modality but differ for the image modality.
  • For gpt-5-mini and gemini-2.0-flash, they always differ.

LOTUS uses completion_cost from LiteLLM, and other users have reported issues with its pricing data for some models.
It would be safer if LOTUS calculated the monetary cost directly from token consumption.
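
As a sketch of what that could look like, assuming a hand-maintained price table (MODEL_PRICES and token_based_cost are hypothetical names, not part of LOTUS; the rates shown should be verified against current published pricing):

# Hypothetical price table, $ per 1M tokens as (input_price, output_price);
# not part of LOTUS; verify against current published pricing
MODEL_PRICES = {
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5": (1.25, 10.00),
    "gemini-2.0-flash": (0.10, 0.40),
}

def token_based_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Compute cost directly from token counts instead of litellm.completion_cost."""
    input_price, output_price = MODEL_PRICES[model]
    return (prompt_tokens / 1_000_000) * input_price \
        + (completion_tokens / 1_000_000) * output_price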

Checklist

  • I have searched existing issues to avoid duplicates
  • I have provided all required information
  • I have tested with the latest version of the package
  • I have included a minimal reproduction example (if applicable)

Metadata

Labels

  • bug (Something isn't working)
