Bug Description
LOTUS cost calculation differs significantly from standard token-based pricing calculations for certain language models and data modalities (notably images). I tested semantic filtering on both text and images with several LLMs.
- For gpt-5, the LOTUS cost and the calculated cost always match.
- For gemini-2.5-flash, they match for the text modality but differ for the image modality.
- For gpt-5-mini and gemini-2.0-flash, they always differ.
Expected Behavior
LOTUS cost calculations should match standard token-based pricing calculations using the published API rates for each model. The cost should be calculated as:
(prompt_tokens / 1_000_000) * input_price + (completion_tokens / 1_000_000) * output_price
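For reference, the expected calculation can be sketched as follows (the rates in the example call are gpt-5-mini's published $0.25/$2.00 per 1M tokens; the function name is just for illustration):

```python
def expected_cost(prompt_tokens: int, completion_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Standard token-based pricing; prices are USD per 1M tokens."""
    return ((prompt_tokens / 1_000_000) * input_price
            + (completion_tokens / 1_000_000) * output_price)

# e.g. 3,000 prompt tokens + 100 completion tokens at gpt-5-mini rates:
print(f"{expected_cost(3_000, 100, 0.25, 2.0):.6f}")  # 0.000950
```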
Steps to Reproduce
I have attached all the data and code to run the analysis in the zip file. Alternatively, a simpler way to reproduce:
- Set up LOTUS with any of the affected models (gpt-5-mini, gemini-2.0-flash, gemini-2.5-flash)
- Load a dataset with both text descriptions and image paths
- Apply semantic filtering using either text or image filters
- Compare lotus.settings.lm.stats.physical_usage.total_cost with a manual calculation based on token usage
- Observe significant cost discrepancies
Environment Information
Operating System:
- Linux
Python Version:
Python 3.12
Package Versions:
- lotus 1.1.3
Error Messages and Logs
No error messages - this is a calculation discrepancy issue.
Screenshots
See the summary table below
Minimal Reproduction Example
import lotus
from lotus.dtype_extensions import ImageArray
import pandas as pd
from lotus.models import LM
# Set up model
lotus.settings.configure(lm=LM("gpt-5-mini"))
# Create test data
df = pd.DataFrame({
"ImagePath": ["./test_image.png"],
"TextDescription": ["This is a dog"]
})
# Test with image filter
df.loc[:, "Image"] = ImageArray(df["ImagePath"])
filtered_df = df.sem_filter("The image {Image} contains a dog")
# Get costs
lotus_cost = lotus.settings.lm.stats.physical_usage.total_cost
prompt_tokens = lotus.settings.lm.stats.physical_usage.prompt_tokens
completion_tokens = lotus.settings.lm.stats.physical_usage.completion_tokens
# Manual calculation (gpt-5-mini rates: $0.25/$2.0 per 1M tokens)
calculated_cost = (prompt_tokens / 1_000_000) * 0.25 + (completion_tokens / 1_000_000) * 2.0
print(f"LOTUS cost: ${lotus_cost:.6f}")
print(f"Calculated cost: ${calculated_cost:.6f}")
print(f"Difference: ${abs(lotus_cost - calculated_cost):.6f}")
# Shows ~80% difference for gpt-5-mini with image filter
Additional Context
Detailed Analysis Results:
| Model | Filter Type | LOTUS Cost | Calculated Cost | Difference | % Difference |
|---|---|---|---|---|---|
| gpt-5-mini | text | $0.000174 | $0.000869 | $0.000695 | 80.00% |
| gpt-5-mini | image | $0.000141 | $0.000708 | $0.000566 | 80.00% |
| gpt-5 | text | $0.001509 | $0.001509 | $0.000000 | 0.00% |
| gpt-5 | image | $0.001008 | $0.001008 | $0.000000 | 0.00% |
| gemini-2.0-flash | text | $0.000010 | $0.000015 | $0.000005 | 33.33% |
| gemini-2.0-flash | image | $0.000009 | $0.000207 | $0.000198 | 95.74% |
| gemini-2.5-flash | text | $0.000195 | $0.000195 | $0.000000 | 0.00% |
| gemini-2.5-flash | image | $0.000243 | $0.000630 | $0.000387 | 61.45% |
Observations:
- For gpt-5, the LOTUS cost and the calculated cost always match.
- For gemini-2.5-flash, they match for the text modality but differ for the image modality.
- For gpt-5-mini and gemini-2.0-flash, they always differ.
LOTUS uses completion_cost from LiteLLM, which other users have reported to have unresolved issues.
Would it be safer for LOTUS to calculate the monetary cost directly from token consumption?
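A minimal sketch of that suggestion, assuming a locally maintained price table (the PRICES entries and the manual_cost helper are illustrative, not part of the LOTUS API; real rates should come from each provider's published pricing):

```python
# USD per 1M tokens: (input_price, output_price). Illustrative values only.
PRICES = {
    "gpt-5-mini": (0.25, 2.00),
}

def manual_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Compute cost directly from token counts instead of LiteLLM's completion_cost."""
    input_price, output_price = PRICES[model]
    return ((prompt_tokens / 1_000_000) * input_price
            + (completion_tokens / 1_000_000) * output_price)

# Usage against LOTUS's own token stats would look like:
# usage = lotus.settings.lm.stats.physical_usage
# print(manual_cost("gpt-5-mini", usage.prompt_tokens, usage.completion_tokens))
```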
Checklist
- I have searched existing issues to avoid duplicates
- I have provided all required information
- I have tested with the latest version of the package
- I have included a minimal reproduction example (if applicable)