Non-English columns cannot be correctly identified. #1374

Closed
wugxxx opened this issue Mar 7, 2025 · 7 comments · Fixed by Canner/wren-engine#1091
Labels
bug Something isn't working

Comments

@wugxxx

wugxxx commented Mar 7, 2025

Describe the bug
My data table contains some Chinese data, but it is not displayed correctly.
I checked the Docker logs and found no related exceptions.

To Reproduce
Click the Preview Data button to preview data, or ask a question in Q&A.

Expected behavior
Chinese data is displayed correctly.

Screenshots

(two screenshots, not preserved)

Desktop

  • OS: macOS
  • Browser: Chrome

Wren AI Information

  • WREN_PRODUCT_VERSION=0.15.4
  • WREN_ENGINE_VERSION=0.14.3
  • WREN_AI_SERVICE_VERSION=0.15.18
  • IBIS_SERVER_VERSION=0.14.3
  • WREN_UI_VERSION=0.20.2
  • WREN_BOOTSTRAP_VERSION=0.1.5

Additional context
Nothing

Relevant log output

type: llm
provider: litellm_llm
models:
- model: openai/Qwen2.5-32B-Instruct
  api_base: http://10.1.200.12:4000/v1
  api_key_name: LLM_OPENAI_API_KEY
  timeout: 300
  kwargs:
    temperature: 0
    n: 1
    # for better consistency of llm response
    seed: 0
    max_tokens: 10240

---
type: embedder
provider: litellm_embedder
models:
- model: openai/m3e-base
  api_base: http://10.1.200.12:6006/v1
  api_key_name: EMBEDDER_OPENAI_API_KEY
  timeout: 600

---
type: engine
provider: wren_ui
endpoint: http://wren-ui:3000

---
type: document_store
provider: qdrant
location: http://qdrant:6333
embedding_model_dim: 768
timeout: 120
recreate_index: true

---
type: pipeline
pipes:
  - name: db_schema_indexing
    embedder: litellm_embedder.openai/m3e-base
    document_store: qdrant
  - name: historical_question_indexing
    embedder: litellm_embedder.openai/m3e-base
    document_store: qdrant
  - name: table_description_indexing
    embedder: litellm_embedder.openai/m3e-base
    document_store: qdrant
  - name: db_schema_retrieval
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    embedder: litellm_embedder.openai/m3e-base
    document_store: qdrant
  - name: historical_question_retrieval
    embedder: litellm_embedder.openai/m3e-base
    document_store: qdrant
  - name: sql_generation
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    engine: wren_ui
  - name: sql_correction
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    engine: wren_ui
  - name: followup_sql_generation
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    engine: wren_ui
  - name: sql_summary
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
  - name: sql_answer
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    engine: wren_ui
  - name: sql_breakdown
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    engine: wren_ui
  - name: sql_expansion
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    engine: wren_ui
  - name: sql_explanation
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
  - name: semantics_description
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
  - name: relationship_recommendation
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    engine: wren_ui
  - name: question_recommendation
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
  - name: question_recommendation_db_schema_retrieval
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    embedder: litellm_embedder.openai/m3e-base
    document_store: qdrant
  - name: question_recommendation_sql_generation
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    engine: wren_ui
  - name: chart_generation
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
  - name: chart_adjustment
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
  - name: intent_classification
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    embedder: litellm_embedder.openai/m3e-base
    document_store: qdrant
  - name: data_assistance
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
  - name: sql_pairs_indexing
    document_store: qdrant
    embedder: litellm_embedder.openai/m3e-base
  - name: sql_pairs_deletion
    document_store: qdrant
    embedder: litellm_embedder.openai/m3e-base
  - name: sql_pairs_retrieval
    document_store: qdrant
    embedder: litellm_embedder.openai/m3e-base
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
  - name: preprocess_sql_data
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
  - name: sql_executor
    engine: wren_ui
  - name: sql_question_generation
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
  - name: sql_generation_reasoning
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
  - name: sql_regeneration
    llm: litellm_llm.openai/Qwen2.5-32B-Instruct
    engine: wren_ui

---
settings:
  column_indexing_batch_size: 50
  table_retrieval_size: 10
  table_column_retrieval_size: 100
  allow_using_db_schemas_without_pruning: false
  query_cache_maxsize: 1000
  query_cache_ttl: 3600
  langfuse_host: https://cloud.langfuse.com
  langfuse_enable: true
  logging_level: DEBUG
  development: false

wrenai-wren-ui.log
wrenai-wren-engine.log
wrenai-wren-ai-service.log
wrenai-ibis-server.log

@wugxxx wugxxx added the bug Something isn't working label Mar 7, 2025
@wugxxx
Author

wugxxx commented Mar 7, 2025

The MySQL database configuration is as follows:
version: 8.0.33
charset: utf8mb4
collation: utf8mb4_general_ci
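The table charset above is separate from the charset of the client connection, and MySQL converts query results into the connection's charset. The Python sketch below illustrates the likely failure mode under one assumption the thread does not confirm: that the connection fell back to a single-byte charset such as latin1. Characters with no code point in that charset are replaced with "?", while UTF-8 keeps them intact. The helper to_wire is hypothetical and only mimics that substitution with Python's own codecs.

```python
# Sketch: Chinese text in a single-byte charset versus UTF-8.
# Assumption (not stated in the thread): the pre-fix connection used a
# Latin-1-style charset; the thread only says forcing utf8mb4 fixed it.


def to_wire(text: str, charset: str) -> bytes:
    """Hypothetical helper: encode text for a client connection,
    substituting '?' for characters the charset cannot represent."""
    return text.encode(charset, errors="replace")


sample = "中文数据"

latin1_bytes = to_wire(sample, "latin-1")
utf8_bytes = to_wire(sample, "utf-8")

print(latin1_bytes)                # b'????'  (every character lost)
print(utf8_bytes.decode("utf-8"))  # 中文数据  (round-trips intact)
print(len(utf8_bytes))             # 12  (3 bytes per CJK character)
```

Once the characters have been replaced, the original text cannot be recovered on the client side, which is consistent with the fix being a connection setting rather than a display change.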

@wugxxx
Author

wugxxx commented Mar 10, 2025

OK, I found that this is a charset problem. I added kwargs.setdefault("charset", "utf8mb4") to the get_mysql_connection function and restarted the wrenai-ibis-server-1 container, and now it works!

Could the connection panel offer more optional parameters when connecting to a database?
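The workaround relies on dict.setdefault, which fills in charset only when the caller has not already passed one. A minimal sketch of that pattern follows; connect_kwargs is a hypothetical stand-in for the real get_mysql_connection in ibis-server (its body is not shown in the thread), and it only builds the keyword arguments that would be handed to the MySQL driver.

```python
from typing import Any


def connect_kwargs(**kwargs: Any) -> dict[str, Any]:
    """Hypothetical stand-in for get_mysql_connection.

    Only the setdefault line mirrors the workaround from the thread;
    the rest is illustrative.
    """
    # Default to utf8mb4 unless the caller already chose a charset.
    kwargs.setdefault("charset", "utf8mb4")
    return kwargs


print(connect_kwargs(host="mysql", port=3306))
# {'host': 'mysql', 'port': 3306, 'charset': 'utf8mb4'}
print(connect_kwargs(host="mysql", charset="latin1"))
# {'host': 'mysql', 'charset': 'latin1'}
```

Because setdefault never overwrites an existing key, connections that already specify a charset are unaffected.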

@wwwy3y3
Member

wwwy3y3 commented Mar 11, 2025

@douenergy what do you think? Should we provide a charset option (or maybe collation)?

@wwwy3y3
Member

wwwy3y3 commented Mar 11, 2025

Per discussion with @douenergy and @goldmedal, I'll add this ticket to our roadmap. Thanks for the report!

@wugxxx
Author

wugxxx commented Mar 11, 2025

@wwwy3y3 Thanks for your response! It would be great if more configurable options, such as charset or collation, could be made available to accommodate a wider range of scenarios. I'm looking forward to your updates.

@wwwy3y3
Member

wwwy3y3 commented Mar 20, 2025

Hi @wugxxx, it's fixed in 0.17.0, which sets utf8mb4 as the default charset. Give it a try!

@wugxxx
Author

wugxxx commented Mar 24, 2025

@wwwy3y3 That's awesome! I will give it a try. Thank you so much for getting this fixed.

@wugxxx wugxxx closed this as completed Mar 24, 2025
@github-project-automation github-project-automation bot moved this from Next to Done in Wren AI Public Roadmap Mar 24, 2025
Projects
Status: Done

2 participants