Data-Juicer Q&A Copilot

Q&A Copilot is the intelligent question-answering component of the Data-Juicer Agents system, a professional Data-Juicer AI assistant built on the AgentScope framework.

You can chat with our Q&A Copilot, Juicer, on the official Data-Juicer documentation site. Feel free to ask Juicer anything related to the Data-Juicer ecosystem.

Core Components

- Agent: Intelligent Q&A agent based on ReActAgent
- FAQ RAG System: Fast and accurate FAQ retrieval powered by the Qdrant vector database and the DashScope text embedding model
- MCP Integration: Online GitHub search capabilities through the GitHub MCP Server
- Redis Storage: Supports persistence of session history and feedback data
- Web API: Provides RESTful interfaces for frontend integration

Quick Start

Prerequisites

- 3.10 <= Python <= 3.12
- Docker (for running the Qdrant vector database)
- Redis server (optional, activated by SESSION_STORE_TYPE=redis)
- DashScope API Key (for large language model calls and text embedding)

Installation

Install dependencies

cd ..
uv pip install ".[qa]"  # quoted so that shells such as zsh do not treat [qa] as a glob pattern
cd qa-copilot

Install Docker (for Qdrant vector database)
```
# Ubuntu/Debian
sudo apt-get install docker.io
sudo systemctl start docker

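# macOS (Homebrew's docker formula provides the CLI only; a Docker daemon, e.g. Docker Desktop, must also be installed and running)
brew install docker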
```
Note: The system will automatically check and start the Qdrant Docker container on startup. If FAQ data is not initialized, the system will automatically read from qa-copilot/rag_utils/faq.txt and initialize the RAG data.
Install and start Redis (optional - skip if using the default SESSION_STORE_TYPE=json)
```
# Ubuntu/Debian
sudo apt-get install redis-server
redis-server --daemonize yes

# macOS
brew install redis
brew services start redis
```
Note:
- If you set SESSION_STORE_TYPE=json (default), session history will be stored as JSON files in the SESSION_STORE_DIR directory with automatic TTL-based cleanup.
- If you set SESSION_STORE_TYPE=redis, you need to have Redis server running. Session state is automatically managed by RedisMemory, and TTL is handled by Redis server configuration.

Configuration

Set required environment variables

export DASHSCOPE_API_KEY="your_dashscope_api_key"
export GITHUB_TOKEN="your_github_token"  # Required: for GitHub MCP integration

Set optional environment variables

Session Storage Configuration:

# Session store type: "json" (default) or "redis"
export SESSION_STORE_TYPE="json"  # or "redis"

# For JSON mode (default):
export SESSION_STORE_DIR="./sessions"  # Session file storage directory (default: "./sessions")
export SESSION_TTL_SECONDS="21600"  # Session TTL in seconds (default: 21600 = 6 hours)
export SESSION_CLEANUP_INTERVAL="1800"  # Cleanup interval in seconds (default: 1800 = 30 minutes)

# For Redis mode:
export REDIS_HOST="localhost"  # Redis server host (default: "localhost")
export REDIS_PORT="6379"  # Redis server port (default: 6379)
export REDIS_DB="0"  # Redis database number (default: 0)
export REDIS_PASSWORD=""  # Redis password (default: None, optional)
export REDIS_MAX_CONNECTIONS="10"  # Redis max connections (default: 10)
# Note: Redis TTL is handled by Redis server configuration, not by application
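
The session-storage variables above could be read in Python with their documented defaults roughly as follows. This is a minimal sketch: `SessionConfig` and `load_session_config` are hypothetical names used for illustration and are not taken from app_deploy.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SessionConfig:
    """Hypothetical container for the session-storage settings."""

    store_type: str
    store_dir: str
    ttl_seconds: int
    cleanup_interval: int
    redis_host: str
    redis_port: int
    redis_db: int
    redis_password: Optional[str]
    redis_max_connections: int


def load_session_config(env: Mapping[str, str] = os.environ) -> SessionConfig:
    """Read the documented variables, falling back to the documented defaults."""
    store_type = env.get("SESSION_STORE_TYPE", "json")
    if store_type not in ("json", "redis"):
        raise ValueError(f"unsupported SESSION_STORE_TYPE: {store_type!r}")
    return SessionConfig(
        store_type=store_type,
        store_dir=env.get("SESSION_STORE_DIR", "./sessions"),
        ttl_seconds=int(env.get("SESSION_TTL_SECONDS", "21600")),
        cleanup_interval=int(env.get("SESSION_CLEANUP_INTERVAL", "1800")),
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        redis_db=int(env.get("REDIS_DB", "0")),
        # An empty or unset password is treated as "no password".
        redis_password=env.get("REDIS_PASSWORD") or None,
        redis_max_connections=int(env.get("REDIS_MAX_CONNECTIONS", "10")),
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a configuration before exporting it, for example `load_session_config({"SESSION_STORE_TYPE": "redis"})`.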

Model Configuration:

export MAX_TOKENS="200000"  # Maximum tokens for context window (default: 200000)
# Note: This value is multiplied by 3 when passed to DashScopeChatFormatter
# because CharTokenCounter counts characters, and ~3 chars ≈ 1 token for mixed CHN & ENG text

Qdrant Vector Database:

export QDRANT_HOST="127.0.0.1"  # Qdrant server host (default: "127.0.0.1")
export QDRANT_PORT="6333"  # Qdrant server port (default: 6333)

Service Configuration:

export DJ_COPILOT_SERVICE_HOST="127.0.0.1"  # Service host address (default: "127.0.0.1")
export DJ_COPILOT_ENABLE_LOGGING="true"  # Enable session logging (default: "true")
export DJ_COPILOT_LOG_DIR="./logs"  # Log directory (default: "./logs")

Advanced Configuration:

export FASTAPI_CONFIG_PATH=""  # Path to FastAPI config JSON file (optional)
export SAFE_CHECK_HANDLER_PATH=""  # Path to custom safe check handler module (optional)

Configure FAQ file (optional)

The system uses qa-copilot/rag_utils/faq.txt as the FAQ data source by default. You can edit this file to customize FAQ content. FAQ file format example:
```
'id': 'FAQ_001', 'question': 'What is Data-Juicer?', 'answer': 'Data-Juicer is a...'
'id': 'FAQ_002', 'question': 'How to install?', 'answer': 'You can install by...'
```
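
Before restarting the service with an edited FAQ file, it can help to check that every line still matches the format shown above. The following is a minimal sketch of such a check; `parse_faq_lines` is a hypothetical helper, and the loader in rag_utils may parse the file differently.

```python
import ast
from typing import Dict, Iterable, List


def parse_faq_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse lines shaped like: 'id': '...', 'question': '...', 'answer': '...'."""
    entries = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        try:
            # Wrapping the line in braces turns it into a Python dict literal.
            entry = ast.literal_eval("{" + line + "}")
        except (SyntaxError, ValueError) as exc:
            raise ValueError(f"line {number}: not a valid FAQ entry") from exc
        missing = {"id", "question", "answer"} - set(entry)
        if missing:
            raise ValueError(f"line {number}: missing keys {sorted(missing)}")
        entries.append(entry)
    return entries
```

A file can be validated with `parse_faq_lines(open("rag_utils/faq.txt", encoding="utf-8"))`; a `ValueError` names the first line that does not parse.
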
Start the service
```
bash setup_server.sh
```
On first startup, the system will automatically:
- Check and start the Qdrant Docker container (port 6333)
- Initialize FAQ RAG data (if not already initialized)
- Start the Web API service

Usage

Web API Interfaces

After starting the service, the system provides the following API interfaces:

1. Q&A Conversation

POST /process
Content-Type: application/json

{
  "input": [
    {
      "role": "user", 
      "content": [{"type": "text", "text": "How to use Data-Juicer for data cleaning?"}]
    }
  ],
  "session_id": "your_session_id",
  "user_id": "user_id"
}
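
A request like the one above could be assembled from Python using only the standard library. This is a sketch under assumptions: the base URL `http://localhost:8080` is borrowed from the WebUI command in this README, the helper names are hypothetical, and the response format is not described here, so the sketch only prints the raw body.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8080"  # assumption: adjust to your deployment


def build_process_payload(text: str, session_id: str, user_id: str) -> Dict[str, Any]:
    """Build the JSON body for POST /process from a single user message."""
    return {
        "input": [{"role": "user", "content": [{"type": "text", "text": text}]}],
        "session_id": session_id,
        "user_id": user_id,
    }


def build_request(path: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Wrap a payload in a JSON POST request for the given API path."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> None:
    """Send the request and print the raw response body line by line."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            print(raw_line.decode("utf-8"), end="")
```

With the service running, `send(build_request("/process", build_process_payload("How to use Data-Juicer for data cleaning?", "your_session_id", "user_id")))` issues the call.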

2. Get Session History

POST /memory
Content-Type: application/json

{
  "session_id": "your_session_id",
  "user_id": "user_id"
}

3. Clear Session History

POST /clear
Content-Type: application/json

{
  "session_id": "your_session_id",
  "user_id": "user_id"
}

4. Submit User Feedback

POST /feedback
Content-Type: application/json

{
  "data": {
    "message_id": "message_id_here",
    "feedback_type": "like",
    "comment": "optional user comment"
  },
  "session_id": "your_session_id",
  "user_id": "user_id"
}

Parameters:

- message_id: The ID of the message the feedback refers to (required)
- feedback_type: Type of feedback, either "like" or "dislike" (required)
- comment: Free-text user comment (optional)

Response example:

{
  "status": "ok",
  "message": "Feedback recorded successfully"
}
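
A client can apply the parameter rules listed above before sending feedback. The following is a minimal sketch; `build_feedback_payload` is a hypothetical helper, and the server may perform its own, possibly different, validation.

```python
from typing import Any, Dict, Optional

ALLOWED_FEEDBACK_TYPES = ("like", "dislike")


def build_feedback_payload(
    message_id: str,
    feedback_type: str,
    session_id: str,
    user_id: str,
    comment: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /feedback, rejecting invalid input early."""
    if not message_id:
        raise ValueError("message_id is required")
    if feedback_type not in ALLOWED_FEEDBACK_TYPES:
        raise ValueError(f"feedback_type must be one of {ALLOWED_FEEDBACK_TYPES}")
    data: Dict[str, Any] = {"message_id": message_id, "feedback_type": feedback_type}
    if comment is not None:
        data["comment"] = comment  # the comment field is optional
    return {"data": data, "session_id": session_id, "user_id": user_id}
```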

WebUI

To launch the WebUI, run the following command in your terminal:

npx @agentscope-ai/chat agentscope-runtime-webui --url http://localhost:8080/process

Refer to AgentScope Runtime WebUI for more information.

Configuration Details

Environment Variables Summary

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `DASHSCOPE_API_KEY` | ✅ Yes | - | DashScope API key for LLM and embedding |
| `GITHUB_TOKEN` | ✅ Yes | - | GitHub token for MCP integration |
| `SESSION_STORE_TYPE` | ❌ No | `"json"` | Session storage type: `"json"` or `"redis"` |
| `SESSION_STORE_DIR` | ❌ No | `"./sessions"` | Session file directory (JSON mode only) |
| `SESSION_TTL_SECONDS` | ❌ No | `21600` | Session TTL in seconds (JSON mode only, 6 hours) |
| `SESSION_CLEANUP_INTERVAL` | ❌ No | `1800` | Cleanup interval in seconds (JSON mode only, 30 minutes) |
| `REDIS_HOST` | ❌ No | `"localhost"` | Redis server host (Redis mode only) |
| `REDIS_PORT` | ❌ No | `6379` | Redis server port (Redis mode only) |
| `REDIS_DB` | ❌ No | `0` | Redis database number (Redis mode only) |
| `REDIS_PASSWORD` | ❌ No | `None` | Redis password (Redis mode only, optional) |
| `REDIS_MAX_CONNECTIONS` | ❌ No | `10` | Redis max connections (Redis mode only) |
| `QDRANT_HOST` | ❌ No | `"127.0.0.1"` | Qdrant server host |
| `QDRANT_PORT` | ❌ No | `6333` | Qdrant server port |
| `MAX_TOKENS` | ❌ No | `200000` | Maximum tokens for context window (multiplied by 3 for CharTokenCounter) |
| `DJ_COPILOT_SERVICE_HOST` | ❌ No | `"127.0.0.1"` | Service host address |
| `DJ_COPILOT_ENABLE_LOGGING` | ❌ No | `"true"` | Enable session logging |
| `DJ_COPILOT_LOG_DIR` | ❌ No | `"./logs"` | Log directory |
| `FASTAPI_CONFIG_PATH` | ❌ No | `""` | Path to FastAPI config JSON file |
| `SAFE_CHECK_HANDLER_PATH` | ❌ No | `""` | Path to custom safe check handler |

Model Configuration

In app_deploy.py, you can configure the language model to use:

model=DashScopeChatModel(
    "qwen3-max-2026-01-23",  # Model name
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    stream=True,  # Enable streaming response
    enable_thinking=True,  # Enable thinking mode
)

The formatter uses the MAX_TOKENS environment variable (default: 200000) to limit the context window size. Because CharTokenCounter counts characters rather than tokens, and roughly 3 characters correspond to 1 token for mixed Chinese and English text, the value is multiplied by 3 before it is passed to DashScopeChatFormatter.
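
As a worked example of this arithmetic, the default of 200000 tokens corresponds to a budget of 200000 × 3 = 600000 characters. The sketch below only reproduces that calculation; `formatter_char_budget` and `CHARS_PER_TOKEN` are hypothetical names, and how app_deploy.py actually passes the value to the formatter is not shown here.

```python
import os
from typing import Mapping

CHARS_PER_TOKEN = 3  # rough ratio stated above for mixed Chinese/English text


def formatter_char_budget(env: Mapping[str, str] = os.environ) -> int:
    """Convert MAX_TOKENS (default 200000) into a character budget."""
    max_tokens = int(env.get("MAX_TOKENS", "200000"))
    return max_tokens * CHARS_PER_TOKEN
```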

Session Storage Configuration

JSON Mode (Default):

- Session history is stored as JSON files in the SESSION_STORE_DIR directory
- Automatic TTL-based cleanup runs every SESSION_CLEANUP_INTERVAL seconds
- Sessions expire after SESSION_TTL_SECONDS seconds of inactivity
- No external dependencies required
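
The TTL-based cleanup in JSON mode can be pictured as a periodic sweep over the session directory. The sketch below is an illustration, not the project's implementation: it assumes one `.json` file per session and uses the file's modification time as a stand-in for last activity, and `remove_expired_sessions` is a hypothetical name.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_expired_sessions(
    store_dir: str, ttl_seconds: int, now: Optional[float] = None
) -> List[str]:
    """Delete *.json files not modified within the last ttl_seconds."""
    current = time.time() if now is None else now
    removed: List[str] = []
    directory = Path(store_dir)
    if not directory.is_dir():
        return removed  # nothing to clean up yet
    for path in directory.glob("*.json"):
        if current - path.stat().st_mtime > ttl_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Under these assumptions, calling `remove_expired_sessions("./sessions", 21600)` every 1800 seconds would mirror the default SESSION_TTL_SECONDS and SESSION_CLEANUP_INTERVAL values.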

Redis Mode:

- Session history is stored in Redis
- Session state is automatically managed by RedisMemory
- TTL is handled by Redis server configuration (not at the application level)
- Requires a running Redis server

FAQ RAG Configuration

The FAQ RAG system uses the following configuration:

- Vector Database: Qdrant (running in a Docker container)
- Embedding Model: DashScope text-embedding-v4
- Vector Dimension: 1024
- Data Source: qa-copilot/rag_utils/faq.txt
- Storage Location: qa-copilot/rag_utils/qdrant_storage
- Qdrant Host: Configurable via QDRANT_HOST (default: 127.0.0.1)
- Qdrant Port: Configurable via QDRANT_PORT (default: 6333)

The system automatically checks if RAG data is initialized on startup. If not initialized, it will automatically read the FAQ file and create vector indexes.

Troubleshooting

Common Issues

Docker/Qdrant Issues
- Ensure the Docker daemon is running: docker info (docker --version only confirms that the CLI is installed)
- Check Qdrant container status: docker ps | grep qdrant
- Manually start Qdrant container: docker start qdrant
- Check if Qdrant port is occupied: netstat -tlnp | grep 6333
- To reinitialize RAG data, delete the qa-copilot/rag_utils/qdrant_storage directory and restart the service
Redis connection failure (when using SESSION_STORE_TYPE=redis)
- Ensure Redis service is running: redis-cli ping
- Check if Redis port is occupied: netstat -tlnp | grep 6379 (or your configured REDIS_PORT)
- Verify Redis configuration: Check REDIS_HOST, REDIS_PORT, REDIS_DB, and REDIS_PASSWORD environment variables
- Note: Redis TTL is managed by Redis server, not by the application
MCP service startup failure
- Ensure GITHUB_TOKEN is set and correct (required environment variable)
- Verify GitHub token has necessary permissions for MCP integration
API Key error
- Verify DASHSCOPE_API_KEY environment variable is correctly set
- Confirm API Key is valid and has sufficient quota
FAQ retrieval returns no results
- Confirm FAQ file qa-copilot/rag_utils/faq.txt exists and is properly formatted
- Check if Qdrant container is running normally
- Review logs to confirm RAG data was successfully initialized
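
Several of the checks above come down to whether a service accepts connections on its port. Where netstat is unavailable, a small Python probe can answer that. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, and unlike netstat it tests reachability from the client side rather than listing listeners.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the defaults in this README, `port_is_open("127.0.0.1", 6333)` probes Qdrant and `port_is_open("localhost", 6379)` probes Redis.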

Acknowledgments

Parts of this project's code are adapted from the following open-source projects:

FAQ RAG System & GitHub MCP Integration: Adapted from the implementation in AgentScope Samples - Alias

Special thanks to the AgentScope team for their excellent framework and sample code!

License

This project uses the same license as the main project. For details, please refer to the LICENSE file.
