Q&A Copilot is the intelligent question-answering component of the Data-Juicer Agents system, a professional Data-Juicer AI assistant built on the AgentScope framework.
You can chat with our Q&A Copilot Juicer on the official documentation site of Data-Juicer! Feel free to ask Juicer anything related to Data-Juicer ecosystem.
- Agent: Intelligent Q&A agent based on ReActAgent
- FAQ RAG System: Fast and accurate FAQ retrieval powered by Qdrant vector database and DashScope text embedding model
- MCP Integration: Online GitHub search capabilities through GitHub MCP Server
- Redis Storage: Supports session history and feedback data persistence
- Web API: Provides RESTful interfaces for frontend integration
- 3.10 <= Python <= 3.12
- Docker (for running Qdrant vector database)
- Redis server (optional, activated by
SESSION_STORE_TYPE=redis) - DashScope API Key (for large language model calls and text embedding)
-
Install dependencies
cd .. uv pip install .[qa] cd qa-copilot
-
Install Docker (for Qdrant vector database)
# Ubuntu/Debian sudo apt-get install docker.io sudo systemctl start docker # macOS brew install docker
Note: The system will automatically check and start the Qdrant Docker container on startup. If FAQ data is not initialized, the system will automatically read from
qa-copilot/rag_utils/faq.txtand initialize the RAG data. -
Install and start Redis (optional - skip if using the default
SESSION_STORE_TYPE=json)# Ubuntu/Debian sudo apt-get install redis-server redis-server --daemonize yes # macOS brew install redis brew services start redis
Note:
- If you set
SESSION_STORE_TYPE=json(default), session history will be stored as JSON files in theSESSION_STORE_DIRdirectory with automatic TTL-based cleanup. - If you set
SESSION_STORE_TYPE=redis, you need to have Redis server running. Session state is automatically managed by RedisMemory, and TTL is handled by Redis server configuration.
- If you set
-
Set required environment variables
export DASHSCOPE_API_KEY="your_dashscope_api_key" export GITHUB_TOKEN="your_github_token" # Required: for GitHub MCP integration
-
Set optional environment variables
Session Storage Configuration:
# Session store type: "json" (default) or "redis" export SESSION_STORE_TYPE="json" # or "redis" # For JSON mode (default): export SESSION_STORE_DIR="./sessions" # Session file storage directory (default: "./sessions") export SESSION_TTL_SECONDS="21600" # Session TTL in seconds (default: 21600 = 6 hours) export SESSION_CLEANUP_INTERVAL="1800" # Cleanup interval in seconds (default: 1800 = 30 minutes) # For Redis mode: export REDIS_HOST="localhost" # Redis server host (default: "localhost") export REDIS_PORT="6379" # Redis server port (default: 6379) export REDIS_DB="0" # Redis database number (default: 0) export REDIS_PASSWORD="" # Redis password (default: None, optional) export REDIS_MAX_CONNECTIONS="10" # Redis max connections (default: 10) # Note: Redis TTL is handled by Redis server configuration, not by application
Model Configuration:
export MAX_TOKENS="200000" # Maximum tokens for context window (default: 200000) # Note: This value is multiplied by 3 when passed to DashScopeChatFormatter # because CharTokenCounter counts characters, and ~3 chars ≈ 1 token for mixed CHN & ENG text
Qdrant Vector Database:
export QDRANT_HOST="127.0.0.1" # Qdrant server host (default: "127.0.0.1") export QDRANT_PORT="6333" # Qdrant server port (default: 6333)
Service Configuration:
export DJ_COPILOT_SERVICE_HOST="127.0.0.1" # Service host address (default: "127.0.0.1") export DJ_COPILOT_ENABLE_LOGGING="true" # Enable session logging (default: "true") export DJ_COPILOT_LOG_DIR="./logs" # Log directory (default: "./logs")
Advanced Configuration:
export FASTAPI_CONFIG_PATH="" # Path to FastAPI config JSON file (optional) export SAFE_CHECK_HANDLER_PATH="" # Path to custom safe check handler module (optional)
-
Configure FAQ file (optional)
The system uses
qa-copilot/rag_utils/faq.txtas the FAQ data source by default. You can edit this file to customize FAQ content. FAQ file format example:'id': 'FAQ_001', 'question': 'What is Data-Juicer?', 'answer': 'Data-Juicer is a...' 'id': 'FAQ_002', 'question': 'How to install?', 'answer': 'You can install by...' -
Start the service
bash setup_server.sh
On first startup, the system will automatically:
- Check and start the Qdrant Docker container (port 6333)
- Initialize FAQ RAG data (if not already initialized)
- Start the Web API service
After starting the service, the system provides the following API interfaces:
POST /process
Content-Type: application/json
{
"input": [
{
"role": "user",
"content": [{"type": "text", "text": "How to use Data-Juicer for data cleaning?"}]
}
],
"session_id": "your_session_id",
"user_id": "user_id"
}POST /memory
Content-Type: application/json
{
"session_id": "your_session_id",
"user_id": "user_id"
}POST /clear
Content-Type: application/json
{
"session_id": "your_session_id",
"user_id": "user_id"
}POST /feedback
Content-Type: application/json
{
"data": {
"message_id": "message_id_here",
"feedback_type": "like",
"comment": "optional user comment"
},
"session_id": "your_session_id",
"user_id": "user_id"
}Parameters:
message_id: The ID of the message to provide feedback on (required)feedback_type: Type of feedback, either"like"or"dislike"(required)comment: Optional user comment text (optional)
Response example:
{
"status": "ok",
"message": "Feedback recorded successfully"
}you can simply run the following command in your terminal:
npx @agentscope-ai/chat agentscope-runtime-webui --url http://localhost:8080/processRefer to AgentScope Runtime WebUI for more information.
| Variable | Required | Default | Description |
|---|---|---|---|
DASHSCOPE_API_KEY |
✅ Yes | - | DashScope API key for LLM and embedding |
GITHUB_TOKEN |
✅ Yes | - | GitHub token for MCP integration |
SESSION_STORE_TYPE |
❌ No | "json" |
Session storage type: "json" or "redis" |
SESSION_STORE_DIR |
❌ No | "./sessions" |
Session file directory (JSON mode only) |
SESSION_TTL_SECONDS |
❌ No | 21600 |
Session TTL in seconds (JSON mode only, 6 hours) |
SESSION_CLEANUP_INTERVAL |
❌ No | 1800 |
Cleanup interval in seconds (JSON mode only, 30 minutes) |
REDIS_HOST |
❌ No | "localhost" |
Redis server host (Redis mode only) |
REDIS_PORT |
❌ No | 6379 |
Redis server port (Redis mode only) |
REDIS_DB |
❌ No | 0 |
Redis database number (Redis mode only) |
REDIS_PASSWORD |
❌ No | None |
Redis password (Redis mode only, optional) |
REDIS_MAX_CONNECTIONS |
❌ No | 10 |
Redis max connections (Redis mode only) |
QDRANT_HOST |
❌ No | "127.0.0.1" |
Qdrant server host |
QDRANT_PORT |
❌ No | 6333 |
Qdrant server port |
MAX_TOKENS |
❌ No | 200000 |
Maximum tokens for context window (multiplied by 3 for CharTokenCounter) |
DJ_COPILOT_SERVICE_HOST |
❌ No | "127.0.0.1" |
Service host address |
DJ_COPILOT_ENABLE_LOGGING |
❌ No | "true" |
Enable session logging |
DJ_COPILOT_LOG_DIR |
❌ No | "./logs" |
Log directory |
FASTAPI_CONFIG_PATH |
❌ No | "" |
Path to FastAPI config JSON file |
SAFE_CHECK_HANDLER_PATH |
❌ No | "" |
Path to custom safe check handler |
In app_deploy.py, you can configure the language model to use:
model=DashScopeChatModel(
"qwen3-max-2026-01-23", # Model name
api_key=os.getenv("DASHSCOPE_API_KEY"),
stream=True, # Enable streaming response
enable_thinking=True, # Enable thinking mode
)The formatter uses MAX_TOKENS environment variable (default: 200000) to limit the context window size. Since CharTokenCounter counts characters and approximately 3 characters ≈ 1 token for mixed Chinese and English text, the value is multiplied by 3 when passed to DashScopeChatFormatter.
JSON Mode (Default):
- Session history is stored as JSON files in
SESSION_STORE_DIRdirectory - Automatic TTL-based cleanup runs every
SESSION_CLEANUP_INTERVALseconds - Sessions expire after
SESSION_TTL_SECONDSseconds of inactivity - No external dependencies required
Redis Mode:
- Session history is stored in Redis
- Session state is automatically managed by
RedisMemory - TTL is handled by Redis server configuration (not application-level)
- Requires Redis server to be running
The FAQ RAG system uses the following configuration:
- Vector Database: Qdrant (running in Docker container)
- Embedding Model: DashScope text-embedding-v4
- Vector Dimension: 1024
- Data Source:
qa-copilot/rag_utils/faq.txt - Storage Location:
qa-copilot/rag_utils/qdrant_storage - Qdrant Host: Configurable via
QDRANT_HOST(default:127.0.0.1) - Qdrant Port: Configurable via
QDRANT_PORT(default:6333)
The system automatically checks if RAG data is initialized on startup. If not initialized, it will automatically read the FAQ file and create vector indexes.
-
Docker/Qdrant Issues
- Ensure Docker service is running:
docker --version - Check Qdrant container status:
docker ps | grep qdrant - Manually start Qdrant container:
docker start qdrant - Check if Qdrant port is occupied:
netstat -tlnp | grep 6333 - To reinitialize RAG data, delete the
qa-copilot/rag_utils/qdrant_storagedirectory and restart the service
- Ensure Docker service is running:
-
Redis connection failure (when using
SESSION_STORE_TYPE=redis)- Ensure Redis service is running:
redis-cli ping - Check if Redis port is occupied:
netstat -tlnp | grep 6379(or your configuredREDIS_PORT) - Verify Redis configuration: Check
REDIS_HOST,REDIS_PORT,REDIS_DB, andREDIS_PASSWORDenvironment variables - Note: Redis TTL is managed by Redis server, not by the application
- Ensure Redis service is running:
-
MCP service startup failure
- Ensure
GITHUB_TOKENis set and correct (required environment variable) - Verify GitHub token has necessary permissions for MCP integration
- Ensure
-
API Key error
- Verify
DASHSCOPE_API_KEYenvironment variable is correctly set - Confirm API Key is valid and has sufficient quota
- Verify
-
FAQ retrieval returns no results
- Confirm FAQ file
qa-copilot/rag_utils/faq.txtexists and is properly formatted - Check if Qdrant container is running normally
- Review logs to confirm RAG data was successfully initialized
- Confirm FAQ file
Parts of this project's code are adapted from the following open-source projects:
- FAQ RAG System & GitHub MCP Integration: Adapted from the implementation in AgentScope Samples - Alias
Special thanks to the AgentScope team for their excellent framework and sample code!
This project uses the same license as the main project. For details, please refer to the LICENSE file.
