Schema Refresh Fails for Certain Datasource Types #7571

@snickerjp

Problem Description

The scheduled schema refresh raises NotSupported for datasource types that have no schema (results, python, etc.). Each failure is written to the worker log and counted in the error metrics.

Steps to Reproduce

  1. Create a datasource of type Query Results or Python
  2. Wait for the scheduler to execute schema refresh (default 30-minute interval)
  3. Check worker logs

How to check logs:

# For Docker Compose
docker compose logs -f worker

# Or manually execute schema refresh
docker compose exec worker python -c "
from redash import create_app
from redash.tasks.queries.maintenance import refresh_schemas
app = create_app()
with app.app_context():
    refresh_schemas()
"

Actual Behavior (Error Logs)

[WARNING] Failed refreshing schema for the data source: Query Results
Traceback (most recent call last):
  File "/app/redash/tasks/queries/maintenance.py", line 166, in refresh_schema
    ds.get_schema(refresh=True)
  File "/app/redash/models/__init__.py", line 217, in get_schema
    schema = query_runner.get_schema(get_stats=refresh)
  File "/app/redash/query_runner/__init__.py", line 232, in get_schema
    raise NotSupported()
redash.query_runner.NotSupported
[INFO] task=refresh_schema state=failed ds_id=1 runtime=0.00

[WARNING] Failed refreshing schema for the data source: python
Traceback (most recent call last):
  ...
redash.query_runner.NotSupported
[INFO] task=refresh_schema state=failed ds_id=2 runtime=0.00

Expected Behavior

These datasource types have no concept of a schema, so the schema refresh process should skip them. No error logs or metrics should be recorded for them.

Impact

  • Large volumes of unnecessary error logs are recorded
  • refresh_schema.error metrics become inaccurate
  • Logs become harder to read
  • Endpoints may be accessed needlessly

Environment

  • Redash Version: 25.11.0-dev (master branch)
  • Affected datasource types:
    • results (Query Results)
    • python (Python)
    • Other types that don't implement get_schema method
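
To illustrate the last bullet, here is a minimal sketch of how one might detect runner classes that do not implement get_schema. It is self-contained and does not import Redash: BaseRunner, PostgresLikeRunner, ResultsLikeRunner and overrides_get_schema are hypothetical stand-ins, and the only behavior taken from the traceback above is that the base get_schema raises NotSupported.

```python
class NotSupported(Exception):
    """Stand-in for redash.query_runner.NotSupported."""


class BaseRunner:
    # Hypothetical base class: like the traceback above, the default
    # get_schema simply raises NotSupported.
    def get_schema(self, get_stats=False):
        raise NotSupported()


class PostgresLikeRunner(BaseRunner):
    # Hypothetical runner that does implement get_schema.
    def get_schema(self, get_stats=False):
        return [{"name": "users", "columns": ["id", "email"]}]


class ResultsLikeRunner(BaseRunner):
    # Hypothetical runner that inherits the default (unsupported) get_schema.
    pass


def overrides_get_schema(runner_cls, base_cls=BaseRunner):
    """Return True if runner_cls defines its own get_schema."""
    return runner_cls.get_schema is not base_cls.get_schema


if __name__ == "__main__":
    for cls in (PostgresLikeRunner, ResultsLikeRunner):
        print(cls.__name__, overrides_get_schema(cls))
```

A check like this would only catch runners that inherit the default method; a runner that overrides get_schema and still raises NotSupported would need to be listed explicitly.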

Proposed Solution

Allow datasource types to be excluded from schema refresh via an environment variable:

REDASH_SCHEMAS_REFRESH_EXCLUDED_TYPES=results,python

If results and python are excluded by default, the issue is resolved in existing environments without any configuration change.
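
A rough sketch of how the proposed variable could be parsed and applied is shown below. It is not Redash code: parse_excluded_types, FakeDataSource and sources_to_refresh are hypothetical names used only for illustration, and the default of "results,python" follows the proposal above.

```python
import os
from dataclasses import dataclass


def parse_excluded_types(raw):
    """Split a comma-separated value into a set of type names, ignoring blanks."""
    return {item.strip() for item in raw.split(",") if item.strip()}


@dataclass
class FakeDataSource:
    # Hypothetical stand-in for a data source; only the fields used here.
    id: int
    type: str


def sources_to_refresh(data_sources, excluded_types):
    """Keep only the data sources whose type is not excluded."""
    return [ds for ds in data_sources if ds.type not in excluded_types]


if __name__ == "__main__":
    # Assumption: the variable name and default come from the proposal above.
    excluded = parse_excluded_types(
        os.environ.get("REDASH_SCHEMAS_REFRESH_EXCLUDED_TYPES", "results,python")
    )
    sources = [
        FakeDataSource(1, "results"),
        FakeDataSource(2, "python"),
        FakeDataSource(3, "pg"),
    ]
    for ds in sources_to_refresh(sources, excluded):
        print(f"would refresh schema for ds_id={ds.id} type={ds.type}")
```

Filtering before the refresh task is enqueued means excluded types never reach get_schema, so no failure is logged or counted for them.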
