Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENABLE_TEMPLATE_PROCESSING] BaseTemplateProcessor interface (process_template) is misused #28218

Open
3 tasks done
tseruga opened this issue Apr 25, 2024 · 1 comment
Open
3 tasks done
Assignees

Comments

@tseruga
Copy link

tseruga commented Apr 25, 2024

Bug description

Unsure of when this regression happened, but it likely happened part of #26476 or #27470

Within the BaseTemplateProcessor base class, we have the following method interface for process_template():

def process_template(self, sql: str, **kwargs: Any) -> str:
"""Processes a sql template
>>> sql = "SELECT '{{ datetime(2017, 1, 1).isoformat() }}'"
>>> process_template(sql)
"SELECT '2017-01-01T00:00:00'"
"""
template = self.env.from_string(sql)
kwargs.update(self._context)
context = validate_template_context(self.engine, kwargs)
return template.render(context)

Specifically, the type of sql should be a str.

It appears that this contract is being broken after one or more of the above changes. Instead, this method (when implemented and assigned to an engine) will sometimes be passed a jinja2.nodes.Template type object.

One example of this is here:

superset/superset/sql_parse.py

Lines 1524 to 1579 in 52f8734

def extract_tables_from_jinja_sql(sql: str, database: Database) -> set[Table]:
"""
Extract all table references in the Jinjafied SQL statement.
Due to Jinja templating, a multiphase approach is necessary as the Jinjafied SQL
statement may represent invalid SQL which is non-parsable by SQLGlot.
Firstly, we extract any tables referenced within the confines of specific Jinja
macros. Secondly, we replace these non-SQL Jinja calls with a pseudo-benign SQL
expression to help ensure that the resulting SQL statements are parsable by
SQLGlot.
:param sql: The Jinjafied SQL statement
:param database: The database associated with the SQL statement
:returns: The set of tables referenced in the SQL statement
:raises SupersetSecurityException: If SQLGlot is unable to parse the SQL statement
:raises jinja2.exceptions.TemplateError: If the Jinjafied SQL could not be rendered
"""
from superset.jinja_context import ( # pylint: disable=import-outside-toplevel
get_template_processor,
)
processor = get_template_processor(database)
template = processor.env.parse(sql)
tables = set()
for node in template.find_all(nodes.Call):
if isinstance(node.node, nodes.Getattr) and node.node.attr in (
"latest_partition",
"latest_sub_partition",
):
# Extract the table referenced in the macro.
tables.add(
Table(
*[
remove_quotes(part.strip())
for part in node.args[0].as_const().split(".")[::-1]
if len(node.args) == 1
]
)
)
# Replace the potentially problematic Jinja macro with some benign SQL.
node.__class__ = nodes.TemplateData
node.fields = nodes.TemplateData.fields
node.data = "NULL"
return (
tables
| ParsedQuery(
sql_statement=processor.process_template(template),
engine=database.db_engine_spec.engine,
).tables
)

We see that we grab the template processor, then convert the raw str to a Template object:

    processor = get_template_processor(database)
    template = processor.env.parse(sql)

This Template object is then passed to the process_template() method:


    return (
        tables
        | ParsedQuery(
            sql_statement=processor.process_template(template),
            engine=database.db_engine_spec.engine,
        ).tables
    )

Which breaks the contract set by the BaseTemplateProcessor interface.

The example implementation of a custom template processor also makes the assumption that the argument will always be a string as shown here:

https://github.com/apache/superset/blob/master/docs/docs/configuration/sql-templating.mdx?plain=1#L123-L149

I believe this example implementation will also break upon receiving a Template object instead of a string.

How to reproduce the bug

  1. Enable ENABLE_TEMPLATE_PROCESSING in config.py
  2. Implement a custom template processor extending the BaseTemplateProcessor
  3. Within process_template() perform a string operation
    Ex:
class ExampleTemplateProcessor(BaseTemplateProcessor):
    def process_template(self, sql: str, **kwargs: Any) -> str:
        return sql.upper()
  1. Wire up the new template processor an engine that you can query (I used sqllite since that's what the example data uses)
DEFAULT_PROCESSORS = {
    "presto": PrestoTemplateProcessor,
    "hive": HiveTemplateProcessor,
    "trino": TrinoTemplateProcessor,
    "sqlite": ExampleTemplateProcessor,
}

I added this to the list of DEFAULT_PROCESSORS for testing, but this also happens when using CUSTOM_TEMPLATE_PROCESSORS via the config.
5. Attempt to query in sql labs - this should work as expected:
6. Go to Sql Labs -> Query History
7. Observe that the template processor now fails since it is passed a Template object

2024-04-25 11:22:34,467:ERROR:superset.views.base:'Template' object has no attribute 'upper'
Traceback (most recent call last):
...
  File "...superset/sql_parse.py", line 1576, in extract_tables_from_jinja_sql
    sql_statement=processor.process_template(template),
  File "...superset/jinja_context.py", line 665, in process_template
    return sql.upper()
AttributeError: 'Template' object has no attribute 'upper'

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.10

Node version

16

Browser

Firefox

Additional context

Feature flag: ENABLE_TEMPLATE_PROCESSING

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
@tseruga
Copy link
Author

tseruga commented Apr 26, 2024

Another finding that might help -
if ENABLE_TEMPLATE_PROCESSING is turned off, another breakage in the contract shows up here:

https://github.com/apache/superset/blob/master/superset/models/sql_lab.py#L72-L83

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants