Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cubesql) Implement format / col_description #8947

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
171 changes: 160 additions & 11 deletions rust/cubesql/cubesql/src/compile/engine/udf/common.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3087,10 +3087,17 @@
Some(as_str) => {
match PgType::get_all().iter().find(|e| e.typname == as_str) {
None => {
return Err(DataFusionError::Execution(format!(
"Unable to cast expression to Regclass: Unknown type: {}",
as_str
)))
// If the type name contains a dot, it's a schema-qualified name
// and we should return return the approprate RegClass to be converted to OID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: return return

// For now, we'll return 0 so metabase can sync without failing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add TODO here along the lines of "implement something"

if as_str.contains('.') {
builder.append_value(0)?;
} else {
return Err(DataFusionError::Execution(format!(
"Unable to cast expression to Regclass: Unknown type: {}",
as_str
)));

Check warning on line 3099 in rust/cubesql/cubesql/src/compile/engine/udf/common.rs

View check run for this annotation

Codecov / codecov/patch

rust/cubesql/cubesql/src/compile/engine/udf/common.rs#L3096-L3099

Added lines #L3096 - L3099 were not covered by tests
}
}
Some(ty) => {
builder.append_value(ty.oid as i64)?;
Expand Down Expand Up @@ -3148,6 +3155,155 @@
)
}

// Return a NOOP for this so metabase can sync without failing
// TODO: Implement this
pub fn create_col_description_udf() -> ScalarUDF {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please attach links to respective parts of PostgreSQL documentation?

Like https://www.postgresql.org/docs/17/functions-info.html#FUNCTIONS-INFO-COMMENT here, and https://www.postgresql.org/docs/17/functions-string.html#FUNCTIONS-STRING-FORMAT for format, maybe other important references.

let fun = make_scalar_function(move |args: &[ArrayRef]| {
// Ensure the output array has the same length as the input
let input_length = args[0].len();
let mut builder = StringBuilder::new(input_length);

for _ in 0..input_length {
builder.append_null()?;
}

Ok(Arc::new(builder.finish()) as ArrayRef)
});

let return_type: ReturnTypeFunction = Arc::new(move |_| Ok(Arc::new(DataType::Utf8)));

ScalarUDF::new(
"col_description",
&Signature::one_of(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this Signature::one_of? It matches single type signature, why not use Signature::exact directly?

vec![TypeSignature::Exact(vec![DataType::Utf8, DataType::Utf8])],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

col_description is documented as ( table oid, column integer ) → text,
so I expect that argument types here would be ints, not strings.

You can check against pg_description system table provider for this
https://github.com/cube-js/cube/blob/master/rust/cubesql/cubesql/src/compile/engine/information_schema/postgres/pg_description.rs

First argument would be used to filter objoid, and second - objsubid

Volatility::Immutable,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

col_description is not immutable, as it can change values for same inputs in a different queries: column comment can change between queries.
But it cannot change values for a single query (I think, assuming PostgreSQL isolation), so it should be Volatility::Stable.

),
&return_type,
&fun,
)
}

pub fn create_format_udf() -> ScalarUDF {
let fun = make_scalar_function(move |args: &[ArrayRef]| {
// Ensure at least one argument is provided
if args.is_empty() {
return Err(DataFusionError::Execution(
"format() requires at least one argument".to_string(),
));

Check warning on line 3192 in rust/cubesql/cubesql/src/compile/engine/udf/common.rs

View check run for this annotation

Codecov / codecov/patch

rust/cubesql/cubesql/src/compile/engine/udf/common.rs#L3190-L3192

Added lines #L3190 - L3192 were not covered by tests
}

// Ensure the first argument is a Utf8 (string)
if args[0].data_type() != &DataType::Utf8 {
return Err(DataFusionError::Execution(
"format() first argument must be a string".to_string(),
));

Check warning on line 3199 in rust/cubesql/cubesql/src/compile/engine/udf/common.rs

View check run for this annotation

Codecov / codecov/patch

rust/cubesql/cubesql/src/compile/engine/udf/common.rs#L3197-L3199

Added lines #L3197 - L3199 were not covered by tests
}

let format_strings = downcast_string_arg!(&args[0], "format_str", i32);
let mut builder = StringBuilder::new(format_strings.len());

for i in 0..format_strings.len() {
if format_strings.is_null(i) {
builder.append_null()?;
continue;

Check warning on line 3208 in rust/cubesql/cubesql/src/compile/engine/udf/common.rs

View check run for this annotation

Codecov / codecov/patch

rust/cubesql/cubesql/src/compile/engine/udf/common.rs#L3207-L3208

Added lines #L3207 - L3208 were not covered by tests
}

let format_str = format_strings.value(i);
let mut result = String::new();
let mut format_chars = format_str.chars().peekable();
let mut arg_index = 1; // Start from first argument after format string

while let Some(c) = format_chars.next() {
if c != '%' {
result.push(c);
continue;
}

match format_chars.next() {
Some('I') => {
// Handle %I - SQL identifier
if arg_index >= args.len() {
return Err(DataFusionError::Execution(
"Not enough arguments for format string".to_string(),
));

Check warning on line 3228 in rust/cubesql/cubesql/src/compile/engine/udf/common.rs

View check run for this annotation

Codecov / codecov/patch

rust/cubesql/cubesql/src/compile/engine/udf/common.rs#L3226-L3228

Added lines #L3226 - L3228 were not covered by tests
}

let arg = &args[arg_index];
let value = match arg.data_type() {
DataType::Utf8 => {
let str_arr = downcast_string_arg!(arg, "arg", i32);
if str_arr.is_null(i) {
return Err(DataFusionError::Execution(
"NULL values cannot be formatted as identifiers"
.to_string(),
));

Check warning on line 3239 in rust/cubesql/cubesql/src/compile/engine/udf/common.rs

View check run for this annotation

Codecov / codecov/patch

rust/cubesql/cubesql/src/compile/engine/udf/common.rs#L3236-L3239

Added lines #L3236 - L3239 were not covered by tests
}
str_arr.value(i).to_string()
}
_ => {
// For other types, try to convert to string
let str_arr = cast(&arg, &DataType::Utf8)?;
let str_arr =
str_arr.as_any().downcast_ref::<StringArray>().unwrap();
if str_arr.is_null(i) {
return Err(DataFusionError::Execution(
"NULL values cannot be formatted as identifiers"
.to_string(),
));
}
str_arr.value(i).to_string()

Check warning on line 3254 in rust/cubesql/cubesql/src/compile/engine/udf/common.rs

View check run for this annotation

Codecov / codecov/patch

rust/cubesql/cubesql/src/compile/engine/udf/common.rs#L3245-L3254

Added lines #L3245 - L3254 were not covered by tests
}
};

// Quote identifier if necessary
let needs_quoting = !value
.chars()
.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit wrong, because identifiers 1 and 1a need quoting, but a1 - does not.

|| value.is_empty();

if needs_quoting {
result.push('"');
result.push_str(&value.replace('"', "\"\""));
result.push('"');
} else {
result.push_str(&value);
}
arg_index += 1;
}
Some('%') => {
// %% is escaped to single %
result.push('%');
}

Check warning on line 3276 in rust/cubesql/cubesql/src/compile/engine/udf/common.rs

View check run for this annotation

Codecov / codecov/patch

rust/cubesql/cubesql/src/compile/engine/udf/common.rs#L3273-L3276

Added lines #L3273 - L3276 were not covered by tests
Some(c) => {
return Err(DataFusionError::Execution(format!(
"Unsupported format specifier %{}",
c
)));
}
None => {
return Err(DataFusionError::Execution(
"Invalid format string - ends with %".to_string(),
));
}
}
}

builder.append_value(result)?;
}

Ok(Arc::new(builder.finish()) as ArrayRef)
});

let return_type: ReturnTypeFunction = Arc::new(move |_| Ok(Arc::new(DataType::Utf8)));

ScalarUDF::new(
"format",
&Signature::variadic(vec![DataType::Utf8], Volatility::Immutable),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format is documented like this
format(formatstr text [, formatarg "any" [, ...] ])

Signature::variadic([Utf8], ...) will accept any number of arguments, but each argument must be one of [Utf8]

Unfortunately, there's no proper way to represent it in TypeSignature right now. But current variant would just cast everything to Utf8 before calling format, which is ok for now, I think. More recent DataFusion versions have extended TypeSignature to support this case, so we can deal with this later.

Please comment this part: about casting and TODO about incorrect signature

&return_type,
&fun,
)
}

pub fn create_json_build_object_udf() -> ScalarUDF {
let fun = make_scalar_function(move |_args: &[ArrayRef]| {
// TODO: Implement
Expand Down Expand Up @@ -3769,13 +3925,6 @@
rettyp = TimestampTz,
vol = Volatile
);
register_fun_stub!(
udf,
"col_description",
tsig = [Oid, Int32],
rettyp = Utf8,
vol = Stable
);
register_fun_stub!(udf, "convert", tsig = [Binary, Utf8, Utf8], rettyp = Binary);
register_fun_stub!(udf, "convert_from", tsig = [Binary, Utf8], rettyp = Utf8);
register_fun_stub!(udf, "convert_to", tsig = [Utf8, Utf8], rettyp = Binary);
Expand Down
54 changes: 54 additions & 0 deletions rust/cubesql/cubesql/src/compile/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -16333,4 +16333,58 @@ LIMIT {{ limit }}{% endif %}"#.to_string(),
.sql;
assert!(sql.contains(" IS NULL DESC, "));
}

#[tokio::test]
async fn test_format_function() -> Result<(), CubeError> {
// Test: Basic usage with a single identifier
let result = execute_query(
"SELECT format('%I', 'column_name') AS formatted_identifier".to_string(),
DatabaseProtocol::PostgreSQL,
)
.await?;
insta::assert_snapshot!("formatted_identifier", result);

// Test: Using multiple identifiers
let result = execute_query(
"SELECT format('%I, %I', 'table_name', 'column_name') AS formatted_identifiers"
.to_string(),
DatabaseProtocol::PostgreSQL,
)
.await?;
insta::assert_snapshot!("formatted_identifiers", result);

// Test: Unsupported format specifier
let result = execute_query(
"SELECT format('%X', 'value') AS unsupported_specifier".to_string(),
DatabaseProtocol::PostgreSQL,
)
.await;
assert!(result.is_err());

// Test: Format string ending with %
let result = execute_query(
"SELECT format('%', 'value') AS invalid_format".to_string(),
DatabaseProtocol::PostgreSQL,
)
.await;
assert!(result.is_err());

// Test: Quoting necessary for special characters
let result = execute_query(
"SELECT format('%I', 'column-name') AS quoted_identifier".to_string(),
DatabaseProtocol::PostgreSQL,
)
.await?;
insta::assert_snapshot!("quoted_identifier", result);

// Test: Quoting necessary for reserved keywords
let result = execute_query(
"SELECT format('%I', 'select') AS quoted_keyword".to_string(),
DatabaseProtocol::PostgreSQL,
)
.await?;
insta::assert_snapshot!("quoted_keyword", result);

Ok(())
}
}
2 changes: 2 additions & 0 deletions rust/cubesql/cubesql/src/compile/query_engine.rs
Original file line number Diff line number Diff line change
Expand Up @@ -416,7 +416,9 @@ impl QueryEngine for SqlQueryEngine {
ctx.register_udf(create_current_timestamp_udf("localtimestamp"));
ctx.register_udf(create_current_schema_udf());
ctx.register_udf(create_current_schemas_udf());
ctx.register_udf(create_format_udf());
ctx.register_udf(create_format_type_udf());
ctx.register_udf(create_col_description_udf());
ctx.register_udf(create_pg_datetime_precision_udf());
ctx.register_udf(create_pg_numeric_precision_udf());
ctx.register_udf(create_pg_numeric_scale_udf());
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
---
source: cubesql/src/compile/mod.rs
expression: result
---
+----------------------+
| formatted_identifier |
+----------------------+
| column_name |
+----------------------+
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
---
source: cubesql/src/compile/mod.rs
expression: result
---
+-------------------------+
| formatted_identifiers |
+-------------------------+
| table_name, column_name |
+-------------------------+
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
---
source: cubesql/src/compile/mod.rs
expression: result
---
+-------------------+
| quoted_identifier |
+-------------------+
| "column-name" |
+-------------------+
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
---
source: cubesql/src/compile/mod.rs
expression: result
---
+----------------+
| quoted_keyword |
+----------------+
| select |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostgreSQL actually wraps keywords with quotes, and accepts keywords unwrapped only in assigning column alias position, but does not accept those in subquery alias or in column expression.

Maybe just wrap any %I argument with quotes? I can't imagine right now what could go wrong, in a sense that format argument can already be quoted in result, so any consumer should handle it as well. That would differ from PostgreSQL behavior, of course. But current implementation also differs, and "quote everything" approach have a benefit of something like "any %I substitution would make a proper SQL".

+----------------+
Loading
Loading