Skip to content

Conversation

@linhongliu-db
Copy link
Contributor

What changes were proposed in this pull request?

This PR adds the complete serialization/deserialization infrastructure for parsing metric view YAML definitions:

  • Add Jackson YAML dependencies to pom.xml
  • Implement canonical model for metric views:
    • Column, Expression (Dimension/Measure), MetricView, Source
    • YAMLVersion validation and exception types
  • Implement version-specific serde (v0.1):
    • YAML deserializer/serializer
    • Base classes for extensibility
  • Add JSON utilities for metadata serialization

Why are the changes needed?

SPIP: Metrics & semantic modeling in Spark

Does this PR introduce any user-facing change?

No

How was this patch tested?

build/sbt "catalyst/testOnly org.apache.spark.sql.metricview.serde.MetricViewFactorySuite"

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code
Co-Authored-By: Claude [email protected]

… for metric views

This commit adds the complete serialization/deserialization infrastructure
for parsing metric view YAML definitions:

- Add Jackson YAML dependencies to pom.xml
- Implement canonical model for metric views:
  - Column, Expression (Dimension/Measure), MetricView, Source
  - YAMLVersion validation and exception types
- Implement version-specific serde (v0.1):
  - YAML deserializer/serializer
  - Base classes for extensibility
- Add JSON utilities for metadata serialization
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-yaml</artifactId>
<version>${fasterxml.jackson.version}</version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all jackson deps are managed by jackson-bom now, you don't need to declare it again here


object JsonUtils {
// Singleton ObjectMapper that can be used across the project
private lazy val mapper: ObjectMapper = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the comments, should we try to use this singleton as much as possible in the subsequent development?

// Only parse the "version" field and ignore all others
@JsonIgnoreProperties(ignoreUnknown = true)
case class YAMLVersion(version: String) extends Validatable {
def validYAMLVersions: Set[String] = Set("0.1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def validYAMLVersions: Set[String] = Set("0.1")
private def validYAMLVersions: Set[String] = Set("0.1")


val mapper = new ObjectMapper(yamlFactory)
// Exclude null values from serialized output
.setSerializationInclusion(JsonInclude.Include.NON_NULL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.setSerializationInclusion(JsonInclude.Include.NON_NULL)
. setDefaultPropertyInclusion(JsonInclude.Include.NON_NULL)

// Exclude null values from serialized output
.setSerializationInclusion(JsonInclude.Include.NON_NULL)
// Exclude empty collections/strings from serialized output
.setSerializationInclusion(JsonInclude.Include.NON_EMPTY)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.setSerializationInclusion(JsonInclude.Include.NON_EMPTY)
. setDefaultPropertyInclusion(JsonInclude.Include.NON_EMPTY)

) extends ColumnBase

object ColumnV01 {
def fromCanonical(canonical: Column[_ <: Expression], ordinal: Int): ColumnV01 = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ordinal mandatory?

Try(CatalystSqlParser.parseQuery(sourceText)) match {
case Success(_) => SQLSource(sourceText)
case Failure(queryEx) =>
throw queryEx
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should queryEx be wrapped into MetricViewValidationException?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants