Deduplicate generated schema artifacts using pydantic:module annotation #167

@jmchilton

Context

The schema_dedup branch moved shared types (creators, ToolShedRepository, StepPosition, HasUUID, etc.) into schema/common/common.yml — but the codegen still expands them inline into every generated module. This means gxformat2.py and native.py each contain their own copy of every common type, and _conversion.py has duplicate conversion functions to marshal between structurally-identical-but-different classes.

Proposal

Add a pydantic:module annotation to schema-salad-plus-pydantic so types can declare which Python module they belong to. When generating a module, types annotated with a different pydantic:module emit an import instead of a class definition.
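The import-versus-define decision could look roughly like the sketch below. It is not the actual schema-salad-plus-pydantic implementation: `render_module`, the `TYPES` list, and the `WorkflowStep` record are hypothetical names used only for illustration, and the rule that unannotated types belong to the module being generated is an assumption.

```python
from collections import defaultdict

# Hypothetical, already-parsed schema records. Only StepPosition and its
# pydantic:module value come from the proposal; WorkflowStep is illustrative.
TYPES = [
    {"name": "StepPosition", "pydantic:module": "gxformat2.schema.common"},
    {"name": "WorkflowStep"},  # no annotation: assumed to belong to the current module
]


def render_module(types, module_name):
    """Return Python source for module_name.

    Types owned by another module become imports; types owned by this
    module become class definitions (field bodies are omitted here).
    """
    imports = defaultdict(list)
    local_classes = []
    for record in types:
        # Assumption: a missing annotation means "defined in the current module".
        owner = record.get("pydantic:module", module_name)
        if owner != module_name:
            imports[owner].append(record["name"])
        else:
            local_classes.append(record["name"])

    lines = ["from pydantic import BaseModel"]
    for owner in sorted(imports):
        names = ", ".join(sorted(imports[owner]))
        lines.append(f"from {owner} import {names}")
    for name in local_classes:
        lines.append("")
        lines.append(f"class {name}(BaseModel):")
        lines.append("    ...  # fields omitted in this sketch")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_module(TYPES, "gxformat2.schema.gxformat2"))
```

Generating for `gxformat2.schema.gxformat2` yields an import line for `StepPosition`, while generating for `gxformat2.schema.common` yields its class definition, which is the behavior the annotation is meant to enable.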

Schema side

```yaml
# common.yml
- name: StepPosition
  type: record
  pydantic:module: "gxformat2.schema.common"
  fields: ...
```

CLI side

New --module-name arg so the generator knows which module it's building:

```shell
# Generate common module
schema-salad-plus-pydantic generate schema/common/common.yml \
    --module-name gxformat2.schema.common -o gxformat2/schema/common.py

# Format-specific modules import common types instead of inlining
schema-salad-plus-pydantic generate schema/v19_09/workflow.yml \
    --module-name gxformat2.schema.gxformat2 -o gxformat2/schema/gxformat2.py
```

Application side

  • Import common types from gxformat2.schema.common instead of aliased duplicates from both modules
  • Collapse duplicate conversion functions (_convert_creators / _convert_creators_to_native, _convert_tool_shed_repo_to_format2 / _convert_tool_shed_repo_to_native)
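To illustrate why the duplicate converters exist and why a shared class removes them, here is a small sketch. It uses stdlib dataclasses as stand-ins for the generated pydantic models so it runs without dependencies. The `left` and `top` fields and every function name are placeholders, not taken from the schema or from `_conversion.py`.

```python
from dataclasses import asdict, dataclass


# Before: each generated module carries its own structurally identical copy.
@dataclass
class Format2StepPosition:  # stand-in for the copy in gxformat2.py
    left: float  # placeholder field, not from the schema
    top: float   # placeholder field, not from the schema


@dataclass
class NativeStepPosition:  # stand-in for the copy in native.py
    left: float
    top: float


def position_to_native(pos: Format2StepPosition) -> NativeStepPosition:
    # Field-by-field copy is needed only because the classes are distinct.
    return NativeStepPosition(**asdict(pos))


def position_to_format2(pos: NativeStepPosition) -> Format2StepPosition:
    return Format2StepPosition(**asdict(pos))


# After: one shared class (stand-in for gxformat2.schema.common.StepPosition).
@dataclass
class StepPosition:
    left: float
    top: float


def convert_position(pos: StepPosition) -> StepPosition:
    # Both directions use the same type, so one function is enough.
    return StepPosition(**asdict(pos))


if __name__ == "__main__":
    old = Format2StepPosition(left=10.0, top=20.0)
    print(position_to_native(old))
    print(convert_position(StepPosition(left=10.0, top=20.0)))
```

With the shared class, both directions go through the same function and no marshalling between look-alike types is needed.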

Open questions

  • Abstract types in common module — emit as abstract BaseModel or skip?
  • Strict variants — separate common_strict.py or share one common module?
  • load_document() / _load_single() require document roots. Should the common module skip them (library mode)?
  • pydantic: namespace declaration: should common.yml declare it as well? It is currently declared only in the importing workflow.yml files.
  • Traditional codegen (v19_09.py, native_v0_1.py) — leave as-is or drop? Nothing imports them.
