Decide on tease field policy: plain text vs markup #3755

@dannon

Context

PR #3752 added auto-generation of tease fields for ~850 older news articles that were missing them. The auto-generator strips markdown/HTML to produce plain text. Bjoern raised a good point: if we don't want markup in teases, we should make that an explicit policy rather than silently stripping it during preprocessing.

Currently there's no consistency — some teases have markdown links or emphasis, some are plain text, some are missing entirely. We should pick a direction and enforce it.
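
For reference, "stripping markdown/HTML" can be sketched roughly as below. This is not the actual generateTease() from #3752. The function name `stripMarkup` and the exact patterns are assumptions for illustration, and underscore-style emphasis is deliberately left alone so identifiers like snake_case_name survive.

```typescript
// Hypothetical helper: reduce inline markdown/HTML to plain text.
// Illustrative only; the real auto-generator may work differently.
function stripMarkup(source: string): string {
  return source
    // Images: keep the alt text, drop the URL.
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")
    // Links: keep the link text, drop the URL.
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")
    // HTML tags: drop the tag, keep the enclosed text.
    .replace(/<\/?[a-zA-Z][^>]*>/g, "")
    // **bold**, *emphasis* and `code`: keep the inner text.
    .replace(/(\*\*|\*|`)(\S(?:.*?\S)?)\1/g, "$2")
    // Collapse runs of whitespace left behind by the removals.
    .replace(/\s+/g, " ")
    .trim();
}

console.log(stripMarkup("Try the **new** [Galaxy](https://galaxyproject.org) <em>release</em>"));
```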

Option A: Teases must be plain text

For:

  • Teases appear in feeds (RSS/XML, JSON, Atom), search indexes, <meta> descriptions, and listing cards — most of these contexts don't render markdown, so markup would show up as raw syntax
  • Plain text is simpler to validate with a linter (just check for <, [, *, etc.)
  • The auto-generated teases from #3752 already follow this pattern and read well
  • Keeps the tease field predictable for any future consumer

Against:

  • Slightly more work for authors — can't just copy-paste a sentence from the body if it has a link or emphasis
  • Loses the ability to link project names or tools in tease text (though teases are typically too short for this to matter)

If we go this route:

  • Add a preprocess-time lint that warns (or errors) on teases containing markdown/HTML
  • Keep the auto-generator as-is for backfill
  • Add a CI check so new PRs with markup in teases get flagged
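
A minimal sketch of what the lint could check, assuming it runs over each tease string at preprocess time. The name `lintTease`, the issue labels, and the patterns are all hypothetical; they follow the "<, [, *" idea above but try to avoid false positives on plain text such as snake_case names or "2 * 3".

```typescript
// Hypothetical lint rules; labels and patterns are illustrative assumptions.
const MARKUP_PATTERNS: Array<{ kind: string; pattern: RegExp }> = [
  // Opening or closing HTML tag, e.g. <a href="..."> or </em>.
  { kind: "html-tag", pattern: /<\/?[a-zA-Z][^>]*>/ },
  // Markdown link or image, e.g. [text](url).
  { kind: "markdown-link", pattern: /\[[^\]]+\]\([^)]*\)/ },
  // *em*, **bold**, _em_, __bold__ that start at a word boundary.
  {
    kind: "markdown-emphasis",
    pattern: /(^|[^\w*_])(\*{1,2}|_{1,2})[^\s*_](?:[^*_\n]*[^\s*_])?\2(?![\w*_])/,
  },
  // Inline code span.
  { kind: "inline-code", pattern: /`[^`]+`/ },
];

// Returns the kinds of markup found in a tease; an empty array means plain text.
function lintTease(tease: string): string[] {
  const found: string[] = [];
  for (const { kind, pattern } of MARKUP_PATTERNS) {
    if (pattern.test(tease)) found.push(kind);
  }
  return found;
}

console.log(lintTease("Read the [release notes](https://example.org) for **details**"));
```

Whether a non-empty result is a warning or a hard error is a separate choice; the CI check could call the same function and fail the build on any finding.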

Option B: Teases can contain markdown

For:

  • Authors can write teases naturally without worrying about stripping formatting
  • Existing content with markdown teases doesn't need to be fixed
  • Some rendering contexts (listing pages, cards) could render inline markdown if we wanted

Against:

  • Feed consumers (RSS readers, JSON feed clients, external tools) would get raw markdown syntax in descriptions
  • Need to handle rendering in every place teases appear — some contexts render HTML, some don't
  • The auto-generator would need to preserve markup, making extraction more complex
  • Harder to lint — you'd need to distinguish "acceptable" inline markdown from problematic block-level elements

If we go this route:

  • Remove the markup stripping from generateTease()
  • Add a markdown-to-HTML render step in feed generators and listing pages, plus a strip-to-text step for contexts that can't render HTML (e.g. <meta> descriptions)
  • Either accept that existing teases mix plain text and markdown, or normalize them all to one form
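
To give a sense of the render step, here is a small sketch that converts only links, bold, and emphasis to HTML and escapes everything else. It is an assumption-laden illustration: `renderTeaseInline` and `escapeHtml` are hypothetical names, only http(s) links are converted, and a real implementation would more likely call a markdown library restricted to inline elements.

```typescript
// Escape characters that are special in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical inline renderer: escape first, then convert a small
// whitelist of markdown constructs. Raw HTML in the tease stays escaped.
function renderTeaseInline(tease: string): string {
  return escapeHtml(tease)
    // [text](http://... or https://...) becomes an anchor.
    .replace(/\[([^\]]+)\]\((https?:\/\/[^)\s]+)\)/g, '<a href="$2">$1</a>')
    // **bold** (must hug its content, so "2 ** 3" is left alone).
    .replace(/\*\*(\S(?:[^*]*\S)?)\*\*/g, "<strong>$1</strong>")
    // *emphasis*, same rule.
    .replace(/\*(\S(?:[^*]*\S)?)\*/g, "<em>$1</em>");
}

console.log(renderTeaseInline("See the **new** [docs](https://example.org/a?b=1&c=2) & more"));
```

Every consumer that emits HTML would need to go through a step like this, which is the maintenance cost listed under "Against".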

What needs deciding

  1. Plain text or markdown in tease fields?
  2. Should missing teases be auto-generated (current behavior for articles >1 year old) or required via linting?
  3. Should we backfill the 3 remaining 2025 articles that are still missing teases?
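
Question 2 amounts to choosing what happens per article when the tease is missing. A sketch of the current behavior as described above (auto-generate only for articles more than a year old), with hypothetical names (`teaseActionFor`, the `date` field, the action labels) and a 365-day year as a simplifying assumption:

```typescript
type TeaseAction = "keep" | "auto-generate" | "require";

// Assumption: "1 year" is treated as 365 days.
const ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000;

// Decide what to do with an article's tease at preprocess time.
function teaseActionFor(article: { tease?: string; date: Date }, now: Date): TeaseAction {
  // An existing, non-empty tease is left alone.
  if (article.tease && article.tease.trim() !== "") return "keep";
  // Missing tease: backfill old articles, ask authors to write one for recent ones.
  const ageMs = now.getTime() - article.date.getTime();
  return ageMs > ONE_YEAR_MS ? "auto-generate" : "require";
}

const now = new Date("2026-01-15T00:00:00Z");
console.log(teaseActionFor({ date: new Date("2023-06-01T00:00:00Z") }, now));
console.log(teaseActionFor({ date: new Date("2025-12-01T00:00:00Z") }, now));
```

Switching to "required via linting" for everything would mean returning "require" regardless of age once the backfill is done.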

Related: #3752
