Skip to content

fix:simplify-metadata-logic #1485

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 26, 2025
Merged

fix:simplify-metadata-logic #1485

merged 3 commits into from
Mar 26, 2025

Conversation

cpengilly
Copy link
Contributor

  • Remove auto-fixing functionality to reduce code size & number of scripts (e.g., reduced from 5 scripts down to 2)
  • Keeps suggestion functionality intact
  • Remove hardcoded categories in favor of single source of truth from keywords.config.yaml
  • Update docs/notes to reflect simplified workflow
  • Update code rabbit responses to point users to keywords.config.yaml for valid options

This change makes the metadata system more maintainable by:

  1. Simplifying the validation process
  2. Centralizing all valid options in keywords.config.yaml
  3. Making the system easier to maintain for content writers

- Remove auto-fixing functionality to reduce code size & number of scripts (e.g., reduced from 5 scripts down to 2)
- Keeps suggestion functionality intact
- Remove hardcoded categories in favor of single source of truth from keywords.config.yaml
- Update docs/notes to reflect simplified workflow
- Update code rabbit responses to point users to keywords.config.yaml for valid options

This change makes the metadata system more maintainable by:
1. Simplifying the validation process
2. Centralizing all valid options in keywords.config.yaml
3. Making the system easier to maintain for content writers
@cpengilly cpengilly requested a review from a team as a code owner March 8, 2025 02:47
Copy link

netlify bot commented Mar 8, 2025

Deploy Preview for docs-optimism ready!

Name Link
🔨 Latest commit 60aa2d4
🔍 Latest deploy log https://app.netlify.com/sites/docs-optimism/deploys/67d05a69b51ee00008deb10a
😎 Deploy Preview https://deploy-preview-1485--docs-optimism.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@cpengilly cpengilly requested review from jvmi7 and krofax March 8, 2025 02:50
Copy link
Contributor

coderabbitai bot commented Mar 8, 2025

📝 Walkthrough

Walkthrough

This pull request refines the metadata validation system for MDX files. The changes include updates to configuration files such as .coderabbit.yaml and keywords.config.yaml to provide clearer instructions and support additional categories, including a new required field is_imported_content. The metadata validation flow has been simplified by replacing the old CLI methods with a new validator utility (metadata-validator.ts), which now directly reads file content and validates frontmatter via an updated function signature in metadata-manager.ts. Additionally, obsolete scripts related to metadata-batch-cli have been removed from package.json, and the asynchronous configuration loading in metadata-types.ts has been enhanced. The overall update shifts focus from automated metadata fixing to manual verification supported by more streamlined validation tooling.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as Metadata Validator CLI
    participant Env as Environment (CHANGED_FILES)
    participant FS as File System
    participant Validator as validateMetadata (from metadata-manager.ts)
    participant Logger as Logger/Console

    CLI->>Env: Retrieve CHANGED_FILES
    CLI->>FS: Read each .mdx file content
    CLI->>Validator: Call validateMetadata(filepath, content)
    Validator-->>CLI: Return validation results
    CLI->>Logger: Log file status (success/errors)
    alt Errors Found
        CLI->>Logger: Output error details and suggestions
        CLI->>CLI: Exit with status code 1
    else All Valid
        CLI->>Logger: Log success message
        CLI->>CLI: Exit with status code 0
    end
Loading

Possibly related PRs

  • fix metadata #1440: The changes in the main PR regarding the validation process and instructions for handling metadata are related to the modifications in the retrieved PR, which also involves updates to the validate-metadata script and its execution context, indicating a shared focus on metadata validation.
  • fix-metadata-validation #1418: The changes in the main PR focus on updating instructions and validation commands related to metadata handling, while the retrieved PR modifies error handling and validation processes in the utils/metadata-manager.ts file, indicating a direct connection in their focus on metadata validation improvements.
  • logic updates + superchain directory #1434: The changes in the main PR regarding the .coderabbit.yaml file's instructions for handling metadata are related to the updates in the keywords.config.yaml file in the retrieved PR, as both involve modifications to metadata handling and validation processes.

Suggested reviewers

  • bradleycamacho
  • krofax
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (7)
keywords.config.yaml (1)

88-88: Fix trailing whitespace.

Several lines in the file contain trailing whitespace, which should be removed to maintain clean code.

Apply this diff to remove trailing whitespace:

-      
+
-      op-node        
+      op-node
-      dependency-set  
+      dependency-set
-      
+
-      
+
-      
+

Also applies to: 93-93, 166-166, 193-193, 225-225, 282-282

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 88-88: trailing spaces

(trailing-spaces)

utils/metadata-validator.ts (1)

24-26: Consider reducing verbose logging.

The script outputs verbose debugging messages that might clutter the console output during normal operation.

Consider removing or conditionalizing these messages behind a verbose flag:

-        console.log('File content loaded')
         const result = await validateMetadata(file, content)
-        console.log('Validation completed')
notes/metadata-update.md (1)

12-12: Fix grammatical issue by adding the missing article “the”.

Consider inserting "the" before "is_imported_content flag" to make the sentence clearer.

- - Handles duplicate/imported pages with `is_imported_content` flag
+ - Handles duplicate/imported pages with the `is_imported_content` flag
🧰 Tools
🪛 LanguageTool

[uncategorized] ~12-~12: You might be missing the article “the” here.
Context: ...- Handles duplicate/imported pages with is_imported_content flag - No automatic...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

utils/metadata-manager.ts (3)

23-26: Handle potential parsing errors from gray-matter.

If matter(content) fails on malformed YAML, it might raise an exception or return incomplete data. Consider wrapping it in a try-catch to provide more informative error handling.


31-33: Validate for empty strings.

Currently, only absent fields trigger errors. Consider also checking if fields like title, description, and topic are non-empty strings, as empty values can be just as invalid.


69-75: Consider using a boolean instead of string for is_imported_content.

Relying on "true"/"false" as strings can cause confusion. A boolean approach would be more explicit and less prone to parsing errors.

utils/types/metadata-types.ts (1)

5-20: Guard global config loading for concurrent calls.

Since loadConfig sets a global variable, multiple concurrent invocations of loadConfig() could lead to race conditions. Consider adding a locking mechanism or using a single initialization flow to avoid inconsistent states.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74159d0 and 45e92aa.

⛔ Files ignored due to path filters (1)
  • dist/tsconfig.tsbuildinfo is excluded by !**/dist/**
📒 Files selected for processing (9)
  • .coderabbit.yaml (1 hunks)
  • keywords.config.yaml (7 hunks)
  • notes/metadata-update.md (1 hunks)
  • package.json (2 hunks)
  • utils/metadata-analyzer.ts (0 hunks)
  • utils/metadata-batch-cli.ts (0 hunks)
  • utils/metadata-manager.ts (8 hunks)
  • utils/metadata-validator.ts (1 hunks)
  • utils/types/metadata-types.ts (4 hunks)
💤 Files with no reviewable changes (2)
  • utils/metadata-batch-cli.ts
  • utils/metadata-analyzer.ts
🧰 Additional context used
🪛 YAMLlint (1.35.1)
keywords.config.yaml

[error] 88-88: trailing spaces

(trailing-spaces)


[error] 93-93: trailing spaces

(trailing-spaces)


[error] 166-166: trailing spaces

(trailing-spaces)


[error] 193-193: trailing spaces

(trailing-spaces)


[error] 225-225: trailing spaces

(trailing-spaces)


[error] 282-282: trailing spaces

(trailing-spaces)

🪛 LanguageTool
notes/metadata-update.md

[uncategorized] ~12-~12: You might be missing the article “the” here.
Context: ...- Handles duplicate/imported pages with is_imported_content flag - No automatic...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-optimism
  • GitHub Check: Header rules - docs-optimism
  • GitHub Check: Pages changed - docs-optimism
🔇 Additional comments (20)
keywords.config.yaml (9)

55-55: Category limit increased to support expanded taxonomy.

The maximum number of categories allowed for a document has been increased from 5 to 8, which will provide more flexibility for comprehensive categorization of content.


68-70: Expanded Data Availability Layer categories.

These new data availability related categories provide more granularity for classifying content in this domain, which aligns with the PR objective of centralizing taxonomy.


78-88: Enhanced Sequencing Layer categorization.

The addition of these specific categories (batcher configuration, outages, optimism portal, etc.) enables more precise content classification related to sequencing operations.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 88-88: trailing spaces

(trailing-spaces)


90-108: Improved Derivation Layer taxonomy.

The expanded categories for the Derivation Layer provide a more comprehensive classification system for technical documentation in this area.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 93-93: trailing spaces

(trailing-spaces)


124-144: Enhanced Execution Layer categorization.

These additions provide more granular classification options for execution-related documentation, improving content discoverability.


148-193: Expanded Settlement Layer categories.

The significant expansion of settlement-related categories allows for more precise classification of bridge, token, messaging, and transaction finality documentation.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 166-166: trailing spaces

(trailing-spaces)


[error] 193-193: trailing spaces

(trailing-spaces)


196-224: New Governance Layer categories.

These additions ensure comprehensive coverage of governance-related topics including security roles, voting mechanisms, and multisig operations.


227-252: Enhanced Cross-layer Development Tools categories.

The expanded categories for development tooling will help classify documentation across architectural boundaries and deployment scenarios.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 252-252: trailing spaces

(trailing-spaces)


339-347:

❓ Verification inconclusive

New required field for imported content.

The addition of the is_imported_content field as a required metadata attribute is a significant change that will help distinguish between original and imported/duplicate content.

This field will need to be added to all existing content, which aligns with the PR objective of centralizing taxonomy management.


🏁 Script executed:

#!/bin/bash
# Count how many existing MDX files will need to be updated with the new required field
echo "Checking how many files will need the new is_imported_content field..."
total_mdx_files=$(fd '\.mdx$' | wc -l)
files_with_field=$(fd '\.mdx$' -x grep -l "is_imported_content:" {} | wc -l)
files_needing_update=$((total_mdx_files - files_with_field))
echo "Total MDX files: $total_mdx_files"
echo "Files already with is_imported_content field: $files_with_field"
echo "Files needing update: $files_needing_update"

Length of output: 525


Metadata Field Addition: Confirmed & Verification Required

The new is_imported_content field in keywords.config.yaml (lines 339–347) is correctly implemented as a required attribute to distinguish between original and imported/duplicate content. Our verification script confirms that out of 211 MDX files, 83 already include this field while 128 files will need to be updated. Please ensure that the metadata update is applied consistently across all existing content to align with the PR’s taxonomy centralization objective.

.coderabbit.yaml (1)

36-39: Replaced auto-fixing with manual validation approach.

The instruction has been updated to direct users to manually add required metadata fields by consulting the keywords.config.yaml file instead of attempting to automatically fix the fields. The validation command has been simplified to pnpm validate-metadata.

This change aligns with the PR objective of simplifying the metadata workflow by removing the auto-fixing functionality while maintaining the validation capability.

utils/metadata-validator.ts (1)

1-64: Well-structured new metadata validation utility.

This new script implements a streamlined approach to metadata validation by:

  1. Loading configuration asynchronously
  2. Checking all changed MDX files
  3. Validating metadata for each file
  4. Providing clear error messages with suggestions for valid values

The implementation follows good practices with proper error handling, async/await patterns, and helpful user feedback.

package.json (2)

8-8: Removed metadata auto-fixing from the fix script.

The fix script has been simplified by removing the metadata-batch-cli command, aligning with the PR objective of moving away from auto-fixing metadata.


18-19: Updated metadata validation scripts.

The validation scripts have been updated to use the new metadata-validator.ts utility instead of the previous batch CLI approach. This change simplifies the validation workflow and aligns with the PR's objective of streamlining the metadata management process.

utils/metadata-manager.ts (5)

79-84: Looks good.

Returning valid options as suggestions is a helpful addition for guiding users.


145-146: No concerns here.

The call to validateMetadata is consistent with the new function signature.


154-156: No concerns here.

Exposing suggested enums is a clear way to help users correct invalid metadata.


177-177: Looks good.

Reusing configuration for suggestions during error handling is consistent with the rest of the approach.

Also applies to: 182-185


322-322: No concerns here.

Invoking validateMetadata is appropriate for ensuring consistency with the updated flow.

utils/types/metadata-types.ts (2)

125-133: Looks good.

The expanded ValidationResult structure for valid suggestions aligns well with the updated metadata requirements.


135-140: No concerns here.

getConfig properly ensures the config has been loaded, preventing invalid usage.

Copy link
Contributor

@jvmi7 jvmi7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments, probably good to merge though from a first glance

adress coderabbit + imran feedback:
- Added generateSuggestions helper function to reduce code duplication
- Updated validation to check for:
  - Minimum number of personas
  - Minimum/maximum categories
  - Duplicate categories
- Simplified validate-metadata script to use only staged files
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (2)
utils/metadata-validator.ts (1)

64-65: ⚠️ Potential issue

Remove extra closing brackets at the end of the file.

There are extraneous closing brackets that don't match any opening brackets.

Apply this diff to fix the syntax error:

main().catch(error => {
  console.error('Unhandled error:', error)
  process.exit(1)
})
-})
-
utils/metadata-manager.ts (1)

95-100: ⚠️ Potential issue

Remove duplicated validation code.

This block duplicates the same validation logic for is_imported_content that already exists in lines 70-75.

Apply this diff to remove the duplicate code:

-  if (frontmatter.is_imported_content === undefined) {
-    errors.push('Missing is_imported_content field (required for duplicate page handling)')
-  } else if (typeof frontmatter.is_imported_content !== 'string' || 
-    !['true', 'false'].includes(frontmatter.is_imported_content)) {
-    errors.push('is_imported_content must be a string: "true" or "false"')
-  }
♻️ Duplicate comments (2)
utils/metadata-manager.ts (2)

35-45: 🛠️ Refactor suggestion

Add minimum check for personas.

The personas array check ensures it's present and valid, but doesn't enforce the minimum count requirement from the config.

Apply this diff to implement the minimum check:

if (!frontmatter.personas || !Array.isArray(frontmatter.personas)) {
  errors.push('Missing or invalid personas array')
} else {
+  if (frontmatter.personas.length < 1) {
+    errors.push('At least one persona is required')
+  }
  const validPersonas = config.metadata_rules.persona.validation_rules[0].enum
  frontmatter.personas.forEach(persona => {
    if (!validPersonas.includes(persona)) {
      errors.push(`Invalid persona: ${persona}`)
    }
  })
}

58-67: 🛠️ Refactor suggestion

Add minimum and duplicate checks for categories.

Like with personas, the config sets minimum requirements and disallows duplicates for categories.

Apply this diff to enforce these constraints:

if (!frontmatter.categories || !Array.isArray(frontmatter.categories)) {
  errors.push('Missing or invalid categories array')
} else {
+  if (frontmatter.categories.length < 1) {
+    errors.push('At least one category is required')
+  }
+  
+  // Check for duplicates
+  const uniqueCategories = new Set(frontmatter.categories)
+  if (uniqueCategories.size !== frontmatter.categories.length) {
+    errors.push('Duplicate categories are not allowed')
+  }
  const validCategories = config.metadata_rules.categories.values
  frontmatter.categories.forEach(category => {
    if (!validCategories.includes(category)) {
      errors.push(`Invalid category: ${category}`)
    }
  })
}
🧹 Nitpick comments (1)
utils/metadata-manager.ts (1)

154-157: Create a helper function for suggestion generation.

The same suggestion structure is duplicated in multiple places. Apply DRY principles by creating a helper function.

Add this helper function to eliminate duplication:

function generateSuggestions(config) {
  return {
    validPersonas: config.metadata_rules.persona.validation_rules[0].enum,
    validContentTypes: config.metadata_rules.content_type.validation_rules[0].enum,
    validCategories: config.metadata_rules.categories.values
  }
}

Then update all occurrences to use this helper:

- suggestions: {
-   validPersonas: config.metadata_rules.persona.validation_rules[0].enum,
-   validContentTypes: config.metadata_rules.content_type.validation_rules[0].enum,
-   validCategories: config.metadata_rules.categories.values
- }
+ suggestions: generateSuggestions(config)

Also applies to: 171-174, 181-185

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45e92aa and 60aa2d4.

📒 Files selected for processing (6)
  • package.json (2 hunks)
  • utils/metadata-manager.ts (8 hunks)
  • utils/metadata-validator.ts (1 hunks)
  • package.json (1 hunks)
  • utils/metadata-manager.ts (7 hunks)
  • utils/metadata-validator.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • utils/metadata-validator.ts
  • package.json
  • package.json
  • utils/metadata-manager.ts
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-optimism
  • GitHub Check: Header rules - docs-optimism
  • GitHub Check: Pages changed - docs-optimism
🔇 Additional comments (13)
utils/metadata-validator.ts (6)

1-5: Good module imports and configuration setup.

The script properly imports necessary modules and includes the metadata validation utilities needed for the functionality.


6-16: Well-structured error handling and configuration setup.

The main function implementation has good error handling with a try-catch block, and properly loads configuration before proceeding. The approach to check for MDX files and exit early if none are found is efficient.


18-45: Robust file processing loop with proper error isolation.

The implementation processes each file independently and catches errors at the file level, which prevents one file error from stopping the entire validation process. This makes the script more resilient.


35-37: Add is_imported_content to the list of valid options.

The validation output should include guidance about the new required field.

Apply this diff to include the new required field in the suggestions output:

console.log('  - Personas:', result.suggestions?.validPersonas.join(', '))
console.log('  - Content Types:', result.suggestions?.validContentTypes.join(', '))
console.log('  - Categories:', result.suggestions?.validCategories.join(', '))
+console.log('  - Is Imported Content: "true", "false"')

47-56: Proper process exit handling based on validation results.

The implementation correctly exits with a non-zero status code when errors are found, following CLI best practices.


59-63: Good use of top-level await pattern for error handling.

Using the top-level await pattern with a catch block provides better error handling than directly using await in the script body.

utils/metadata-manager.ts (7)

23-29: Improved function signature with frontmatter extraction.

The updated validateMetadata function now accepts file content directly instead of a metadata object, which simplifies the validation flow by handling frontmatter extraction internally.


30-34: Good checks for required fields.

The validation properly checks for essential frontmatter fields like title, description, and topic.


69-75: Good validation for the new is_imported_content field.

The implementation properly validates the required is_imported_content field, checking both for its presence and correct format.


79-85: Well-structured validation result with suggestions.

The response includes helpful suggestions for valid values of personas, content types, and categories, which improves the user experience.


257-264: Good implementation of detailed validation suggestions.

The code now shows valid options for content types and personas, making it easier for users to correct issues in their frontmatter.


268-279: Clear example of required frontmatter structure.

The example YAML correctly includes all required fields including the new is_imported_content field, which serves as a good reference for users.


322-323: Clean integration with the new validation function.

The updated code properly uses the new validateMetadata function signature, passing the filepath and content.

@bradleycamacho bradleycamacho merged commit d231fac into main Mar 26, 2025
8 checks passed
@bradleycamacho bradleycamacho deleted the simplify-metadata-logic branch March 26, 2025 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants