fix(kafka): Handle message indices in proto data also for Glue Schema Registry #1906

Closed
wants to merge 3 commits into from

@karthikpswamy karthikpswamy commented Jun 19, 2025

Issue #, if available:

Description of changes:

  • Adding support for different schema registry types (Confluent and Glue)
  • Improving the handling of message indices in protobuf data
  • Fixing deserialization logic for different schema registry formats
  • Adding appropriate tests for both Confluent and Glue schema registry scenarios

Schema Registry Type Detection and Deserialization Logic:

No Schema Registry Integration:

  • Applies when the KafkaEvent contains no key/value metadata
  • Uses the entire byte array as raw data for deserialization

AWS Glue Schema Registry:

  • Identified by a 16-byte schema ID in the key/value metadata
  • Skips the first byte of data
  • Uses the remaining bytes for Protobuf deserialization

Confluent Schema Registry:

  • Default case when neither of the above conditions is met
  • Follows the Confluent wire format specification
  • Dynamically skips 1 or 2 bytes based on the message index
  • Uses the remaining bytes for Protobuf deserialization
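
The three cases above can be sketched roughly as follows. This is an illustration in Python, not the code from this PR: the function name `strip_registry_prefix`, the `schema_id` argument, and the `GLUE_SCHEMA_ID_LENGTH` constant are hypothetical, and the rule used to choose between skipping 1 or 2 bytes in the Confluent case (a leading zero byte means a single-byte prefix) is an assumption about how "based on message index" might be decided.

```python
from typing import Optional

# Assumption: the schema ID length that marks a Glue record, taken from the
# description above. It is a parameter so that another length can be tried.
GLUE_SCHEMA_ID_LENGTH = 16


def strip_registry_prefix(
    data: bytes,
    schema_id: Optional[str],
    glue_id_length: int = GLUE_SCHEMA_ID_LENGTH,
) -> bytes:
    """Return the bytes that would be handed to the Protobuf parser."""
    if schema_id is None:
        # No schema registry integration: the whole array is the message.
        return data
    if len(schema_id) == glue_id_length:
        # AWS Glue Schema Registry: skip the first byte.
        return data[1:]
    # Confluent Schema Registry (default case): skip 1 or 2 bytes.
    # Assumption: a leading 0x00 byte is a one-byte message index prefix;
    # any other leading byte means a two-byte prefix.
    prefix_length = 1 if data[:1] == b"\x00" else 2
    return data[prefix_length:]


if __name__ == "__main__":
    payload = b"\x0a\x03abc"  # a small Protobuf-encoded string field
    print(strip_registry_prefix(payload, None))
    print(strip_registry_prefix(b"\x01" + payload, "g" * 16))
    print(strip_registry_prefix(b"\x00" + payload, "1"))
```

Passing the Glue ID length as an argument keeps the sketch usable if the length in the metadata turns out to differ from 16.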

Checklist

Breaking change checklist

RFC issue #:

  • Migration process documented
  • Implement warnings (if it can live side by side)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@phipag phipag changed the title from "Bug fix to handle message indices in proto data" to "fix(kafka): Handle message indices in proto data also for Glue Schema Registry" Jun 20, 2025

@phipag phipag (Contributor) commented Jun 20, 2025

Hey @karthikpswamy. Thanks again for contributing this logic. To unblock work in my timezone, I created a copy of your PR here: #1907. Please review it. I made some minor changes, but the key one is that I updated the Glue schema ID size to 36 because this is the length of the UUIDs. Can you let me know if this is correct?
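
As background for the 16 versus 36 question: a UUID is 16 bytes in raw form and 36 characters in its canonical hyphenated text form. The short Python check below only shows those two lengths; it assumes nothing about which form the Kafka event metadata actually carries, and the variable names are illustrative.

```python
import uuid

# A random UUID standing in for a schema identifier (illustrative only).
example_id = uuid.uuid4()

raw_length = len(example_id.bytes)  # raw binary form
text_length = len(str(example_id))  # hyphenated text, e.g. 8-4-4-4-12 hex digits

print(raw_length, text_length)
```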

@phipag phipag closed this Jun 20, 2025