Is data loss possible when consuming aggregated records with AWS Lambda? #175
Comments
Hello - if you are consuming data created by the Kinesis Producer Library and published onto Kinesis Data Streams, then you will be fine. The data loss issue occurs if you use the Aggregation Library from outside of the KPL and then try to publish messages. We'll update the documentation to make this clearer.
Hi @IanMeyers, thanks for the reply. I am producing data by calling `aggregate` from this library and then publishing the result with `putRecord`.
OK - so in this case, if you are using stream autoscaling or scaling the stream manually, then it is possible that aggregated records will target Shards that don't exist. Therefore when you call `putRecords`, the aggregated record can land on a Shard whose hash key range does not cover all of the contained sub-records, and the KCL will silently drop those sub-records during deaggregation.
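For anyone following along, here is a rough sketch of the check that causes the drop. This only illustrates the behaviour described above, it is not code from the KCL; `wouldBeKept` and the shard shape are hypothetical:

```javascript
// Illustrative sketch: during deaggregation, each sub-record's effective hash
// key (the ExplicitHashKey if present, otherwise the MD5 of its PartitionKey)
// must fall inside the receiving Shard's hash key range, or the record is dropped.
const crypto = require('crypto');

function effectiveHashKey(partitionKey, explicitHashKey) {
  if (explicitHashKey !== undefined) return BigInt(explicitHashKey);
  const md5 = crypto.createHash('md5').update(partitionKey, 'utf8').digest('hex');
  return BigInt('0x' + md5); // 128-bit value, same space as Shard hash key ranges
}

function wouldBeKept(subRecord, shard) {
  const hk = effectiveHashKey(subRecord.partitionKey, subRecord.explicitHashKey);
  return hk >= BigInt(shard.startingHashKey) && hk <= BigInt(shard.endingHashKey);
}

// After a reshard, an aggregated record lands on one Shard; sub-records whose
// hash keys belong to a different Shard fail this check and are silently lost.
```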
Hi @IanMeyers, much appreciated! This is what I'm doing:

```javascript
// assumed setup: the aws-kinesis-agg module and an AWS SDK v2 Kinesis client
const agg = require('aws-kinesis-agg');
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis();

agg.aggregate([
  {
    data: '123',
    partitionKey: 'a'
  },
  {
    data: '456',
    partitionKey: 'b'
  }
], (d) => {
  kinesis.putRecord({
    Data: d.data,
    PartitionKey: d.partitionKey,
    StreamName: name
  }).promise().then(console.log)
})
```

As my understanding goes, the aggregated record is published with a single partition key, so the sub-records for 'a' and 'b' travel together and could end up on a Shard that does not cover both of their hash key ranges?
Correct. Let's say (for example) that your Stream only has 1 Shard, and you perform aggregation. The encoded Protobuf message will have an effective hash key for each sub-record (the ExplicitHashKey if one is set, otherwise the MD5 hash of its PartitionKey), and with a single Shard every key falls within its hash key range. Once the Stream is split, however, the whole aggregated record lands on just one Shard, and on deaggregation the KCL discards any sub-record whose hash key falls outside that Shard's range.
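To make the example concrete, here is a quick way to see which half of the hash key space the example keys 'a' and 'b' fall into after an even two-way split. This is illustrative only; the split point below assumes the Shard was split into equal halves:

```javascript
// Where do partition keys 'a' and 'b' hash in the 128-bit key space?
const crypto = require('crypto');

const md5AsBigInt = (s) =>
  BigInt('0x' + crypto.createHash('md5').update(s, 'utf8').digest('hex'));

// split point of an even two-way split: 2^127
const HALF = BigInt('0x80000000000000000000000000000000');

for (const pk of ['a', 'b']) {
  console.log(pk, md5AsBigInt(pk) < HALF ? 'lower shard' : 'upper shard');
}
// Prints: 'a' -> lower shard, 'b' -> upper shard. The aggregated record is
// routed by a single partition key, so it lands on one of the two shards,
// and the sub-record belonging to the other half is dropped on deaggregation.
```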
OK thanks, that resolves my question!
I've been trying to get specific validation on this issue, but the Kinesis Lambda Event Source does use the KCL internally, and so may display this behaviour.
@IanMeyers does this mean this tool is not safe to use when consuming Kinesis Data Streams from Lambda? I wanted to use this in combination with a QLDB stream, but I'm not sure anymore if this is a good idea, as it's crucial that I get all records from the stream. More insights are highly appreciated, thanks :)
Consumption is OK when you are just using the `deaggregate` methods in this module. The data loss risk only applies when you use this module to produce aggregated records outside of the KPL.
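For reference, consuming from Lambda would look roughly like this. This is a minimal sketch assuming this repo's Node module and its `deaggregateSync` callback shape; the handler wiring and logging are illustrative:

```javascript
// Lambda handler that de-aggregates each Kinesis record before processing
const agg = require('aws-kinesis-agg');

exports.handler = async (event) => {
  for (const record of event.Records) {
    agg.deaggregateSync(record.kinesis, true /* computeChecksums */, (err, userRecords) => {
      if (err) throw err;
      for (const userRecord of userRecords) {
        // user record payloads are base64 encoded
        const payload = Buffer.from(userRecord.data, 'base64').toString('utf8');
        console.log(userRecord.partitionKey, payload); // process each sub-record
      }
    });
  }
};
```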
Great, thanks for confirming! |
This part of the warning could use some clarification. The suggestion described in this issue is to check for failed records, then re-aggregate and send. If all records in the aggregated record have the same partition key, then does that also mitigate the problem (if the EHK is never provided in the protobuf)?
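For example, one way to arrange that is to group records by partition key before aggregating, so each aggregate carries exactly one key. This is a hypothetical sketch, the grouping code is not part of this module, and the stream name and records are placeholders:

```javascript
const agg = require('aws-kinesis-agg');
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis();

const name = 'my-stream'; // placeholder stream name
const records = [
  { data: '123', partitionKey: 'a' },
  { data: '456', partitionKey: 'b' }
];

// one aggregate per partition key
const groups = new Map();
for (const rec of records) {
  if (!groups.has(rec.partitionKey)) groups.set(rec.partitionKey, []);
  groups.get(rec.partitionKey).push(rec);
}

for (const group of groups.values()) {
  agg.aggregate(group, (d) => {
    kinesis.putRecord({
      Data: d.data,
      PartitionKey: d.partitionKey, // every sub-record shares this key
      StreamName: name
    }).promise().then(console.log);
  });
}
```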
I've read the following warning from this page:

> Caution - this module is only suitable for low-value messages which are processed in aggregate. Do not use Kinesis Aggregation for data which is sensitive or where every message must be delivered, and where the KCL (including with AWS Lambda) is used for processing. DATA LOSS CAN OCCUR.

But as far as I know, records are not consumed by the KCL in Lambda and need to be deaggregated manually by invoking `deaggregate`. Is data loss still possible in this case?