Adds general S3 use-case documentation. #3833 #3842
Conversation
Signed-off-by: David Venable <dlv@amazon.com>
A few suggestions... thanks!
## Architecture

Data Prepper supports reading objects from S3 buckets using an [Amazon Simple Queue Service (SQS)](https://aws.amazon.com/sqs/) (Amazon SQS) queue and [Amazon S3 Event Notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html).
Same formulation as above
Switched to "can read objects".
Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Looks good.
<img src="{{site.url}}{{site.baseurl}}/images/data-prepper/s3-source/s3-pipeline.jpg" alt="S3 source architecture">{: .img-fluid}

## Prerequisites
Some of the H2 headings have two spaces above, others a single space. Settle on one or the other.
Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Adds general S3 use-case documentation. #3833 Signed-off-by: David Venable <dlv@amazon.com> * Add edits to S3 logs doc * Apply suggestions from code review Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update s3-logs.md * Update s3-logs.md * Apply suggestions from code review Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: David Venable <dlv@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> (cherry picked from commit 196c750) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Adds general S3 use-case documentation. #3833 * Add edits to S3 logs doc * Apply suggestions from code review * Update s3-logs.md * Update s3-logs.md * Apply suggestions from code review --------- (cherry picked from commit 196c750) Signed-off-by: David Venable <dlv@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
@Naarcha-AWS Please see my comments/changes and let me know if you have any questions. Thanks!
## Architecture

Data Prepper can read objects from S3 buckets using an [Amazon Simple Queue Service (SQS)](https://aws.amazon.com/sqs/) (Amazon SQS) queue and [Amazon S3 Event Notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html).
Remove "(SQS)".
<img src="{{site.url}}{{site.baseurl}}/images/data-prepper/s3-source/s3-architecture.jpg" alt="S3 source architecture">{: .img-fluid}

The flow of data is as follows.
Replace the period with a colon.
The flow of data is as follows.

1. A system produces logs into the S3 bucket.
I'm not sure one can "produce into". Do we mean "A system sends logs to an S3 bucket"?
The flow of data is as follows.

1. A system produces logs into the S3 bucket.
2. S3 creates an S3 event notification in the SQS queue.
Replace "the" with "an".
2. S3 creates an S3 event notification in the SQS queue.
3. Data Prepper polls Amazon SQS for messages and then receives a message.
4. Data Prepper downloads the content from the S3 object.
5. Data Prepper sends a document to OpenSearch for the content in the S3 object.
This sentence is a bit unclear. Instead of "for", do we mean something like "containing"?
## Pipeline design

Create a pipeline to read logs from S3, starting with an [`s3`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/) source plugin. Use the following example for guidance.
"Create a pipeline for reading logs from S3"? Replace the terminal period with a colon.
Configure the following options according to your use case:

* `queue_url`: This the SQS queue URL and is always unique to your pipeline.
"This is"
* `codec`: The codec determines how to parse the incoming data.
* `visibility_timeout`: Configure this value to be large enough for Data Prepper to process 10 S3 objects. However, if you make this value too large, messages that fail to process will take at least as long as the specified value before Data Prepper retries.

The default values for each option work for the majority of use cases. For all available options for the S3 source, see [`s3`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/).
"For all available S3 source options"
The default values for each option work for the majority of use cases. For all available options for the S3 source, see [`s3`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/).

`` ```yaml ``
This needs to be introduced by a brief sentence ending in a colon.
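For context on the options discussed in this thread, a pipeline using them might look like the following minimal sketch. The pipeline name, queue URL, account ID, Region, role ARN, OpenSearch host, and index name are placeholder assumptions and do not come from this PR. The nesting of `queue_url` and `visibility_timeout` under `sqs` reflects how the `s3` source is commonly configured and should be checked against the `s3` source reference:

```yaml
# Hypothetical example: every name, URL, and ARN below is a placeholder.
s3-log-pipeline:
  source:
    s3:
      notification_type: sqs
      codec:
        newline:                  # treat each line of the S3 object as one event
      sqs:
        # Assumed queue that receives the S3 event notifications.
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
        # Long enough to process a batch of objects; short enough that failed
        # messages are retried reasonably soon.
        visibility_timeout: "2m"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/example-data-prepper-role"
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        index: "s3-logs"
```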
## Multiple Data Prepper pipelines

We recommend that you have one SQS queue per Data Prepper pipeline. In addition, you can have multiple nodes in the same cluster reading from the same SQS queue, which doesn't require additional configuration with Data Prepper.
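The one-queue-per-pipeline recommendation can be sketched as two pipelines in a single configuration file, each pointing at its own queue. This is only an illustration: the pipeline names, queue URLs, account ID, Region, hosts, and index names are hypothetical, and the option layout is assumed to match the `s3` source reference:

```yaml
# Hypothetical example: two pipelines, each with its own SQS queue.
application-logs-pipeline:
  source:
    s3:
      notification_type: sqs
      codec:
        newline:
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/application-logs-queue"
      aws:
        region: "us-east-1"
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        index: "application-logs"

audit-logs-pipeline:
  source:
    s3:
      notification_type: sqs
      codec:
        newline:
      sqs:
        # A different queue from the pipeline above.
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/audit-logs-queue"
      aws:
        region: "us-east-1"
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        index: "audit-logs"
```

Running this same file on several nodes in a cluster means those nodes share each queue, which matches the multi-node case described above.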
"We recommend that you create one SQS queue per Data Prepper pipeline. Additionally, multiple nodes in the same cluster can read from the same SQS queue, which doesn't require additional Data Prepper configuration."
* Adds general S3 use-case documentation. #3833 Signed-off-by: David Venable <dlv@amazon.com> * Add edits to S3 logs doc * Apply suggestions from code review Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update s3-logs.md * Update s3-logs.md * Apply suggestions from code review Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: David Venable <dlv@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
…search-project#3842) * Adds general S3 use-case documentation. opensearch-project#3833 Signed-off-by: David Venable <dlv@amazon.com> * Add edits to S3 logs doc * Apply suggestions from code review Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update s3-logs.md * Update s3-logs.md * Apply suggestions from code review Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: David Venable <dlv@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
Description
This adds a new use-case to Data Prepper for S3 logs.
Issues Resolved
Contributes toward #3833
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.