Adds general S3 use-case documentation. #3833 #3842

Merged
merged 6 commits into from
Apr 27, 2023

Conversation

dlvenable
Member

Description

This adds a new use-case to Data Prepper for S3 logs.

Issues Resolved

Contributes toward #3833

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.


Signed-off-by: David Venable <dlv@amazon.com>
Contributor

@hdhalter hdhalter left a comment

A few suggestions... thanks!


## Architecture

Data Prepper supports reading objects from S3 buckets using an [Amazon Simple Queue Service (SQS)](https://aws.amazon.com/sqs/) (Amazon SQS) queue and [Amazon S3 Event Notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html).
Contributor

Same formulation as above

Collaborator

Switched to "can read objects".

Naarcha-AWS and others added 3 commits April 27, 2023 15:50
Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Contributor

@cwillum cwillum left a comment

Looks good.


<img src="{{site.url}}{{site.baseurl}}/images/data-prepper/s3-source/s3-pipeline.jpg" alt="S3 source architecture">{: .img-fluid}

## Prerequisites
Contributor

Some of the H2 headings have two spaces above, others a single space. Settle on one or the other.

Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
@Naarcha-AWS Naarcha-AWS merged commit 196c750 into opensearch-project:main Apr 27, 2023
@Naarcha-AWS Naarcha-AWS added backport 2.5 PR: Backport label for 2.5 backport 2.6 PR: Backport label for 2.6 labels Apr 27, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 27, 2023
* Adds general S3 use-case documentation. #3833

Signed-off-by: David Venable <dlv@amazon.com>

* Add edits to S3 logs doc

* Apply suggestions from code review

Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update s3-logs.md

* Update s3-logs.md

* Apply suggestions from code review

Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: David Venable <dlv@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM>
Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
(cherry picked from commit 196c750)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Naarcha-AWS added a commit that referenced this pull request Apr 27, 2023
* Adds general S3 use-case documentation. #3833



* Add edits to S3 logs doc

* Apply suggestions from code review




* Update s3-logs.md

* Update s3-logs.md

* Apply suggestions from code review




---------






(cherry picked from commit 196c750)

Signed-off-by: David Venable <dlv@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM>
Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
Collaborator

@natebower natebower left a comment

@Naarcha-AWS Please see my comments/changes and let me know if you have any questions. Thanks!


## Architecture

Data Prepper can read objects from S3 buckets using an [Amazon Simple Queue Service (SQS)](https://aws.amazon.com/sqs/) (Amazon SQS) queue and [Amazon S3 Event Notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html).
Collaborator

Remove "(SQS)".


<img src="{{site.url}}{{site.baseurl}}/images/data-prepper/s3-source/s3-architecture.jpg" alt="S3 source architecture">{: .img-fluid}

The flow of data is as follows.
Collaborator

Replace the period with a colon.


1. A system produces logs into the S3 bucket.
Collaborator

I'm not sure one can "produce into". Do we mean "A system sends logs to an S3 bucket"?

2. S3 creates an S3 event notification in the SQS queue.
Collaborator

Replace "the" with "an".

3. Data Prepper polls Amazon SQS for messages and then receives a message.
4. Data Prepper downloads the content from the S3 object.
5. Data Prepper sends a document to OpenSearch for the content in the S3 object.
Collaborator

This sentence is a bit unclear. Instead of "for", do we mean something like "containing"?
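
For context on steps 1 and 2 of the flow above, the wiring between the bucket and the queue could be provisioned as in the following sketch. It is a minimal example that assumes AWS CloudFormation is used. The resource names `LogBucket`, `LogQueue`, and `LogQueuePolicy` are hypothetical placeholders. The queue policy grants the S3 service permission to send event notifications to the queue.

```yaml
# Hypothetical CloudFormation sketch; LogBucket, LogQueue, and LogQueuePolicy are placeholder names.
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  # Queue that receives S3 event notifications (step 2) and that Data Prepper polls (step 3)
  LogQueue:
    Type: AWS::SQS::Queue
  # Allows the S3 service to deliver event notifications to the queue
  LogQueuePolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      Queues:
        - !Ref LogQueue            # !Ref on a queue returns its URL
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: s3.amazonaws.com
            Action: "sqs:SendMessage"
            Resource: !GetAtt LogQueue.Arn
  # Bucket that receives logs (step 1) and notifies the queue for every new object (step 2)
  LogBucket:
    Type: AWS::S3::Bucket
    DependsOn: LogQueuePolicy      # the policy must exist before S3 validates the notification target
    Properties:
      NotificationConfiguration:
        QueueConfigurations:
          - Event: "s3:ObjectCreated:*"
            Queue: !GetAtt LogQueue.Arn
```

In a real deployment, you may also want to scope the queue policy with an `aws:SourceArn` or `aws:SourceAccount` condition so that only your bucket can send messages.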


## Pipeline design

Create a pipeline to read logs from S3, starting with an [`s3`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/) source plugin. Use the following example for guidance.
Collaborator

"Create a pipeline for reading logs from S3"? Replace the terminal period with a colon.


Configure the following options according to your use case:

* `queue_url`: This the SQS queue URL and is always unique to your pipeline.
Collaborator

"This is"

* `codec`: The codec determines how to parse the incoming data.
* `visibility_timeout`: Configure this value to be large enough for Data Prepper to process 10 S3 objects. However, if you make this value too large, messages that fail to process will take at least as long as the specified value before Data Prepper retries.
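
The options above could be combined into the following pipeline. It is a minimal sketch that assumes newline-delimited, gzip-compressed log objects and an OpenSearch sink. The pipeline name, queue URL, region, role ARN, host, and index are hypothetical placeholders, and the visibility timeout is only an example value. Check the field names against the `s3` source reference for your Data Prepper version.

```yaml
# Hypothetical pipeline sketch; every name, URL, ARN, and host below is a placeholder.
s3-log-pipeline:
  source:
    s3:
      # Discover new objects through S3 event notifications delivered to SQS
      notification_type: "sqs"
      # Assumes the log objects are gzip-compressed
      compression: "gzip"
      # Treat each line of an object as a separate event
      codec:
        newline:
      sqs:
        # Unique to this pipeline
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/my-s3-log-queue"
        # Example value: long enough to process 10 S3 objects, short enough for timely retries
        visibility_timeout: "120s"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/my-data-prepper-role"
  sink:
    - opensearch:
        # Authentication settings omitted for brevity
        hosts: ["https://localhost:9200"]
        index: "s3-logs"
```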

The default values for each option work for the majority of use cases. For all available options for the S3 source, see [`s3`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/).
Collaborator

"For all available S3 source options"


```yaml
Collaborator

This needs to be introduced by a brief sentence ending in a colon.


## Multiple Data Prepper pipelines

We recommend that you have one SQS queue per Data Prepper pipeline. In addition, you can have multiple nodes in the same cluster reading from the same SQS queue, which doesn't require additional configuration with Data Prepper.
Collaborator

"We recommend that you create one SQS queue per Data Prepper pipeline. Additionally, multiple nodes in the same cluster can read from the same SQS queue, which doesn't require additional Data Prepper configuration."

vagimeli pushed a commit that referenced this pull request May 4, 2023
* Adds general S3 use-case documentation. #3833

Signed-off-by: David Venable <dlv@amazon.com>

* Add edits to S3 logs doc

* Apply suggestions from code review

Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update s3-logs.md

* Update s3-logs.md

* Apply suggestions from code review

Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: David Venable <dlv@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM>
Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
vagimeli added a commit that referenced this pull request May 4, 2023
@dlvenable dlvenable deleted the 3833-s3-use-case branch August 21, 2023 16:49
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
…search-project#3842)

* Adds general S3 use-case documentation. opensearch-project#3833

Signed-off-by: David Venable <dlv@amazon.com>

* Add edits to S3 logs doc

* Apply suggestions from code review

Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update s3-logs.md

* Update s3-logs.md

* Apply suggestions from code review

Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: David Venable <dlv@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM>
Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
Labels
backport 2.5 PR: Backport label for 2.5 backport 2.6 PR: Backport label for 2.6

5 participants