Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(idempotency): deep sort payload during hashing #2570

Merged

Conversation

arnabrahman
Copy link
Contributor

Summary

According to the Issue, payloads containing the exact same keys should be considered idempotent regardless of their order.

For example, these two requests should be idempotent.

const requestA = {
  name: "John",
  age: 30,
  city: "New York",
  address: {
    street: "5th Avenue",
    number: 123,
  },
};

const requestB = {
  city: "New York",
  name: "John",
  age: 30,
  address: {
    number: 123,
    street: "5th Avenue",
  },
};

The same goes for JSON arrays, the following two requests should be idempotent.

const requestA = [{
  a: 30,
  c: "John",
  b: "New York",
}]

const requestB = [{
  a: 30,
  b: "New York",
  c: "John"
}];

To achieve this, we need to implement a function that deep sorts the payload & use it to sort the payload before generating the hash.

Changes

Implement a function that recursively sorts the keys of an object or elements of an array. The function is based on this blog post that was mentioned in the issue description. In addition, added support for sorting objects inside JSON array.

  • Uses getType from @aws-lambda-powertools/commons to determine the input data type.
  • For objects, it sorts the keys alphabetically in a case-insensitive manner using sortObject.
  • For arrays, it recursively applies deepSort to each element.
  • Primitives are returned unchanged.

Used this function before generating hash inside BasePersistenceLayer class.

Also, written unit tests for the function for different scenarios that I can think of. I couldn't find a method in jest that checks if two objects are equal and ensures that the key positions are similar. Therefore, I used JSON.stringify to compare the two objects.

I am not sure if I need to update any documentation for this. Please let me know if I need to make any tweaks.

Issue number: #2120


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

@boring-cyborg boring-cyborg bot added idempotency This item relates to the Idempotency Utility tests PRs that add or change tests labels May 22, 2024
@pull-request-size pull-request-size bot added the size/L PRs between 100-499 LOC label May 22, 2024
@arnabrahman arnabrahman marked this pull request as ready for review May 22, 2024 09:04
@arnabrahman arnabrahman requested review from a team as code owners May 22, 2024 09:04
@github-actions github-actions bot added the feature PRs that introduce new features or minor changes label May 22, 2024
Copy link
Contributor

@dreamorosi dreamorosi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration tests for this branch are failing because the sorting changes the hashes/idempotency ID generated for some of the tests.

I have also left a couple comments in the code.

packages/idempotency/src/deepSort.ts Outdated Show resolved Hide resolved
packages/idempotency/src/deepSort.ts Outdated Show resolved Hide resolved
@dreamorosi
Copy link
Contributor

Thank you for the PR @arnabrahman, appreciate it!

@dreamorosi
Copy link
Contributor

@am29d, @heitorlessa - This change deep sorts payloads before generating the hash in the Idempotency ID. This is a necessary change as it will lead to better idempotency detection.

There is a risk however that if customers update Powertools and deploy functions with the new version that includes this change, they might see repeated requests due to the payload now being sorted and thus different.

For example, this sequence could happen:

  1. Client makes request with payload { "id": 1, "foo": 2 }, hash is generated based on this payload and it's stored
  2. Customer upgrades Powertools to latest
  3. Client makes another request within idempotency ttl with same payload, which is now sorted. This means the payload used to generate the hash would now be { "foo": 2, "id": 1 }, which will generate a new hash and cause the request to be processed

It's important to notice that this will happen only if:

  • Customers have payloads that are not sorted
  • Customers happen to upgrade and roll out the change AND a unsorted request that has already been stored is still active in the persistence store AND a subsequent request comes within that idempotency ttl

Do you think this should be treated as a breaking change? I think there's an argument to be made that this new behavior should've been the default since the beginning and that we're covering a feature gap rather than adding a new feature, but I am not sure.

What do you think?

@am29d
Copy link
Contributor

am29d commented May 23, 2024

I agree that this is a necessary change and improves idempotency utility and this is a breaking change. As you said, this could disable idempotency for some cases which can lead to severe impact on the customer side.

@arnabrahman
Copy link
Contributor Author

@dreamorosi for the e2e tests, i couldn't run it locally but maybe we need to deepSort the record before stringify. https://github.com/aws-powertools/powertools-lambda-typescript/blob/main/packages/idempotency/tests/e2e/makeIdempotent.test.ts#L177

@dreamorosi dreamorosi linked an issue May 23, 2024 that may be closed by this pull request
2 tasks
@dreamorosi
Copy link
Contributor

@arnabrahman that's right, I believe that if we sort these payloads the tests should pass.

Please bear with us while we agree on a way forward - I'm unsure whether this should be considered a bug fix or a new feature, in which case it might be a breaking change.

Also what do you think?

@arnabrahman
Copy link
Contributor Author

@dreamorosi , this is a peculiar situation. At a high level, this sounds like a bug fix.

However, from my experience, I don't recall ever sorting a payload of objects before sending it. I might have done so unintentionally, but never intentionally. I believe that in the majority of cases, this change will cause issues. So, from a customer's perspective, this is a Breaking Change.

If so, in my opinion, we cannot say it's a bug fix.

@dreamorosi
Copy link
Contributor

I see where you're coming from.

Imo the fact that we were not ordering the payload, especially since Powertools for AWS Lambda (Python) does it, was not intended and even though the change could cause unintended side effects, these are only the results of us not sorting the payload in the first place.

Like you said, sorting is not something customers would (nor should) think about it, but requests with the same payload content should be idempotent, so the fact that this wasn't the case maybe makes it a bug.

@dreamorosi dreamorosi added the on-hold This item is on-hold and will be revisited in the future label May 27, 2024
@dreamorosi
Copy link
Contributor

Putting this on hold until @heitorlessa is back to advise on how to move forward.

@heitorlessa
Copy link
Contributor

heitorlessa commented Jun 3, 2024

Take it as a bug fix, as this shouldn't have passed in our tests suite. There are internet proxies and the fact a client or a proxy could interfere with the guaranteed idempotency within a time window makes this a serious bug. e.g., HTTP headers ordering in API Gateway

example: If we were changing the idempotency key format, I'd consider as a breaking change.

Here, it'll cause bigger harm for new customers having mixed results without any way to debug it. In the worst case, we know > 90% of customers don't change the defaults, so idempotency records will expire within an hour.. reducing the impact window for the bugfix.

Copy link

sonarcloud bot commented Jun 3, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@dreamorosi dreamorosi changed the title feat(idempotency): deep sort payload during hashing fix(idempotency): deep sort payload during hashing Jun 3, 2024
Copy link
Contributor

@dreamorosi dreamorosi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this important PR and for the patience during the review!

@dreamorosi dreamorosi merged commit 6765f35 into aws-powertools:main Jun 3, 2024
12 checks passed
@github-actions github-actions bot added the bug Something isn't working label Jun 3, 2024
@arnabrahman arnabrahman deleted the 2120-sort-payload-during-hashing branch June 3, 2024 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working feature PRs that introduce new features or minor changes idempotency This item relates to the Idempotency Utility on-hold This item is on-hold and will be revisited in the future size/L PRs between 100-499 LOC tests PRs that add or change tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: deep sort payload hashing
4 participants