
Timestamp compatibility issue between pyarrow, pandas, and parquet #294

Open
vivek-biradar opened this issue Mar 13, 2022 · 10 comments

@vivek-biradar

Apache Arrow processing error: Casting from timestamp[ns] to timestamp[us] would lose data: -6795537378871345152

@matteofigus
Member

matteofigus commented Mar 14, 2022

Hi, thanks for submitting an issue. Which version of the solution are you using? If in doubt, you should be able to find that in the CloudFormation stack output.

@vivek-biradar
Author

@matteofigus : Thanks for the quick reply. It's v0.44

[screenshot of the CloudFormation stack output]

@vivek-biradar
Author

vivek-biradar commented Mar 14, 2022

@matteofigus : Apologies for the follow-up. Can this be fixed with priority, or will it take some time? This bug is blocking us from using the solution in our production environments, so even a rough timeline for a fix would be great.

@vivek-biradar
Author

@matteofigus : Update: we tried to fix the code and deploy it manually in our environments, and encountered the error below during DeployStack:

[screenshot of the deployment error]

CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [d027789d-1529-48d8-82a9-4d865bbe17f0]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.

@matteofigus
Member

Hi @vivek-biradar,

If I understand correctly, you made a change and then tried to redeploy the stack using a new bucket to host the BuildArtefact. The Lambda is trying to copy that artefact from your bucket to the bucket that will rebuild the backend.
For that to succeed, this policy needs to reference the correct bucket, i.e. the one you used to repackage the backend. If you look at the CloudWatch logs for the custom resource Lambda, my guess is that you may see it failed because of a permission issue.

If that's the case, make sure you deploy with the CloudFormation template's PreBuiltArtefactsBucketOverride parameter set to your bucket, as explained here: https://github.com/awslabs/amazon-s3-find-and-forget/blob/master/docs/USER_GUIDE.md#deploying-the-solution
This should allow the custom resource to copy successfully from that bucket.
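For reference, deploying with the override might look like the following. This is only a sketch: the stack name, template path, capabilities, and bucket name are placeholders, and the exact parameter list depends on your setup.

```shell
# All names below are placeholders; substitute your own stack, template,
# and the S3 bucket where you uploaded the repackaged artefacts.
aws cloudformation deploy \
  --template-file packaged-template.yaml \
  --stack-name S3F2 \
  --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND \
  --parameter-overrides PreBuiltArtefactsBucketOverride=my-artefacts-bucket
```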

In any case, can I ask how you implemented the fix? I started an investigation and am currently looking into the coerce_timestamps and allow_truncated_timestamps options here: https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html

@vivek-biradar
Author

@matteofigus : We did override PreBuiltArtefactsBucketOverride with our own S3 bucket and tried to deploy. Will have a look at the policy.

As for the fix, it's a crude one and has only been tested locally.

[screenshots of the local code change]

@matteofigus
Member

matteofigus commented Mar 15, 2022

Cool. If you can look at the CloudWatch logs, that would be best, as we can see the exact error there.

If you get stuck and you are only trying to redeploy the backend for testing, you can also look at this script in the Makefile, which you can run via make redeploy-containers (but double-check the stack name): https://github.com/awslabs/amazon-s3-find-and-forget/blob/master/Makefile#L107-L114. It basically rebuilds the container and uploads it to your Docker registry for a very quick test (but I probably wouldn't recommend it for production).

@maxcikoski
Contributor

Hello @matteofigus
The error we are getting is inside the Lambda execution, but it is not related to permissions. It is failing to import a Python library (which is present in the layer definition):

2022-03-14T18:23:04.221-05:00 START RequestId: 42b4da30-6b02-4f32-8952-a9f6d2f2b122 Version: $LATEST

2022-03-14T18:23:04.222-05:00 [ERROR] Runtime.ImportModuleError: Unable to import module 'copy_build_artefact': No module named 'crhelper' Traceback (most recent call last):

2022-03-14T18:23:04.224-05:00 END RequestId: 42b4da30-6b02-4f32-8952-a9f6d2f2b122

2022-03-14T18:23:04.225-05:00 REPORT RequestId: 42b4da30-6b02-4f32-8952-a9f6d2f2b122 Duration: 1.79 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 55 MB Init Duration: 305.63 ms

FYI @vivek-biradar

@matteofigus
Member

matteofigus commented Mar 15, 2022

I see, this may be a packaging issue then. Did you run make setup in the project's root, as recommended here? https://github.com/awslabs/amazon-s3-find-and-forget/blob/master/docs/LOCAL_DEVELOPMENT.md

@maxcikoski
Contributor

make setup failed at first, so we tried make deploy directly.
We have now ensured that make setup completes successfully and are retrying make deploy.
Thanks for the heads-up, @matteofigus.

FYI @vivek-biradar
