[ERROR] No module named 'pyarrow._orc' in lambda python3.12 #3084

Open
debimishra89 opened this issue Jan 28, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@debimishra89

Describe the bug

I am running an AWS Lambda function on the Python 3.12 runtime with the AWSSDKPandas layer for Python 3.12, version 16.
I am trying to read an ORC file into a DataFrame using the s3.read_orc API of the awswrangler package, and it fails with: No module named 'pyarrow._orc'.
Please advise.

How to Reproduce

1. Create a Lambda function with the Python 3.12 runtime.
2. Add a Lambda layer by selecting AWSSDKPandas-3.12, version 16, from the AWS layers.
3. Import awswrangler.
4. Try to read an ORC file using the s3.read_orc API.
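
The steps above could be sketched as a handler like the one below. This is a minimal sketch, not the reporter's actual code: the `path` key in the event and the shape of the returned dictionary are assumptions made for illustration. It catches the import failure so that the missing module name shows up in the response instead of a stack trace.

```python
def lambda_handler(event, context):
    """Try to read an ORC object from S3 and report any import failure."""
    try:
        # awswrangler is expected to come from the AWSSDKPandas layer.
        import awswrangler as wr

        # event["path"] is a hypothetical input, e.g. "s3://my-bucket/data.orc".
        df = wr.s3.read_orc(path=event["path"])
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # 'pyarrow._orc' extension module is caught here.
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "rows": len(df)}
```

On a layer whose pyarrow build lacks ORC support, the `error` field would be expected to carry the message from the issue title.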

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

linux

Python version

3.12

AWS SDK for pandas version

AWSSDKPandas 3.12, version 16

Additional context

No response

@debimishra89 debimishra89 added the bug Something isn't working label Jan 28, 2025
@debimishra89 debimishra89 changed the title from "[ERROR] No module named 'pyarrow._orc' in lambda" to "[ERROR] No module named 'pyarrow._orc' in lambda python3.12" Jan 28, 2025
@jaidisido
Contributor

ORC is not currently supported in our managed Lambda layers:
https://github.com/aws/aws-sdk-pandas/blob/main/building/lambda/build-lambda-layer.sh#L39

Some of the options in the Arrow build are turned off to keep the size of the layer manageable and below the Lambda limit (250 MB unzipped, 50 MB zipped). ORC is one of them, as it is less used than other formats (e.g. Parquet, CSV).
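
To see which pyarrow extension modules a given layer actually ships, a small probe along these lines may help. It is a sketch using only the standard library; the helper name `module_available` and the list of modules checked are illustrative choices, not part of awswrangler.

```python
import importlib


def module_available(name: str) -> bool:
    """Return True if `name` can be imported in this runtime."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def lambda_handler(event, context):
    # Report which pyarrow modules are importable in this environment.
    return {
        "pyarrow": module_available("pyarrow"),
        "pyarrow._orc": module_available("pyarrow._orc"),
        "pyarrow._parquet": module_available("pyarrow._parquet"),
    }
```

If `pyarrow._orc` comes back `False` while `pyarrow` is `True`, that is consistent with a build where ORC support was switched off.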

@debimishra89
Author

I have now tried reading Parquet files instead. It gives me the error below:

Image
