Support SPSS and STATA files #652

Open
raprasad opened this issue Jun 30, 2022 · 4 comments · May be fixed by #675
Comments

@raprasad
Member

No description provided.

@raprasad raprasad added this to the Stat Enhancements milestone Jun 30, 2022
@ecowan ecowan self-assigned this Jul 29, 2022
@ecowan
Contributor

ecowan commented Jul 29, 2022

@raprasad Starting on this tonight: prototyping a class that handles SPSS (.sav) and Stata (.dta) files through a single interface and returns a pandas DataFrame (based on https://github.com/Roche/pyreadstat).
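
For readers following along, a minimal sketch of what such a single-interface reader could look like. This is not the code from the linked pull request: the class name `StatFileReader` and its method names are hypothetical. The only library calls assumed are `pyreadstat.read_sav` and `pyreadstat.read_dta`, each of which returns a `(DataFrame, metadata)` pair.

```python
from pathlib import Path


class StatFileReader:
    """Hypothetical wrapper: choose a pyreadstat reader from the file extension."""

    # Extension -> name of the pyreadstat function that reads that format.
    READERS = {".sav": "read_sav", ".dta": "read_dta"}

    def __init__(self, path):
        self.path = Path(path)
        suffix = self.path.suffix.lower()
        if suffix not in self.READERS:
            raise ValueError(f"Unsupported file type: {suffix!r}")
        self.reader_name = self.READERS[suffix]

    def to_dataframe(self, **kwargs):
        """Read the file and return only the pandas DataFrame."""
        # Imported lazily so the extension dispatch works without pyreadstat installed.
        import pyreadstat

        reader = getattr(pyreadstat, self.reader_name)
        df, _meta = reader(str(self.path), **kwargs)
        return df
```

Usage would be along the lines of `StatFileReader("survey.sav").to_dataframe()`. Dispatching on the extension keeps the calling code identical for both formats, at the cost of trusting the file name.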

@ecowan ecowan linked a pull request Jul 29, 2022 that will close this issue
@ecowan
Contributor

ecowan commented Aug 2, 2022

@raprasad After comparing pandas and pyreadstat, I'm finding that pandas supports a much narrower set of file format versions. Both of my test files (one .dta and one .sav) load correctly with pyreadstat, but pandas raises the following error:

Version of given Stata file is 36. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables)

Given this restriction, it seems preferable to keep pyreadstat as the SPSS / Stata reading library.
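
A quick way to check what a file actually is before choosing a reader is to look at its first bytes. The sketch below is a heuristic under stated assumptions, not something taken from this issue or from either library: it assumes SPSS system files begin with `$FL2` (or `$FL3` for the compressed variant), that newer Stata files begin with a `<stata_dta>` tag, and that older Stata files store the release number in the first byte. The function name `sniff_format` is made up for illustration. Note that 36 is the ASCII code for `$`, so a reported "version 36" is what a first-byte check would produce for a file starting with `$`, which may be worth ruling out.

```python
# Pre-117 releases from the pandas error message above; these are assumed
# to store the release number in the first byte of the file.
OLD_STATA_RELEASES = {105, 108, 111, 113, 114, 115}


def sniff_format(header: bytes) -> str:
    """Guess the file family from its leading bytes (heuristic, not a validator)."""
    if header.startswith((b"$FL2", b"$FL3")):
        return "spss"
    if header.startswith(b"<stata_dta>"):
        return "stata"  # tag-style header assumed for release 117 and later
    if header[:1] and header[0] in OLD_STATA_RELEASES:
        return "stata"  # older layout: release number in byte 0
    return "unknown"
```

A caller could pass the first few bytes of a file, for example `sniff_format(open(path, "rb").read(16))`, and route the file to the matching reader.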

@raprasad raprasad removed this from the Stat Enhancements milestone Aug 3, 2022
@ecowan
Contributor

ecowan commented Aug 4, 2022

@raprasad Currently getting the following error as the variable info page loads:

Failed to open file due to UnicodeDecodeError. ('utf-8' codec can't decode byte 0x96 in position 6: invalid start byte)

From the logs:

celery-queue_1 | [tasks.py:62 - profile_dataset_info() ] profile_dataset_info: Failed to open file due to UnicodeDecodeError. ('utf-8' codec can't decode byte 0x96 in position 6: invalid start byte)
celery-queue_1 | [2022-08-04 16:08:08,215: ERROR/ForkPoolWorker-6] profile_dataset_info: Failed to open file due to UnicodeDecodeError. ('utf-8' codec can't decode byte 0x96 in position 6: invalid start byte)
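
For context on the error itself: byte 0x96 is not valid as the start of a UTF-8 sequence, but in Windows-1252 it is an en dash, which suggests (without confirming) that some text in the file is stored in a legacy Windows encoding. The sketch below reproduces the failure on a made-up byte string and shows a simple fallback. The helper name `decode_with_fallback` and the sample bytes are illustrative only, and the issue does not say which step of `profile_dataset_info` performs the decoding. If the failure comes from the reading step, pyreadstat's `read_sav` and `read_dta` accept an `encoding` argument that could be tried.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Try each encoding in order and return (text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


# Made-up sample with 0x96 at byte offset 6, mirroring the logged error.
sample = b"Income\x96USD"
text, used = decode_with_fallback(sample)
print(used, repr(text))
```

Falling back silently can hide a wrong guess, so logging which encoding was used is a reasonable precaution.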

raprasad added a commit that referenced this issue Aug 10, 2022
@raprasad raprasad added dev: server side and removed Priority 2 ⛅ Priority (1 is highest) labels Nov 30, 2022
@raprasad
Member Author

raprasad commented Jan 5, 2023

This needs a redo as per PR notes
