Support SPSS and STATA files #652

raprasad · 2022-06-30T12:55:39Z

No description provided.

ecowan · 2022-07-29T02:38:11Z

@raprasad Starting on this tonight: prototyping a class that handles SPSS (.sav) and Stata (.dta) files through a single interface and returns a pandas DataFrame (based on https://github.com/Roche/pyreadstat).

ecowan · 2022-08-02T05:55:14Z

@raprasad After researching the differences between using pandas and pyreadstat, I'm finding that pandas supports a much smaller subset of file version numbers. The two files I am using for testing (one .dta and one .sav) both pass tests using pyreadstat, but pandas gives the following error:

Version of given Stata file is 36. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables)

Given this restriction, it seems preferable to keep pyreadstat as the SPSS / Stata reading library.
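
A minimal sketch of what such a single-interface reader could look like, assuming pyreadstat is installed. `read_stat_file` and its `readers` parameter are hypothetical names used for illustration and are not taken from the linked PR; `pyreadstat.read_sav` and `pyreadstat.read_dta` are the library's reader functions, each returning a `(DataFrame, metadata)` pair.

```python
from pathlib import Path


def _default_readers():
    # Imported lazily so this module still loads where pyreadstat is missing.
    import pyreadstat

    return {".sav": pyreadstat.read_sav, ".dta": pyreadstat.read_dta}


def read_stat_file(path, readers=None, **kwargs):
    """Return the data in an SPSS (.sav) or Stata (.dta) file as a DataFrame.

    `readers` maps a lowercase file suffix to a callable that returns a
    (dataframe, metadata) pair. It is a hypothetical hook for this sketch;
    by default the pyreadstat readers are used.
    """
    suffix = Path(path).suffix.lower()
    if readers is None:
        readers = _default_readers()
    if suffix not in readers:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    # The metadata (variable labels, value labels, etc.) is discarded here.
    df, _meta = readers[suffix](str(path), **kwargs)
    return df
```

Dispatching on the file suffix keeps callers unaware of which format they are loading, and extra keyword arguments are passed through to the underlying reader unchanged.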

ecowan · 2022-08-04T16:09:04Z

@raprasad Currently getting the following error as the variable info page loads:

Failed to open file due to UnicodeDecodeError. ('utf-8' codec can't decode byte 0x96 in position 6: invalid start byte)

From the logs:

celery-queue_1 | [tasks.py:62 - profile_dataset_info() ] profile_dataset_info: Failed to open file due to UnicodeDecodeError. ('utf-8' codec can't decode byte 0x96 in position 6: invalid start byte)

celery-queue_1 | [2022-08-04 16:08:08,215: ERROR/ForkPoolWorker-6] profile_dataset_info: Failed to open file due to UnicodeDecodeError. ('utf-8' codec can't decode byte 0x96 in position 6: invalid start byte)
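
For context, byte 0x96 is not a valid UTF-8 start byte, but in Windows-1252 (cp1252) it is an en dash, so one plausible cause is text in the file that was written in that encoding. The thread does not say which call raised the error, so the following is only a sketch of an encoding fallback under that assumption. `decode_with_fallback` is a hypothetical helper, not part of the project or of pyreadstat.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Decode bytes with the first encoding that succeeds.

    Returns a (text, encoding_used) pair. The candidate list is an
    assumption for this sketch; adjust it to the data actually expected.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next one
    raise last_error


# Byte 0x96 at position 6, as in the error message above.
text, used = decode_with_fallback(b"age 18\x9624")
print(used, text)
```

If the failure does come from the pyreadstat call, its readers also accept an `encoding` argument that could be set instead of decoding by hand.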

raprasad · 2023-01-05T15:45:07Z

This needs a redo, per the PR notes.

raprasad added Priority 2 ⛅ Priority (1 is highest) DP Creator Stats-enhancement labels Jun 30, 2022

raprasad added this to the Stat Enhancements milestone Jun 30, 2022

ecowan self-assigned this Jul 29, 2022

ecowan linked a pull request Jul 29, 2022 that will close this issue

Working on data reader for SPSS and Stata files #675

Open

raprasad removed this from the Stat Enhancements milestone Aug 3, 2022

raprasad added a commit that referenced this issue Aug 10, 2022

move libmagic1 above more frequent operations #652

6a7e10c

raprasad added a commit that referenced this issue Aug 10, 2022

minor #652

25a6868

raprasad added dev: server side and removed Priority 2 ⛅ Priority (1 is highest) labels Nov 30, 2022

raprasad unassigned ecowan Jan 5, 2023

raprasad moved this to Ready for Development in OpenDP Library Development Mar 21, 2023

raprasad added this to OpenDP Library Development Mar 21, 2023

raprasad added this to DP Creator Development Apr 3, 2023

raprasad removed this from OpenDP Library Development Apr 4, 2023

raprasad removed the DP Creator label May 10, 2023

raprasad modified the milestones: 2023-Q3: UI improvements, UI Improvements (undated) May 10, 2023
