[Deleted] Explore formatting data with SQLite rather than Python directly #47

Closed
2 tasks done
andrewtavis opened this issue Sep 18, 2023 · 2 comments
Labels
feature (New feature or request), help wanted (Extra attention is needed), refactor (Refactor code to improve quality), wontfix (This will not be worked on)

Comments

@andrewtavis
Member

andrewtavis commented Sep 18, 2023

Terms

Description

One of the major issues with Scribe-Data at the time of writing is that the formatting for all the language data lives in relatively large and complex format_WORD_TYPE.py scripts. A general thought within the team is that this could be simplified by moving these processes over to SQLite via Python's sqlite3 module. Rather than loading JSON files and formatting them with conditionals inside a dictionary structure, the raw JSON files could be loaded as tables, with the final output being a conditional selection from these tables.
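As a rough illustration of the idea, here is a minimal sketch of loading raw JSON into an in-memory SQLite table and producing the formatted output with a conditional `SELECT`. Everything specific in it is an assumption made for the example: the `nouns` table, the `singular`/`plural`/`gender` fields, the sample records, the output labels and the `format_nouns` helper are hypothetical and do not reflect Scribe-Data's actual schema or formatting rules.

```python
import json
import sqlite3

# Hypothetical raw records standing in for a downloaded JSON file.
# The field names are illustrative only, not Scribe-Data's real schema.
RAW_JSON = """
[
  {"singular": "Hund", "plural": "Hunde", "gender": "masculine"},
  {"singular": "Katze", "plural": "Katzen", "gender": "feminine"},
  {"singular": "Haus", "plural": "Häuser", "gender": "neuter"},
  {"singular": "Leute", "plural": null, "gender": null}
]
"""


def format_nouns(raw_json: str) -> dict[str, dict[str, str]]:
    """Load raw JSON into a table and format it with a conditional SELECT."""
    records = json.loads(raw_json)
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE nouns (singular TEXT, plural TEXT, gender TEXT)")
        # Named placeholders map each JSON object's keys onto the columns;
        # JSON null becomes Python None, which sqlite3 stores as NULL.
        conn.executemany(
            "INSERT INTO nouns (singular, plural, gender) "
            "VALUES (:singular, :plural, :gender)",
            records,
        )
        # The conditionals that would otherwise be Python if/else branches
        # are expressed as COALESCE and CASE (output labels are made up).
        rows = conn.execute(
            """
            SELECT
                singular,
                COALESCE(plural, 'isPlural') AS plural,
                CASE gender
                    WHEN 'masculine' THEN 'M'
                    WHEN 'feminine' THEN 'F'
                    WHEN 'neuter' THEN 'N'
                    ELSE 'PL'
                END AS form
            FROM nouns
            ORDER BY singular
            """
        ).fetchall()
    finally:
        conn.close()
    return {
        singular: {"plural": plural, "form": form}
        for singular, plural, form in rows
    }
```

Calling `format_nouns(RAW_JSON)` returns a dictionary keyed by the singular form, so the output could still be dumped back to JSON in the same way the current scripts do. Whether this ends up simpler than the existing Python conditionals is exactly what a proof of concept would need to show.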

This issue could just be the creation of a proof of concept that this can work, and from there we can expand to converting the formatting processes over 🚀

There's also the potential to do this with SPARQL on the Wikidata end, but we already need to break up the files because the rate limits are being hit, which would only get worse with more complex selections. I'd say that this would be the ideal way of doing this :)

Contribution

Happy to work on this myself or support someone who'd like to contribute! 😊

@andrewtavis andrewtavis added feature New feature or request help wanted Extra attention is needed labels Sep 18, 2023
@andrewtavis andrewtavis moved this from Todo to In Progress in Scribe Board Sep 18, 2023
@andrewtavis
Member Author

CC @lillian-mo and @wkyoshida for the discussion here :)

@andrewtavis andrewtavis moved this from In Progress to Todo in Scribe Board Sep 18, 2023
@andrewtavis andrewtavis added the refactor Refactor code to improve quality label Sep 18, 2023
@andrewtavis
Member Author

Closing this issue as the goal is now for #59 to cover this, along with the decision that our data exports should directly match Wikidata data structures. Rather than Scribe creating combined data based on strings, we'll instead stick to the given lexeme-based entries and change how the iOS app and others reference the provided data packs. The work for this can thus be handled in #59 😊

@github-project-automation github-project-automation bot moved this from Todo to Done in Scribe Board Feb 27, 2024
@andrewtavis andrewtavis changed the title Explore formatting data with SQLite rather than Python directly [Deleted] Explore formatting data with SQLite rather than Python directly Feb 27, 2024
@andrewtavis andrewtavis added the wontfix This will not be worked on label Feb 27, 2024
Projects
Archived in project
Development

No branches or pull requests

1 participant