Consolidate reference corpora data and generate combined poem metadata file (redux) #228

Open
wants to merge 29 commits into
base: develop
rlskoeser
Collaborator

ref: #204

Consolidates scattered logic for working with and compiling reference corpus metadata and text, using the existing utility methods from the build_text_corpus script and adapting (but not yet replacing) logic from refmatcha.

Also includes a preliminary compile_dataset script; in this branch it only handles the poem metadata.
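
For reviewers unfamiliar with the goal, here is a minimal sketch of what "combined poem metadata" could look like. It is not the code in this branch: the function names, the column names (`poem_id`, `ref_corpus`, `author`, `title`), and the input shape are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical output columns; the real combined file may use different fields.
FIELDNAMES = ["poem_id", "ref_corpus", "author", "title"]


def combine_poem_metadata(corpora):
    """Merge per-corpus poem metadata rows into one list of uniform records.

    `corpora` maps a corpus identifier to an iterable of dict rows
    (an assumed input shape, not the project's actual interface).
    """
    combined = []
    for corpus_id, rows in corpora.items():
        for row in rows:
            # Missing fields become empty strings so every record has the same columns.
            record = {field: row.get(field, "") for field in FIELDNAMES}
            # Tag every record with the corpus it came from.
            record["ref_corpus"] = corpus_id
            combined.append(record)
    return combined


def write_metadata_csv(records, outfile):
    """Write the combined records to an open text file object as CSV."""
    writer = csv.DictWriter(outfile, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(records)
```

The point of the sketch is only that each source corpus contributes rows to a single file with a shared schema and a column recording the source corpus.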

rlskoeser and others added 29 commits April 16, 2025 23:01
Co-authored-by: Laure Thompson <[email protected]>
* Add PPA work-level methods & unit tests

* Added method to extract page meta from page_id
2 participants