Data availability extraction failure use cases #1187

Open
lfoppiano opened this issue Oct 23, 2024 · 2 comments
Labels: error cases (Some error/test case for future improvements), models:segmentation

lfoppiano (Collaborator) commented Oct 23, 2024

In this case, the noisy page footers are wrongly captured in the data availability statement (DAS).

```xml
<div type="availability">
    <div xmlns="http://www.tei-c.org/ns/1.0">
        <head>DATA AVAILABILITY</head>
        <p>The genome assembly was uploaded to NCBI Genbank under the project number PRJNA1075679, with the genome accession number JBAGRT000000000. Research Article Microbiology Spectrum October 2024 Volume 12 Issue 10 10.1128/spectrum.00751-24 9 Downloaded from 
            <ref type="url" target="https://journals.asm.org/journal/spectrum">https://journals.asm.org/journal/spectrum</ref> on 10 October 2024 by 2a01:e0a:d12:4600:6f65:dae8:67:55e1.
        </p>
    </div>
</div>
```

PDF (CC-BY): 3_10.1128_spectrum.00751-24.pdf
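The leaked footer above follows a recognizable shape: a running header followed by a "Downloaded from <URL> on <date> by <address>" stamp. As a post-processing workaround only (not the project's own fix, which would belong in the segmentation model), a minimal Python sketch could strip it from the extracted paragraph text. The regex, the function names `strip_footer_noise` and `clean_availability`, and the assumption that the footer starts right after the last complete sentence before the stamp are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical pattern for the per-page download stamp seen in the example:
# "Downloaded from <url> on 10 October 2024 by <address>"
DOWNLOAD_STAMP = re.compile(
    r"Downloaded from\s+\S+\s+on\s+\d{1,2}\s+[A-Z][a-z]+\s+\d{4}\s+by\s+\S+"
)
# A sentence end is ., ! or ? followed by whitespace, so the dots inside
# "10.1128/spectrum.00751-24" are not treated as sentence ends.
SENTENCE_END = re.compile(r"[.!?](?=\s)")


def strip_footer_noise(text: str) -> str:
    """Heuristically remove a leaked page footer from a paragraph."""
    text = " ".join(text.split())  # collapse XML indentation whitespace
    stamp = DOWNLOAD_STAMP.search(text)
    if stamp is None:
        return text
    # Assumption: the footer begins right after the last complete sentence
    # that precedes the download stamp.
    ends = list(SENTENCE_END.finditer(text, 0, stamp.start()))
    cut = ends[-1].end() if ends else 0
    # Keep any text that follows the stamp (footer in mid-paragraph).
    return " ".join((text[:cut] + " " + text[stamp.end():]).split())


def clean_availability(tei_xml: str) -> list[str]:
    """Return the cleaned text of every TEI <p> in the given fragment."""
    root = ET.fromstring(tei_xml)
    return [
        strip_footer_noise("".join(p.itertext()))
        for p in root.iter(f"{{{TEI_NS}}}p")
    ]
```

Applied to the paragraph above, this keeps only the first sentence (the Genbank accession numbers). The pattern is tied to this publisher's stamp, so other journals' footers would need their own rules.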

lfoppiano added the error cases label on Oct 23, 2024.
lfoppiano (Collaborator, Author) commented:

Another example: the DAS is truncated at the page break.

```xml
<div type="availability">
    <div xmlns="http://www.tei-c.org/ns/1.0">
        <head>DATA AVAILABILITY</head>
        <p>Data was provided and stored by MOH and CIHI as above. The dataset from this study is held securely in coded form at ICES. While legal data sharing agreements between ICES and data providers (e.g., healthcare organizations and government) prohibit ICES from making the dataset publicly available, access may be granted to those who meet</p>
    </div>
</div>
```

PDF (CC-BY): 4_10.1128_spectrum.02630-23.pdf
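The truncated statement above stops mid-sentence ("those who meet"), which suggests a cheap sanity check for collecting more cases like this. A minimal Python sketch, assuming that a complete data availability paragraph ends with `.`, `!` or `?`, flags the `<p>` elements under `<div type="availability">` that do not. The function names are hypothetical, and the check only detects the problem; it does not recover the text that continues on the next page.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Assumption: a complete statement ends with one of these characters.
TERMINATORS = (".", "!", "?")


def looks_truncated(paragraph: str) -> bool:
    """True if a non-empty paragraph does not end like a finished sentence."""
    text = " ".join(paragraph.split())
    return bool(text) and not text.endswith(TERMINATORS)


def truncated_availability_paragraphs(tei_xml: str) -> list[str]:
    """Collect availability paragraphs that appear to be cut off."""
    root = ET.fromstring(tei_xml)
    flagged = []
    # root.iter() includes the root itself, so a fragment whose top element
    # is the availability <div> is handled too.
    for div in root.iter():
        if div.get("type") != "availability":
            continue
        for p in div.iter(f"{{{TEI_NS}}}p"):
            text = " ".join("".join(p.itertext()).split())
            if looks_truncated(text):
                flagged.append(text)
    return flagged
```

Statements that legitimately end with a bare URL or accession number would be false positives, so a check like this is better suited to gathering error cases than to filtering output.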

lfoppiano (Collaborator, Author) commented:

Here are two more cases from Nature:

lfoppiano self-assigned this on Nov 14, 2024.