Format PDF to detect references #1152

Open
HuynhVuInnomize opened this issue Aug 5, 2024 · 1 comment

Comments

HuynhVuInnomize commented Aug 5, 2024

Can you explain how the PDF should be formatted so that references are detected, and what the correct format is?

In the attached image, GROBID only detects the reference in the first case (where the word "and" is on the same line as "Vlissides (1995)") and cannot detect the second and third cases.
Thank you.
[Attached image: test_format]

@lfoppiano
Collaborator

@HuynhVuInnomize it depends on the paper, its layout, and other statistical factors. The model responsible for this extraction is the fulltext model, which has around 30 training examples. Adding a few more training examples that cover the problematic cases should help quickly. If you can share the examples, we can include them when we work on the training data in the future. However, if you are in a hurry, you can create and correct them on your own.
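
To decide which documents are worth turning into training examples, it can help to list the citation callouts that were actually recognized in the TEI output for each PDF. The following is a minimal sketch, not something posted by the maintainers. It assumes the question is about in-text citation callouts, which GROBID encodes in the body as `<ref type="bibr">` elements. The function name `list_citation_callouts` and the `SAMPLE_TEI` snippet are made up for illustration; the sample is not real GROBID output.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace used by TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def list_citation_callouts(tei_xml: str) -> List[Tuple[str, Optional[str]]]:
    """Return (callout text, target id or None) for each <ref type="bibr">."""
    root = ET.fromstring(tei_xml)
    callouts = []
    for ref in root.iterfind(".//tei:ref[@type='bibr']", TEI_NS):
        # Collapse line breaks and repeated spaces inside the callout text.
        text = " ".join("".join(ref.itertext()).split())
        callouts.append((text, ref.get("target")))
    return callouts


# Hypothetical sample, hand-written for this sketch.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div>
    <p>Patterns are described by
      <ref type="bibr" target="#b0">Gamma, Helm, Johnson and
      Vlissides (1995)</ref> and later by
      <ref type="bibr">Vlissides (1995)</ref>.</p>
  </div></body></text>
</TEI>"""

if __name__ == "__main__":
    for text, target in list_citation_callouts(SAMPLE_TEI):
        print(text, "->", target)
```

A citation that is visible in the PDF but missing from this list was not recognized as a callout, which makes that document a candidate for a corrected training example.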
