Data availability tokens misclassified #1197

Open
lfoppiano opened this issue Nov 7, 2024 · 0 comments
Labels: error cases (some error/test case for future improvements), models:header

Comments

@lfoppiano
Collaborator

This issue occurs with the DeLFT models, where the final part of the availability statement is misclassified as <abstract>. With the CRF model, the availability statement is instead truncated at the end of the page.
So, in principle, adding this document to the training data should benefit both architectures.


PDF (CC-BY): 11_10.1371_journal.pone.0215651.pdf

sufficient	sufficient	s	su	suf	suff	t	nt	ent	ient	BLOCKIN	LINEIN	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	NOCAPS	NODIGIT	0	0	1	0	0	0	0	0	NOPUNCT	0	0	1	0	<availability>
for	for	f	fo	for	for	r	or	for	for	BLOCKIN	LINEEND	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	NOCAPS	NODIGIT	0	0	1	0	0	0	0	0	NOPUNCT	0	0	1	0	<availability>
calibration	calibration	c	ca	cal	cali	n	on	ion	tion	BLOCKIN	LINESTART	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	NOCAPS	NODIGIT	0	0	1	0	0	0	0	0	NOPUNCT	0	0	1	0	<availability>
and	and	a	an	and	and	d	nd	and	and	BLOCKIN	LINEIN	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	NOCAPS	NODIGIT	0	0	1	0	0	0	0	0	NOPUNCT	0	0	1	0	<availability>
validation	validation	v	va	val	vali	n	on	ion	tion	BLOCKIN	LINEIN	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	NOCAPS	NODIGIT	0	0	1	0	0	0	0	0	NOPUNCT	0	0	1	0	<availability>
of	of	o	of	of	of	f	of	of	of	BLOCKIN	LINEIN	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	NOCAPS	NODIGIT	0	0	1	0	0	1	0	0	NOPUNCT	0	0	1	0	<abstract>
the	the	t	th	the	the	e	he	the	the	BLOCKIN	LINEIN	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	NOCAPS	NODIGIT	0	0	1	0	0	0	0	0	NOPUNCT	0	0	1	0	<abstract>
model	model	m	mo	mod	mode	l	el	del	odel	BLOCKIN	LINEIN	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	NOCAPS	NODIGIT	0	0	1	0	0	1	0	0	NOPUNCT	0	0	1	0	<abstract>
.	.	.	.	.	.	.	.	.	.	BLOCKEND	LINEEND	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	ALLCAP	NODIGIT	1	0	0	0	0	0	0	0	DOT	0	0	1	0	<abstract>
Funding	funding	F	Fu	Fun	Fund	g	ng	ing	ding	BLOCKSTART	LINESTART	ALIGNEDLEFT	NEWFONT	SAMEFONTSIZE	1	0	INITCAP	NODIGIT	0	0	1	0	0	0	0	0	NOPUNCT	0	0	1	0	<other>
:	:	:	:	:	:	:	:	:	:	BLOCKIN	LINEIN	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	1	0	ALLCAP	NODIGIT	1	0	0	0	0	0	0	0	PUNCT	0	0	1	0	<other>
The	the	T	Th	The	The	e	he	The	The	BLOCKIN	LINEIN	ALIGNEDLEFT	NEWFONT	SAMEFONTSIZE	0	0	INITCAP	NODIGIT	0	0	1	0	0	0	0	0	NOPUNCT	0	0	1	0	I-<funding>
authors	authors	a	au	aut	auth	s	rs	ors	hors	BLOCKIN	LINEIN	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	NOCAPS	NODIGIT	0	0	1	0	0	0	0	0	NOPUNCT	0	0	1	0	<funding>
received	received	r	re	rec	rece	d	ed	ved	ived	BLOCKIN	LINEIN	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	NOCAPS	NODIGIT	0	0	1	0	0	0	0	0	NOPUNCT	0	0	1	0	<funding>
no	no	n	no	no	no	o	no	no	no	BLOCKIN	LINEIN	ALIGNEDLEFT	SAMEFONT	SAMEFONTSIZE	0	0	NOCAPS	NODIGIT	0	0	1	0	0	0	0	0	NOPUNCT	0	0	1	0	<funding>
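
To spot this kind of error in other documents, the label column of the output above can be scanned for label changes that happen in the middle of a layout block. The following is only a rough sketch, not part of GROBID: the function names are hypothetical, and it assumes the layout seen in this excerpt (tab-separated rows, token in the first column, label in the last column, and one column holding BLOCKSTART, BLOCKIN or BLOCKEND).

```python
from typing import List, Tuple

# Block-position values as they appear in the excerpt above.
BLOCK_STATES = {"BLOCKSTART", "BLOCKIN", "BLOCKEND"}


def parse_rows(text: str) -> List[Tuple[str, str, str]]:
    """Return (token, block_state, label) for each tab-separated row.

    Assumption: the token is the first column and the label is the last one.
    The "I-" prefix that marks the start of a field is stripped from the label.
    """
    rows = []
    for line in text.splitlines():
        cols = line.rstrip().split("\t")
        if len(cols) < 3:
            continue  # skip empty or malformed lines
        token, label = cols[0], cols[-1]
        if label.startswith("I-"):
            label = label[2:]
        # Look for the block-position feature among the feature columns.
        block = next((c for c in cols[2:-1] if c in BLOCK_STATES), "")
        rows.append((token, block, label))
    return rows


def find_mid_block_switches(text: str) -> List[Tuple[int, str, str, str]]:
    """Flag rows whose label differs from the previous row's label
    although the token does not start a new layout block.

    Returns (row_index, token, previous_label, new_label) tuples.
    """
    rows = parse_rows(text)
    switches = []
    for i in range(1, len(rows)):
        token, block, label = rows[i]
        prev_label = rows[i - 1][2]
        if label != prev_label and block != "BLOCKSTART":
            switches.append((i, token, prev_label, label))
    return switches
```

On the rows above, this heuristic flags the switch from <availability> to <abstract> at the token "of", but it also flags the legitimate switch from <other> to <funding> at "The", so the result is a list of candidates to review rather than a list of errors.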
lfoppiano added the error cases and models:header labels on Nov 7, 2024