Hello everyone,
first of all, thank you for your time and for the valuable work you're doing with GROBID.
I was wondering whether a benchmark comparison exists between multimodal LLMs and GROBID, in both its Conditional Random Fields (CRF) and Deep Neural Network (DNN) variants.
LLMs are advancing and becoming more capable on multimodal inputs (GPT-4 with vision capabilities, or GPT-4o), so it is now possible to parse PDFs and obtain structured output using LLMs.
However, before considering if it is worth going in that direction, I was wondering if there's any comparison between any multimodal LLM and GROBID. If such a benchmark doesn't exist, do you have any insights or opinions on how these approaches might compare?
Thanks in advance! Any information would be greatly appreciated!