Recognize tables in PDF files and convert them to readable Markdown #490

Open
1 task done
hexixiang opened this issue Apr 24, 2024 · 3 comments

Comments

@hexixiang

Issues

  • I have browsed through the Issues and confirmed that there are no duplicate suggestions.

Expected behavior

Enhance the parsing so that the system can recognize tables in PDF files and convert them into readable Markdown, making the files easier to read and edit.
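To make the requested output concrete, here is a minimal sketch of only the final rendering step. It assumes a table has already been detected and its cells extracted into rows of strings (the detection and cell extraction are the hard part and are not shown). `rows_to_markdown` and `escape_cell` are hypothetical helpers for illustration, not part of this project.

```python
def escape_cell(text: str) -> str:
    # Hypothetical helper: a raw "|" would split the cell into two columns,
    # and a newline would end the table row, so neutralize both.
    return text.replace("|", "\\|").replace("\n", " ").strip()


def rows_to_markdown(rows: list[list[str]]) -> str:
    """Render rows of cell text as a Markdown pipe table (first row = header).

    Assumes the cells were already extracted by some upstream recognition step.
    """
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    # Pad short rows with empty cells so every line has the same column count.
    padded = [[escape_cell(c) for c in r] + [""] * (width - len(r)) for r in rows]
    header, body = padded[0], padded[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(["---"] * width) + "|",  # header separator row
    ]
    for row in body:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    table = [["Name", "Qty"], ["Apple", "3"], ["Pear"]]
    print(rows_to_markdown(table))
```

Padding short rows matters because OCR output often misses empty cells, and a pipe table with uneven column counts may render incorrectly.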

Approximate reference (optional)

No response

@hiroi-sora
Owner

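  • Mid-term plan: we are considering adding an AI model for layout analysis, to handle complex documents with mixed layouts and extract table regions more accurately.
  • Long-term plan: we are considering adding end-to-end large models (such as [1] [2]) that can convert an entire document or image into a stream of Markdown text.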

@lison666

Could you also add PDF-to-HTML conversion while you're at it?

@hiroi-sora
Owner

> Could you also add PDF-to-HTML conversion while you're at it?

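That is a harder and more distant goal. We'll take it one step at a time: once we have the underlying recognition module, we'll consider the higher-level output modules.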
