Replies: 3 comments 1 reply
-
Suggestion 3: change the normalize mean/std to 0.5 (my guess is that a mean/std of 0.5 was used during training).

Verification method:

Install:

```shell
pip install rapidocr_onnxruntime
pip install datasets
pip install text_det_metric
```

Test (compare the hmean before and after changing the normalize values):

```python
import cv2
import numpy as np
from datasets import load_dataset
from rapidocr_onnxruntime import RapidOCR
from text_det_metric import TextDetMetric
from tqdm import tqdm

engine = RapidOCR()

# Load the test dataset and run detection only (no classification or recognition)
dataset = load_dataset("SWHL/text_det_test_dataset")
test_data = dataset["test"]

content = []
for i, one_data in enumerate(tqdm(test_data)):
    img = np.array(one_data.get("image"))
    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    dt_boxes, elapse = engine(img, use_det=True, use_cls=False, use_rec=False)
    dt_boxes = [] if dt_boxes is None else dt_boxes
    elapse = 0 if elapse is None else elapse[0]
    gt_boxes = [v["points"] for v in one_data["shapes"]]
    content.append(f"{dt_boxes}\t{gt_boxes}\t{elapse}")

# Write detections, ground truth and elapsed time as tab-separated lines
with open("pred.txt", "w", encoding="utf-8") as f:
    for v in content:
        f.write(f"{v}\n")

# Compute the detection metric (precision / recall / hmean)
metric = TextDetMetric()
print(metric("pred.txt"))
```

Summary of results:
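The effect of the proposed normalization change can be illustrated with a minimal numpy sketch. The 0.5 mean/std is the comment's own guess about the training setup, not a confirmed value:

```python
import numpy as np

# ImageNet per-channel statistics, a common default for pretrained backbones
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(img, mean, std):
    """Scale a uint8 HWC image to [0, 1], then apply per-channel mean/std."""
    img = img.astype(np.float32) / 255.0
    return (img - mean) / std

img = np.full((224, 224, 3), 128, dtype=np.uint8)  # mid-gray test image

# Current behaviour: ImageNet statistics
a = normalize(img, IMAGENET_MEAN, IMAGENET_STD)

# Proposed behaviour: mean/std of 0.5 for every channel
b = normalize(img, 0.5, 0.5)

print(a[0, 0])  # differs per channel
print(b[0, 0])  # roughly [0.0039, 0.0039, 0.0039], i.e. centered near zero
```

With mean/std of 0.5 the pixel range maps linearly onto [-1, 1], which is why a mismatch with the training-time statistics can shift the model's inputs enough to hurt hmean.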
-
This suggestion concerns document image orientation classification and has been moved to RapidOrientation issue #29 for handling.
-
The newly added layout-analysis model doclayout has a fixed input shape of 1024x1024, which causes two problems:
-
Image orientation detection is inaccurate; I suggest some improvements.
The current method first scales the long side of the original image to 256 and then center-crops a 224x224 region. For pages with little text, the cropped region is often wrong. In addition, normalize uses the ImageNet mean/std, and these values do not work well here.
My improved method:
1. Scale the long side of the original image to 256, then pad the borders so the image becomes 256x256.
2. Resize the image to 224x224. (Scaling the long side to 224 in step 1 and skipping step 2 should give the same result.)
3. Change the normalize mean/std to 0.5 (my guess is that a mean/std of 0.5 was used during training).
With these changes the error rate is now very low.
Across the models used in the rapid ai series, some images work better in BGR and others in RGB; for normalization, some models need the ImageNet mean/std, others need a mean/std of 0.5, and I believe a few need no normalization at all. It is very confusing.
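One way to keep these per-model differences manageable is a small preprocessing table keyed by model. The entries below are purely illustrative assumptions, not the verified settings of any rapidai model:

```python
import numpy as np

# Hypothetical per-model settings; values here are illustrative only
PREPROCESS_CFG = {
    "det":    {"color": "BGR", "mean": (0.485, 0.456, 0.406), "std": (0.229, 0.224, 0.225)},
    "cls":    {"color": "RGB", "mean": (0.5, 0.5, 0.5),       "std": (0.5, 0.5, 0.5)},
    "layout": {"color": "RGB", "mean": None,                  "std": None},  # no normalization
}

def preprocess(img_rgb: np.ndarray, model: str) -> np.ndarray:
    """Apply the color order and normalization registered for `model`."""
    cfg = PREPROCESS_CFG[model]
    img = img_rgb[..., ::-1] if cfg["color"] == "BGR" else img_rgb  # RGB -> BGR flip
    img = img.astype(np.float32) / 255.0
    if cfg["mean"] is not None:
        img = (img - np.array(cfg["mean"])) / np.array(cfg["std"])
    return img

out = preprocess(np.zeros((8, 8, 3), dtype=np.uint8), "cls")
print(out[0, 0])  # [-1. -1. -1.]
```

Centralizing the settings in one table makes each model's expectations explicit instead of scattering them through the pipeline, which directly addresses the confusion described above.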