Optimize the 'unclip' logic in DBPostProcess #181

HiDolen · 2024-05-19T13:03:27Z

Current problem

In the DBPostProcess class in detect_process.py, the self.boxes_from_bitmap() method calls self.unclip() to expand the boundary of each detected box.

···
box = self.unclip(points).reshape(-1, 1, 2)
box, sside = self.get_mini_boxes(box)
if sside < self.min_size + 2:
    continue
box = np.array(box)
···

The problem is that self.unclip() does a lot of unnecessary work.

def unclip(self, box):
    unclip_ratio = self.unclip_ratio
    poly = Polygon(box)
    distance = poly.area * unclip_ratio / poly.length
    offset = pyclipper.PyclipperOffset()
    offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
    expanded = np.array(offset.Execute(distance))
    return expanded

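All that is needed is to move the four vertices outward. Instead, the current code expands the box as a polygon and then calls self.get_mini_boxes() again to turn the polygon back into four vertices. That is a roundabout way to do it.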
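For reference, the expansion distance used by unclip is area * unclip_ratio / perimeter. The following is a minimal sketch of that formula in plain Python, with no shapely or pyclipper. The helper names and the 100 x 20 sample rectangle are made up for illustration; the ratio 1.6 is the value used in the visualization script later in this thread.

```python
import math


def polygon_area(points):
    """Absolute area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def unclip_distance(points, unclip_ratio=1.6):
    """Offset distance: area * ratio / perimeter."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


# Hypothetical 100 x 20 text box: area 2000, perimeter 240
rect = [(0, 0), (100, 0), (100, 20), (0, 20)]
d = unclip_distance(rect)
```

For a long, thin box the distance is dominated by the short side, so wide text lines grow by roughly 0.8 times their height on every side at ratio 1.6.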

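Solution

self.get_mini_boxes() already sorts the box vertices so that the first one is the top-left vertex and the rest follow clockwise, so the vertex order of the box passed to self.unclip() is known. Given that, self.unclip() can be rewritten as follows: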

def unclip(self, box):
    area = cv2.contourArea(box)
    perimeter = cv2.arcLength(box, True)
    distance = area * self.unclip_ratio / perimeter
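    # Outward direction per vertex, in the order top-left, top-right,
    # bottom-right, bottom-left (the image y axis points down)
    signs = np.array([[-1, -1], [1, -1], [1, 1], [-1, 1]])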
    expanded = box + distance * signs
    return expanded
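To make the vertex convention concrete, here is a small plain-Python sketch of the sign-based expansion. It assumes an axis-aligned box ordered top-left, top-right, bottom-right, bottom-left in image coordinates, where y grows downward. The function name and the sample box are hypothetical, not part of the PR.

```python
def expand_axis_aligned(box, distance):
    """Move each corner of an axis-aligned box outward by `distance` on both axes."""
    # Outward direction for TL, TR, BR, BL when y grows downward
    signs = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
    return [
        (x + sx * distance, y + sy * distance)
        for (x, y), (sx, sy) in zip(box, signs)
    ]


# Hypothetical 100 x 20 box with its top-left corner at (10, 10)
box = [(10.0, 10.0), (110.0, 10.0), (110.0, 30.0), (10.0, 30.0)]
expanded = expand_axis_aligned(box, 5.0)
```

Each side moves outward by exactly the offset distance, but only because the sides are parallel to the axes. The signs carry no information about the box's rotation.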

The call site no longer needs self.get_mini_boxes():

···
box = self.unclip(points)
-box, sside = self.get_mini_boxes(box)
-if sside < self.min_size + 2:
-    continue
box = np.array(box)
···

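I tested this on a few images and the change caused no problems. The two versions produce essentially equivalent results.

The PR updates all four places where unclip is used.

Testing

Before the change:

After the change: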

SWHL · 2024-05-20T02:00:01Z

If the box passed to unclip is always a rectangle, your change should be fine.
If the input is a polygon, though, I think this would break.

HiDolen · 2024-05-20T02:06:05Z

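The current processing flow of self.boxes_from_bitmap is:

1. Use cv2.findContours() to get contours from the binary map.
2. In self.get_mini_boxes(), use cv2.minAreaRect() to get the box center, width, height, and angle from each contour, pass the result to cv2.boxPoints() to convert it into 4 point coordinates, then sort the points so that the first is the top-left corner and the rest follow clockwise.
3. Use self.box_score_fast() to compute how well the box covers pred, and discard boxes scoring below self.box_thresh.
4. Use self.unclip() to expand the box boundary.
5. Call self.get_mini_boxes() again to get the expanded box.
6. Map the box coordinates to the original image size, clipping the values to the image bounds along the way.

So the box passed to unclip is guaranteed to be a rectangle with exactly 4 coordinates. It cannot be a general polygon.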
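The ordering rule in the second step can be sketched in plain Python as below. This is an illustration of the "top-left first, then clockwise" convention under the assumption that y grows downward. It is not the project's get_mini_boxes() code, and the function name is made up.

```python
def order_clockwise_from_top_left(points):
    """Order 4 corner points as top-left, top-right, bottom-right, bottom-left."""
    by_x = sorted(points, key=lambda p: p[0])  # two leftmost points come first
    left, right = by_x[:2], by_x[2:]
    top_left, bottom_left = sorted(left, key=lambda p: p[1])
    top_right, bottom_right = sorted(right, key=lambda p: p[1])
    return [top_left, top_right, bottom_right, bottom_left]


# Corners given in arbitrary order
shuffled = [(110, 30), (10, 10), (10, 30), (110, 10)]
ordered = order_clockwise_from_top_left(shuffled)
```

Because the order is fixed before unclip runs, code inside unclip can rely on index 0 being the top-left corner.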

SWHL · 2024-05-20T03:57:29Z

I'll take a closer look at this after work! You're really thorough.


HiDolen · 2024-05-20T04:36:20Z

I just found a fatal flaw in the modified code: I had not tested tilted text.

I did modify the code to handle this case:

        def unclip(box):
            area = cv2.contourArea(box)
            perimeter = cv2.arcLength(box, True)
            distance = area * self.unclip_ratio / perimeter
            # signs = np.array([[-1, -1], [1, -1], [1, 1], [-1, 1]])
            # expanded = box + distance * signs

            unit_vectors = []
            for i in range(4):
                vector = box[(i + 1) % 4] - box[i]
                unit_vector = vector / np.linalg.norm(vector)
                unit_vectors.append(unit_vector)
            new_box = np.zeros_like(box)
            for i in range(4):
                new_box[i] = box[i] + unit_vectors[i - 1] * distance
                new_box[i] = new_box[i] - unit_vectors[i] * distance

            expanded = new_box
            return expanded.astype(np.float32)

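This version correctly recognizes tilted text, but its accuracy is lower than that of the version before the change.

Before the change:

After the second revision:

The two characters "番剧" were not recognized.

I'll close this PR for now.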
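The edge-vector idea can be checked without OpenCV. The sketch below re-implements the same vertex update in plain Python and applies it to a rectangle rotated by 30 degrees. The function names, the sample rectangle, and the angle are assumptions chosen for illustration. The "each side grows by twice the distance" property holds only when adjacent edges are perpendicular, that is, for true rectangles.

```python
import math


def unclip_by_edges(box, distance):
    """Shift each vertex back along its outgoing edge and forward along its incoming edge."""
    n = len(box)
    units = []
    for i in range(n):
        (x0, y0), (x1, y1) = box[i], box[(i + 1) % n]
        length = math.hypot(x1 - x0, y1 - y0)
        units.append(((x1 - x0) / length, (y1 - y0) / length))
    expanded = []
    for i in range(n):
        (px, py), (ux, uy) = units[i - 1], units[i]  # incoming and outgoing edge directions
        x, y = box[i]
        expanded.append((x + (px - ux) * distance, y + (py - uy) * distance))
    return expanded


def rotate(points, degrees):
    """Rotate points about the origin."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [(x * c - y * s, x * s + y * c) for x, y in points]


# Hypothetical 100 x 20 rectangle, tilted by 30 degrees
tilted = rotate([(0, 0), (100, 0), (100, 20), (0, 20)], 30)
grown = unclip_by_edges(tilted, 5.0)
sides = [math.dist(grown[i], grown[(i + 1) % 4]) for i in range(4)]
```

Unlike fixed per-axis signs, the offsets here follow the box's own edges, so the expanded box keeps the same tilt as the input.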

SWHL · 2024-05-20T04:48:33Z

As rigorous as ever.

HiDolen · 2024-05-20T07:13:54Z

I wrote a visualization that compares the two approaches.

import cv2
import numpy as np
import pyclipper
from shapely.geometry import Polygon
import plotly.graph_objects as go

unclip_ratio = 1.6


def unclip_origin(box):
    poly = Polygon(box)
    distance = poly.area * unclip_ratio / poly.length
    offset = pyclipper.PyclipperOffset()
    offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
    expanded = np.array(offset.Execute(distance))
    ##########################
    bounding_box = cv2.minAreaRect(expanded)
    points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])

    index_1, index_2, index_3, index_4 = 0, 1, 2, 3
    if points[1][1] > points[0][1]:
        index_1 = 0
        index_4 = 1
    else:
        index_1 = 1
        index_4 = 0
    if points[3][1] > points[2][1]:
        index_2 = 2
        index_3 = 3
    else:
        index_2 = 3
        index_3 = 2

    box = [points[index_1], points[index_2], points[index_3], points[index_4]]
    expanded = np.array(box)
    ##########################
    return expanded


def unclip_2(box):
    area = cv2.contourArea(box)
    perimeter = cv2.arcLength(box, True)
    distance = area * unclip_ratio / perimeter

    unit_vectors = []
    for i in range(4):
        vector = box[(i + 1) % 4] - box[i]
        unit_vector = vector / np.linalg.norm(vector)
        unit_vectors.append(unit_vector)
    new_box = np.zeros_like(box)
    for i in range(4):
        new_box[i] = box[i] + unit_vectors[i - 1] * distance
        new_box[i] = new_box[i] - unit_vectors[i] * distance

    expanded = new_box
    return expanded.astype(np.float32)


def create_2d_trace(box, name, color):
    box_closed = np.concatenate([box, box[0:1]], axis=0)
    trace = go.Scatter(x=box_closed[:, 0], y=box_closed[:, 1], mode='lines', name=name, line=dict(color=color))
    return trace


def test_unclip_functions(box):
    # Compute the unclipped boxes with both methods
    unclipped_box_1 = unclip_origin(box)
    unclipped_box_2 = unclip_2(box)

    # Create a new figure
    fig = go.Figure()

    # Draw the original box
    fig.add_trace(create_2d_trace(box, 'Original', 'blue'))

    fig.add_trace(create_2d_trace(unclipped_box_1, 'before', 'red'))
    fig.add_trace(create_2d_trace(unclipped_box_2, 'after', 'green'))

    fig.update_layout(
        autosize=False,
        xaxis=dict(
            scaleanchor='y',
            scaleratio=1,
        )
    )

    # Show the figure
    fig.show()


box = np.array(
    [[834.6764, 613.2059], [871.58813, 646.35297], [864.7058, 646.82355], [827.79407, 624.67645]]
).astype(np.float32)
test_unclip_functions(box)

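The result is shown in the figure:

"before" is the current code and "after" is the code from this PR. I think the modified unclip result is the one that matches intuition.

That said, this change does cause recognition differences in some cases. I'd like to hear what the maintainers think.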

SWHL · 2024-05-20T10:20:36Z

Excellent. I'll study it carefully when I have time. Thanks.

SWHL · 2024-06-10T07:10:33Z

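I've gone through this PR carefully, and for now I can't find anything wrong with your approach. I cropped the two characters "番剧" using both unclip methods, and there is no difference visible to the naked eye. The recognition results still differ, though.
Original unclip vs. your unclip

My guess is that the text detection model was trained on data that was all processed with the existing unclip, so keeping inference consistent with training preserves accuracy.

Let's leave this PR open. Thank you for your contribution!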

HiDolen added 5 commits May 18, 2024 12:27

Modify the way CTCLabelDecode calculates confidence

05f7c06

Merge branch 'RapidAI:main' into main

daeca30

Optimize the 'unclip' logic in DBPostProcess

437acaf

fix signs

a73c052

remove reshape for unclip

d737167

HiDolen closed this May 20, 2024

Fix the unclip

848832c

SWHL reopened this May 20, 2024
