
Feature: improved and configurable threshold for text similarity #785

@zmbc

Currently, it appears that if the sources of two cells are not at least 70% similar, they are always treated as separate cells and represented as a delete-cell operation followed by an add-cell operation. It feels like this threshold should be much lower, at least if whitespace-only lines are ignored.

There appears to be a relevant TODO:

# TODO: Add configuration framework
# TODO: Tune threshold with realistic sources
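
To make the request concrete, here is a minimal sketch of what a configurable threshold that ignores whitespace-only lines could look like. It is not the project's actual implementation: the names `similarity_ratio` and `cells_similar`, the `threshold` and `ignore_blank_lines` parameters, and the choice of a line-based comparison are all assumptions made for illustration. Only the 0.7 default comes from the behaviour described above.

```python
import difflib


def similarity_ratio(source_a, source_b, ignore_blank_lines=True):
    """Hypothetical helper: line-based similarity of two cell sources, in [0, 1]."""
    lines_a = source_a.splitlines()
    lines_b = source_b.splitlines()
    if ignore_blank_lines:
        # Drop whitespace-only lines so that spacing changes alone
        # do not push two cells below the threshold.
        lines_a = [line for line in lines_a if line.strip()]
        lines_b = [line for line in lines_b if line.strip()]
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    return matcher.ratio()


def cells_similar(source_a, source_b, threshold=0.7, ignore_blank_lines=True):
    """Hypothetical helper: True if the two sources should be diffed as one cell.

    `threshold` is the configurable cut-off; 0.7 mirrors the current
    behaviour described in this issue, and a lower value is what is proposed.
    """
    return similarity_ratio(source_a, source_b, ignore_blank_lines) >= threshold


if __name__ == "__main__":
    old = "x = 1\n\ny = 2\n"
    new = "x = 1\ny = 2\n\n\n"
    # Only blank lines differ, so the cells are treated as the same cell.
    print(cells_similar(old, new))
    # A lower threshold keeps heavily edited cells paired as a modification.
    print(cells_similar("a\nb\nc\nd", "a\nx\ny\nz", threshold=0.2))
```

With something along these lines, the cut-off could be exposed through whatever configuration framework the first TODO ends up introducing, and the second TODO could be addressed by running it over realistic notebook sources.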
