Subset of correctly detected "match_text" returns "unknown-license-reference" #4388

Open
@chinyeungli

    - score: '100.0'
      matcher: 2-aho
      end_line: 258
      rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/unknown-license-reference_318.RULE
      from_file:
      start_line: 258
      matched_text: This is a dual license, in which the user may choose between either the
        GPL
      match_coverage: '100.0'
      matched_length: 2
      rule_relevance: 100
      rule_identifier: unknown-license-reference_318.RULE
      license_expression: unknown-license-reference
      license_expression_spdx: LicenseRef-scancode-unknown-license-reference
    - score: '28.0'
      matcher: 3-seq
      end_line: 259
      rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/artistic-2.0_25.RULE
      from_file:
      start_line: 258
      matched_text: |
        This is a dual license, in which the user may choose between either the GPL
        version 1 or the Artistic version 1 license.
      match_coverage: '31.11'
      matched_length: 14
      rule_relevance: 90
      rule_identifier: artistic-2.0_25.RULE
      license_expression: artistic-2.0
      license_expression_spdx: Artistic-2.0

The complete two-line text (lines 258-259) has a license detected correctly, but a subset of those same lines (line 258) is also reported as unknown-license-reference.
The tool should be able to tell which lines are already covered by another license match and avoid returning unknown-license-reference for them.
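
To make the requested behavior concrete, here is a minimal sketch of a post-processing filter over match records shaped like the YAML above. It is not ScanCode's internal logic: the function name `drop_contained_unknown_references` is hypothetical, and the sketch assumes that comparing `start_line` and `end_line` is enough to decide that an unknown-license-reference match is covered by another match.

```python
# Hypothetical post-processing sketch; not part of ScanCode's API.
UNKNOWN = "unknown-license-reference"


def drop_contained_unknown_references(matches):
    """Drop unknown-license-reference matches whose line span lies entirely
    inside the line span of another match with a known license expression."""

    def is_covered(candidate):
        # A candidate is covered when some other, non-unknown match starts
        # at or before it and ends at or after it.
        return any(
            other is not candidate
            and other["license_expression"] != UNKNOWN
            and other["start_line"] <= candidate["start_line"]
            and candidate["end_line"] <= other["end_line"]
            for other in matches
        )

    return [
        m for m in matches
        if not (m["license_expression"] == UNKNOWN and is_covered(m))
    ]


# The two matches from the report, reduced to the fields the filter needs.
matches = [
    {
        "rule_identifier": "unknown-license-reference_318.RULE",
        "license_expression": "unknown-license-reference",
        "start_line": 258,
        "end_line": 258,
    },
    {
        "rule_identifier": "artistic-2.0_25.RULE",
        "license_expression": "artistic-2.0",
        "start_line": 258,
        "end_line": 259,
    },
]

filtered = drop_contained_unknown_references(matches)
print([m["rule_identifier"] for m in filtered])
# prints ['artistic-2.0_25.RULE']
```

Line-level containment is coarser than comparing matched token positions, so a real fix inside the detection engine would likely need a finer check. The sketch only shows the expected outcome for the two matches reported here.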
