match-title/reject-title inconsistent with match-filter/documentation #9766

Open
9 of 10 tasks
maweki opened this issue Apr 23, 2024 · 1 comment
Labels: docs/meta/cleanup (related to docs, code cleanup, templates, devscripts etc), wontfix (This will not be worked on)


maweki commented Apr 23, 2024

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Provide a description that is worded well enough to be understood

I am not sure whether this is a documentation issue or a bug, but it doesn't fit together this way:

  • Documentation states that --match-title REGEX is equivalent to --match-filter "title ~= (?i)REGEX"
  • Documentation states that --reject-title REGEX is equivalent to --match-filter "title !~= (?i)REGEX"
  • Documentation for --match-filters FILTER (why the trailing "s"? It does not appear in the description. Typo?) states that, "If used multiple times", the filters are combined as a disjunction.

From the description I would expect to be able to use --match-title multiple times, but only the last instance is actually applied (see

yt-dlp/yt_dlp/options.py, lines 596 to 599 in 89f535e:

    selection.add_option(
        '--match-title',
        dest='matchtitle', metavar='REGEX',
        help=optparse.SUPPRESS_HELP)

and

    matchtitle = self.params.get('matchtitle', False)

):

$ yt-dlp --match-title "test" --match-title "foo" https://www.youtube.com/watch?v=BaW_jenozKc
[youtube] Extracting URL: https://www.youtube.com/watch?v=BaW_jenozKc
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading ios player API JSON
[youtube] BaW_jenozKc: Downloading android player API JSON
[youtube] BaW_jenozKc: Downloading m3u8 information
[download] "youtube-dl test video "'/\ä↭𝕐" title did not match pattern "foo"
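
The "last instance wins" behaviour can be reproduced with the standard library alone. This is a minimal sketch, not yt-dlp's code: `build_parser` is a hypothetical helper that declares a single option shaped like the one quoted above, once with optparse's default `store` action and once with `append` for contrast.

```python
import optparse


def build_parser(action):
    # Hypothetical helper: a one-option parser that mimics the quoted declaration.
    parser = optparse.OptionParser()
    parser.add_option(
        '--match-title',
        dest='matchtitle', metavar='REGEX', action=action)
    return parser


argv = ['--match-title', 'test', '--match-title', 'foo']

# Default "store" action: every occurrence overwrites the previous value.
store_opts, _ = build_parser('store').parse_args(argv)
print(store_opts.matchtitle)   # foo

# "append" action (NOT what the quoted code uses): values accumulate in a list.
append_opts, _ = build_parser('append').parse_args(argv)
print(append_opts.matchtitle)  # ['test', 'foo']
```

With `store`, only "foo" survives, which is consistent with the log line above reporting that the title did not match pattern "foo".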

But even if multiple --match-title options did translate to multiple --match-filter options, the result would be unintuitive for --reject-title: the negated filters would be combined as a disjunction, so the video would be downloaded if any one of the patterns does not match the title (in other words, it would only be rejected if all reject patterns match).
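
To make that concern concrete, here is a small sketch that models an OR-combination of negated title filters with plain `re`. The function name is hypothetical, and the sketch assumes that `!~=` means "a case-insensitive regex search finds nothing"; it does not call into yt-dlp.

```python
import re


def passes_or_of_negations(title, reject_patterns):
    # Hypothetical model of several "title !~= (?i)PATTERN" filters joined by OR:
    # the title passes as soon as ANY single pattern fails to match.
    return any(
        re.search(pattern, title, re.IGNORECASE) is None
        for pattern in reject_patterns
    )


title = "youtube-dl test video"

# One reject pattern that matches: the title is rejected.
print(passes_or_of_negations(title, ["test"]))         # False

# Adding a second pattern that does not match lets the title through again.
print(passes_or_of_negations(title, ["test", "foo"]))  # True
```

Under this model, adding more reject patterns makes the filter more permissive, which is the opposite of what one would expect from a reject option.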

I propose the following courses of action:

  1. As it is, the documentation should state that --reject-title and --match-title can only be used once each and are and-combined instead of or-combined (and that their behaviour is therefore, at least in combination, not equivalent to the stated --match-filter code).
  2. In an ideal world, all the --reject-title and --match-title arguments should be combined and, in combination, be converted into a single --match-filter statement with one of the following semantics:
    • (DISJUNCTION --match-title) AND (CONJUNCTION --reject-titles), or
    • (CONJUNCTION --match-title) AND (CONJUNCTION --reject-titles).
    • I am not sure which one would be best. I think the first one is intuitive, but it can be simulated by using multiple download calls, each with a single --match-title. For multiple --reject-title occurrences, however, the conjunction is the intuitive semantics.
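
The two candidate semantics from the second proposal can be sketched as follows. This is only an illustration of the proposal, not existing yt-dlp behaviour; `title_passes` and its `match_mode` parameter are hypothetical names, and matching is assumed to be a case-insensitive regex search.

```python
import re


def _matches(pattern, title):
    # Case-insensitive search, mirroring the "(?i)REGEX" form from the docs.
    return re.search(pattern, title, re.IGNORECASE) is not None


def title_passes(title, match_patterns, reject_patterns, match_mode="any"):
    # match_mode="any": (DISJUNCTION --match-title) AND (CONJUNCTION --reject-title)
    # match_mode="all": (CONJUNCTION --match-title) AND (CONJUNCTION --reject-title)
    combine = any if match_mode == "any" else all
    # No match patterns at all means nothing is required of the title.
    matched = (
        combine(_matches(p, title) for p in match_patterns)
        if match_patterns else True
    )
    # Every reject condition must hold: no reject pattern may match.
    not_rejected = all(not _matches(p, title) for p in reject_patterns)
    return matched and not_rejected


title = "youtube-dl test video"
print(title_passes(title, ["test", "foo"], []))                    # True
print(title_passes(title, ["test", "foo"], [], match_mode="all"))  # False
print(title_passes(title, ["test"], ["video"]))                    # False
```

In both modes a single matching reject pattern is enough to skip the video; the modes differ only in whether one or all of the match patterns must hit.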

The first proposal has the virtue of solidifying what the code actually does. The second proposal provides an intuitive interface for title matching and would correspond more closely to what is currently stated in the documentation.

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['-vU', '--match-title', 'test', '--match-title', 'foo', 'https://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp [ff0779267] (pip)
[debug] Python 3.9.2 (CPython x86_64 64bit) - Linux-5.10.0-28-amd64-x86_64-with-glibc2.31 (OpenSSL 1.1.1w  11 Sep 2023, glibc 2.31)
[debug] exe versions: ffmpeg 4.3.6-0, ffprobe 4.3.6-0
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.12.07, mutagen-1.45.1, requests-2.31.0, secretstorage-3.3.3, sqlite3-3.34.1, urllib3-2.0.7, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1810 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: [email protected] from yt-dlp/yt-dlp
yt-dlp is up to date ([email protected] from yt-dlp/yt-dlp)
[youtube] Extracting URL: https://www.youtube.com/watch?v=BaW_jenozKc
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading ios player API JSON
[youtube] BaW_jenozKc: Downloading android player API JSON
[youtube] BaW_jenozKc: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
[download] "youtube-dl test video "'/\ä↭𝕐" title did not match pattern "foo"
maweki added the bug (Bug that is not site-specific) and triage (Untriaged issue) labels Apr 23, 2024
pukkandan (Member) commented Apr 23, 2024

The documentation states that --match-title is deprecated and is made redundant by the given --match-filter. It does not say that match-title can be used as an equivalent to match-filter. Any suggestions for how to clarify this in docs?

  1. As it is, the documentation should state that --reject-title and --match-title can only be used once each

It is a deprecated option - hence why it is not fully documented anymore. The docs instead give you the equivalent modern option which should preferably be used.

  • In an ideal world, all the --reject-title and --match-title arguments should be combined and in combination be converted into single --match-filter statement with one of the following semantics:

    • (DISJUNCTION --match-title) AND (CONJUNCTION --reject-titles), or
    • (CONJUNCTION --match-title) AND (CONJUNCTION --reject-titles).

I don't understand what you are saying here. --match-filter already supports both "and" and "or" operations. E.g. --match-filter A & B --match-filter C => (A&B)|C
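
The (A&B)|C combination described here can be modelled in a few lines. This is a sketch of the boolean structure only: the predicates `a`, `b`, `c`, the `info` dictionary, and the `passes` function are hypothetical stand-ins, not yt-dlp's filter parser.

```python
def passes(info, filter_groups):
    # Each inner list models one --match-filter argument: its conditions are AND-ed.
    # The outer list models repeated --match-filter arguments: groups are OR-ed.
    return any(
        all(condition(info) for condition in group)
        for group in filter_groups
    )


# Hypothetical conditions A, B and C on a hypothetical info dictionary.
def a(info):
    return "test" in info["title"]


def b(info):
    return info["duration"] < 60


def c(info):
    return "foo" in info["title"]


filters = [[a, b], [c]]  # models: --match-filter "A & B" --match-filter "C"

print(passes({"title": "test video", "duration": 10}, filters))   # True  (A and B hold)
print(passes({"title": "test video", "duration": 600}, filters))  # False (B fails, C fails)
print(passes({"title": "foo video", "duration": 600}, filters))   # True  (C holds)
```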

pukkandan added the docs/meta/cleanup (related to docs, code cleanup, templates, devscripts etc) and wontfix (This will not be worked on) labels and removed the bug (Bug that is not site-specific) and triage (Untriaged issue) labels Apr 23, 2024