[ie/taptap] Add new extractor for taptap.cn #9776

c-basalt · 2024-04-24T07:33:50Z

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

Add support for video extraction from taptap.cn posts and game topic pages

Fixes #9643

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

seproDev

Not a complete review, just some initial feedback.

seproDev · 2024-05-09T12:19:32Z

yt_dlp/extractor/taptap.py

+ webpage = self._download_webpage(url, list_id)
+ nuxt_data = self._deserialize_nuxt_data(self._search_json(
+ r'<script[^>]+\bid=["\']__NUXT_DATA__["\'][^>]*>', webpage,
+ 'nuxt data', list_id, contains_pattern=r'\[(?s:.+)\]'))[1]
+ x_ua = traverse_obj(nuxt_data, (
+ 'state', '$sbff', ..., {lambda x: parse_qs(x)['X-UA']}, ...), get_all=False)
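For context on the hydration data parsed above: in Nuxt 3 the `__NUXT_DATA__` script holds a flat JSON array in the devalue format, where every object or array refers to its children by index into that same array. The following is a minimal standard-library sketch of resolving that array and pulling out the `X-UA` value. `resolve_nuxt_data` and `find_x_ua` are hypothetical names, not the PR's `_deserialize_nuxt_data`. The sketch only unwraps the reactive wrapper tags listed in the code. It resolves from index 0, which in Nuxt 3 payloads is typically a `ShallowReactive` wrapper around the payload object.

```python
from urllib.parse import parse_qs, urlparse

# Nuxt wraps reactive values as ['Reactive', idx] etc.; only these tags are unwrapped here
_WRAPPER_TAGS = {'Reactive', 'ShallowReactive', 'Ref', 'ShallowRef'}


def resolve_nuxt_data(flat):
    """Rebuild nested objects from a devalue-style flat array such as __NUXT_DATA__.

    Every dict value and list element is an index into `flat`. Negative indices
    (devalue's special values such as undefined) become None. Other typed entries
    (Date, Set, Map, ...) and cyclic references are not handled in this sketch.
    """
    cache = {}

    def resolve(index):
        if index < 0:
            return None
        if index in cache:
            return cache[index]
        value = flat[index]
        if isinstance(value, list) and value and isinstance(value[0], str):
            # A list starting with a string is a typed entry, not a plain array
            if value[0] not in _WRAPPER_TAGS:
                raise ValueError(f'unsupported devalue type: {value[0]}')
            result = resolve(value[1])
        elif isinstance(value, list):
            result = [resolve(i) for i in value]
        elif isinstance(value, dict):
            result = {key: resolve(i) for key, i in value.items()}
        else:
            result = value  # plain string, number, bool or None
        cache[index] = result
        return result

    return resolve(0)


def find_x_ua(payload):
    """Return the first X-UA query parameter found among the URLs in state['$sbff']."""
    sbff = (payload.get('state') or {}).get('$sbff') or {}
    for value in (sbff.values() if isinstance(sbff, dict) else sbff):
        if isinstance(value, str):
            x_ua = parse_qs(urlparse(value).query).get('X-UA')
            if x_ua:
                return x_ua[0]
    return None
```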


I think it would be better to call the API directly instead of parsing the hydration data:
https://www.taptap.cn/webapiv2/app/v4/detail
https://www.taptap.cn/webapiv2/moment/v3/detail
I haven't looked closely into the structure of the returned data, but it could make sense to move _extract_video into a base class and have separate extractors for moment and app content.
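The split suggested above might look roughly like the following sketch. The class names are hypothetical. The query parameter names (`id` and `X-UA`) are assumptions based on the `X-UA` value the PR scrapes from the hydration data, and they have not been verified against the two endpoints.

```python
from urllib.parse import urlencode


class TapTapBaseSketch:
    """Hypothetical base class holding the logic shared by all TapTap extractors.

    The shared _extract_video() from the PR would also move here; subclasses
    only declare which endpoint they call.
    """
    _API_BASE = 'https://www.taptap.cn/webapiv2/'
    _API_PATH = None  # set by subclasses
    _ID_KEY = 'id'    # assumed name of the id query parameter

    def _api_url(self, item_id, x_ua):
        # Build the detail-endpoint URL; X-UA is passed as a query parameter (assumption)
        query = {self._ID_KEY: item_id, 'X-UA': x_ua}
        return f'{self._API_BASE}{self._API_PATH}?{urlencode(query)}'


class TapTapMomentSketch(TapTapBaseSketch):
    _API_PATH = 'moment/v3/detail'


class TapTapAppSketch(TapTapBaseSketch):
    _API_PATH = 'app/v4/detail'
```

In an actual yt-dlp extractor, the query dict would normally be passed to `self._download_json(..., query=...)` rather than encoded by hand.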

seproDev · 2024-05-09T12:25:39Z

yt_dlp/extractor/taptap.py

+ if '.m3u8' in video['url']:
+ video['formats'] = self._extract_m3u8_formats(video.pop('url'), video_id)


I would extract both the AVC and HEVC formats every time. I think the additional request is worth it. Is there any reason not to do this, such as excessive rate limiting?
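The following sketch combines the formats from both playlists. It assumes the API exposes separate AVC and HEVC playlist URLs, and that the two playlists may list some of the same streams. `merge_codec_formats` is a hypothetical helper. In yt-dlp the format dicts would come from `_extract_m3u8_formats`, and passing a distinct `m3u8_id` for each playlist is another way to keep format IDs apart. HEVC variants are recognised by their `hvc1.*` or `hev1.*` codec strings.

```python
def merge_codec_formats(avc_formats, hevc_formats):
    """Combine formats parsed from an AVC and an HEVC playlist.

    Streams present in both lists (same URL) are kept once, and HEVC
    variants get a '-hevc' suffix so their format_ids cannot clash.
    """
    merged, seen_urls = [], set()
    for fmt in (*avc_formats, *hevc_formats):
        if fmt['url'] in seen_urls:
            continue
        seen_urls.add(fmt['url'])
        fmt = dict(fmt)  # copy so the input lists are left untouched
        if (fmt.get('vcodec') or '').startswith(('hev1', 'hvc1')):
            fmt['format_id'] += '-hevc'
        merged.append(fmt)
    return merged
```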

seproDev · 2024-05-09T12:28:12Z

yt_dlp/extractor/taptap.py

+
+
+class TapTapIE(InfoExtractor):
+ _VALID_URL = r'https?://www\.taptap\.cn/(?P<section>moment|app)/(?P<id>\d+)'


Could we also support taptap.io? The site structure is very similar. I think this should be doable with only minimal adjustments.
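If taptap.io really mirrors the taptap.cn paths, a single pattern could cover both domains. This sketch assumes that `/moment/<id>` and `/app/<id>` exist unchanged on taptap.io, which has not been checked here. If the paths differ, separate extractors per site would likely be cleaner.

```python
import re

# Assumption: taptap.io uses the same /moment/<id> and /app/<id> paths as taptap.cn
_VALID_URL = r'https?://www\.taptap\.(?P<tld>cn|io)/(?P<section>moment|app)/(?P<id>\d+)'


def match_taptap_url(url):
    """Return the named groups for a supported TapTap URL, or None."""
    m = re.match(_VALID_URL, url)
    return m.groupdict() if m else None
```

The captured `tld` group could then select the matching API host, provided the international site exposes an equivalent API (also unverified).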

taptap extractor

a47fc80

seproDev added the site-request label (Request to support a new website) Apr 27, 2024

seproDev requested changes May 9, 2024

seproDev added the pending-fixes label (PR has had changes requested) May 9, 2024

c-basalt added 2 commits May 10, 2024 19:27

change to single quote

83e2e40

split extractors

5b35ca7
