yt-dlp scraping of comments and replies no longer works; what is the fixed code now? #9833

Closed
6 of 9 tasks
heyeanne34 opened this issue Apr 30, 2024 · 4 comments
Labels
duplicate, question

Comments


heyeanne34 commented Apr 30, 2024

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Please make sure the question is worded well enough to be understood

import json
import yt_dlp
import os
import re

def sanitize_filename(name, max_length=100):
    """
    Sanitize the filename by removing disallowed characters and optionally truncating if necessary.
    """
    name = re.sub(r'[\/*?:"<>|]', '', name)
    name = name.replace(' ', '_')
    if len(name) > max_length:
        return name[:max_length]
    return name

def download_comments_info_json(url: str, top_comments: str = 'all', replies: str = 'all'):
    """
    Fetches comments (and replies, if specified) for the given video URL.
    """
    max_comments = ['all'] if top_comments == 'all' else [top_comments, replies]
    ydl_opts = {
        'getcomments': True,
        'skip_download': True,
        'writesubtitles': True,
        'writeautomaticsub': True,
        'extractor_args': {
            'youtube': {
                'max_comments': max_comments,
                'comment_sort': 'top',
            }
        },
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=False)
        comments_info = {'comments': info.get('comments', [])}
        video_title = info.get('title', 'video')
        return comments_info, video_title

def main(urls_file, base_directory):
    """
    Reads video URLs from a file, downloads comments and replies for each, and saves them as JSON files named after the video titles.
    """
    # Ensure the base directory exists
    if not os.path.exists(base_directory):
        os.makedirs(base_directory)

    with open(urls_file, 'r') as file:
        urls = file.readlines()

    for url in urls:
        url = url.strip()
        # Skip blank lines so an empty string is never passed to yt-dlp
        if not url:
            continue
        comments_info, video_title = download_comments_info_json(url)

        # Sanitize and prepare the filename
        filename = sanitize_filename(video_title) + ".json"
        output_path = os.path.join(base_directory, filename)

        # Save comments information to a JSON file
        with open(output_path, 'wt') as fh:
            json.dump(comments_info, fh, indent=2)
        print(f"Comments and replies saved successfully as {output_path}")

# Paths (raw strings, so the backslashes are not treated as escape sequences)
urls_file = r'C:\Users\v1\Documents\sample.txt'
base_directory = r'C:\Users\v1\Documents\saampleout'

# Execute the main function
main(urls_file, base_directory)
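
For reference, yt-dlp's README describes the YouTube `max_comments` extractor argument as four comma-separated values, `max-comments,max-parents,max-replies,max-replies-per-thread`, and the Python API takes extractor arguments as lists of strings. A minimal sketch of building that list follows; the helper name `build_max_comments` is hypothetical and not part of yt-dlp.

```python
def build_max_comments(max_comments='all', max_parents='all',
                       max_replies='all', max_replies_per_thread='all'):
    """Return the four-item list for the YouTube 'max_comments' extractor arg.

    Hypothetical helper (not part of yt-dlp). Order follows the README:
    max-comments, max-parents, max-replies, max-replies-per-thread.
    """
    values = [max_comments, max_parents, max_replies, max_replies_per_thread]
    # Extractor arguments are passed as lists of strings in the Python API.
    return [str(v) for v in values]


# Example: at most 50 top-level comments with up to 5 replies per thread.
extractor_args = {
    'youtube': {
        'max_comments': build_max_comments(max_parents=50, max_replies_per_thread=5),
        'comment_sort': ['top'],
    }
}
print(extractor_args['youtube']['max_comments'])
```

The resulting dict would be passed as the `extractor_args` entry of the `YoutubeDL` options.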

This is the result:
C:\Users\v1\PycharmProjects\pythonProject.venv\Scripts\python.exe C:\Users\v1\PycharmProjects\pythonProject\comme.py
[youtube] Extracting URL: https://www.youtube.com/watch?v=YNechg4MqXI
[youtube] YNechg4MqXI: Downloading webpage
[youtube] YNechg4MqXI: Downloading ios player API JSON
[youtube] YNechg4MqXI: Downloading android player API JSON
WARNING: [youtube] Skipping player responses from android clients (got player responses for video "aQvGIIdgFDM" instead of "YNechg4MqXI")
[youtube] YNechg4MqXI: Downloading player 7ee5b648
[youtube] YNechg4MqXI: Downloading m3u8 information
[info] YNechg4MqXI: Downloading subtitles: en-uYU-mmqFLq8
[youtube] Downloading comment section API JSON
[youtube] Downloading ~211 comments
[youtube] Sorting comments by newest first
[youtube] Downloading comment API JSON page 1 (0/~211)
[youtube] Downloading comment API JSON page 2 (0/~211)
[youtube] Downloading comment API JSON page 3 (0/~211)
[youtube] Downloading comment API JSON page 4 (0/~211)
[youtube] Downloading comment API JSON page 5 (0/~211)
[youtube] Downloading comment API JSON page 6 (0/~211)
[youtube] Downloading comment API JSON page 7 (0/~211)
[youtube] Extracted 0 comments
Comments and replies saved successfully as C:\Users\v1\Documents\saampleout\George_Washington_statue_draped_in_Palestinian_flag_on_DC_campus.json
[youtube] Extracting URL: https://www.youtube.com/watch?v=Ip9I9tUZU1c
[youtube] Ip9I9tUZU1c: Downloading webpage
[youtube] Ip9I9tUZU1c: Downloading ios player API JSON
[youtube] Ip9I9tUZU1c: Downloading android player API JSON
WARNING: [youtube] Skipping player responses from android clients (got player responses for video "aQvGIIdgFDM" instead of "Ip9I9tUZU1c")
[youtube] Ip9I9tUZU1c: Downloading m3u8 information
[info] Ip9I9tUZU1c: Downloading subtitles: en-uYU-mmqFLq8
[youtube] Downloading comment section API JSON
[youtube] Downloading ~802 comments
[youtube] Sorting comments by newest first
[youtube] Downloading comment API JSON page 1 (0/~802)
[youtube] Downloading comment API JSON page 2 (0/~802)
[youtube] Downloading comment API JSON page 3 (0/~802)
[youtube] Downloading comment API JSON page 4 (0/~802)
[youtube] Downloading comment API JSON page 5 (0/~802)
[youtube] Downloading comment API JSON page 6 (0/~802)
[youtube] Downloading comment API JSON page 7 (0/~802)
[youtube] Downloading comment API JSON page 8 (0/~802)
[youtube] Downloading comment API JSON page 9 (0/~802)
[youtube] Downloading comment API JSON page 10 (0/~802)
[youtube] Downloading comment API JSON page 11 (0/~802)
[youtube] Downloading comment API JSON page 12 (0/~802)
[youtube] Downloading comment API JSON page 13 (0/~802)
[youtube] Downloading comment API JSON page 14 (0/~802)
[youtube] Downloading comment API JSON page 15 (0/~802)
[youtube] Downloading comment API JSON page 16 (0/~802)
[youtube] Downloading comment API JSON page 17 (0/~802)
[youtube] Downloading comment API JSON page 18 (0/~802)
[youtube] Downloading comment API JSON page 19 (0/~802)
[youtube] Downloading comment API JSON page 20 (0/~802)
[youtube] Downloading comment API JSON page 21 (0/~802)
[youtube] Downloading comment API JSON page 22 (0/~802)
[youtube] Downloading comment API JSON page 23 (0/~802)
[youtube] Extracted 0 comments
Comments and replies saved successfully as C:\Users\v1\Documents\saampleout\Columbia_protesters_smash_windows,hang'intifada'_banner.json

Process finished with exit code 0

It extracted 0 comments even though the log reports ~211 and ~802. Please help: what is the updated code?
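
One way to make this failure visible in the script is to check how many comments came back before reporting success. The sketch below assumes the comment dicts follow the shape described in yt-dlp's README, where `parent` is `"root"` for a top-level comment and the parent comment's ID for a reply; `count_comments` is a hypothetical helper name, and the sample data is made up.

```python
def count_comments(info):
    """Return (top_level, replies) counts from a yt-dlp info dict.

    Hypothetical helper. Assumes each comment dict has a 'parent' key that is
    'root' for top-level comments, as described in yt-dlp's README.
    """
    comments = info.get('comments') or []  # the key may be missing or None
    top_level = sum(1 for c in comments if c.get('parent', 'root') == 'root')
    return top_level, len(comments) - top_level


# Made-up sample data for illustration only.
sample = {'comments': [
    {'id': 'a', 'parent': 'root', 'text': 'first'},
    {'id': 'a.1', 'parent': 'a', 'text': 'reply'},
]}
top_level, replies = count_comments(sample)
print(f"{top_level} top-level, {replies} replies")

# An empty result should be reported instead of being saved silently.
if count_comments({'comments': []}) == (0, 0):
    print("WARNING: extracted 0 comments")
```

Calling this on the `info` dict before writing the JSON file would let the script warn instead of printing "saved successfully" for an empty result.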

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below
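
For API users, the second bullet amounts to adding one key to the options dict. A minimal sketch, reusing option names from the script in this report; `build_ydl_opts` and `fetch_info` are hypothetical helper names, and the `yt_dlp` import is deferred so the options can be inspected without running an extraction.

```python
def build_ydl_opts(verbose=True):
    """Return YoutubeDL params for comment extraction with debug output."""
    return {
        'verbose': verbose,      # API counterpart of the -v command-line flag
        'getcomments': True,
        'skip_download': True,
    }


def fetch_info(url):
    """Extract metadata for url; requires yt-dlp and network access."""
    import yt_dlp  # imported here so build_ydl_opts works without yt-dlp installed

    with yt_dlp.YoutubeDL(build_ydl_opts()) as ydl:
        return ydl.extract_info(url, download=False)


print(build_ydl_opts())
```

Running `fetch_info` with a video URL should then print the `[debug]` lines that the template asks for.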

Complete Verbose Output

No response

heyeanne34 added the question label Apr 30, 2024

heyeanne34 commented Apr 30, 2024

Here is the whole code as a .txt file. You need to set the path of the file that contains the YouTube video links and the path where the JSON files will be saved.
code comments replies.txt

@pukkandan
Member

#9775

pukkandan closed this as not planned Apr 30, 2024
pukkandan added the duplicate label Apr 30, 2024

heyeanne34 commented Apr 30, 2024

@pukkandan Is there new code? I didn't quite understand the last reply in #9775.

@pukkandan
Member

The issue is being worked on. Subscribe to the linked PR to be notified when it's merged.
