New feature web parser with pattern #950

Open: alpgul wants to merge 7 commits into base branch Omega
Conversation

@alpgul alpgul commented Jan 31, 2025

Description

Implement WebStreamExtractor to support dynamic URL extraction from web sources in playlist loading. This includes:

  • New WebStreamExtractor utility class for extracting stream URLs
  • Updated PlaylistLoader to support '@' prefixed URLs with optional web pattern
  • Updated CMakeLists.txt to include the new source files
  • Added Windows build script for easier compilation

Missing Features

  • Custom headers are not supported for web requests (the CFile.OpenFile function doesn't accept request headers; what alternatives can be used?)
  • In-depth web crawling is not supported yet.

Example Usage

  • usage with default pattern:
#EXTINF:-1 tvg-name="Test" , Test
@http://127.0.0.1:3000/index.html
  • usage with custom pattern:
#EXTINF:-1 tvg-name="Test1", Test1
#WEBPROP:web-regex="(https?://[^\"]+\.m3u8)"
@http://127.0.0.1:3000/index.html

In the regex, the URL to extract must be enclosed in parentheses (a capture group).
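To make the playlist syntax above concrete, here is a minimal sketch of how a loader might recognise the '@' prefix and a `#WEBPROP:key=value` line. The function names `ParseWebUrl` and `ParseWebProp` are hypothetical and are not taken from the PR; the real PlaylistLoader code may be structured quite differently.

```cpp
#include <string>

// Hypothetical helper (not from the PR): if the line starts with '@',
// stores the page URL that follows it and returns true.
bool ParseWebUrl(const std::string& line, std::string& pageUrl)
{
  if (line.size() > 1 && line[0] == '@')
  {
    pageUrl = line.substr(1);
    return true;
  }
  return false;
}

// Hypothetical helper (not from the PR): splits "#WEBPROP:key=value" at the
// first '=' and strips one pair of surrounding double quotes from the value.
bool ParseWebProp(const std::string& line, std::string& key, std::string& value)
{
  static const std::string marker = "#WEBPROP:";
  if (line.compare(0, marker.size(), marker) != 0)
    return false;

  const std::size_t eq = line.find('=', marker.size());
  if (eq == std::string::npos)
    return false;

  key = line.substr(marker.size(), eq - marker.size());
  value = line.substr(eq + 1);
  if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
    value = value.substr(1, value.size() - 2);
  return true;
}
```

Splitting at the first '=' only is a deliberate choice in this sketch, so that a regex or header value containing '=' is passed through unchanged.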


Add support for #WEBREGEX: marker to specify custom web stream URL extraction patterns. Improve debug logging to include the web pattern used during extraction.
Modify WebStreamExtractor and PlaylistLoader to enhance URL extraction:
- Update regex matching to return the last matched group
- Add debug logging for default URL finding process
- Adjust marker parsing logic for more flexible group and web regex handling
phunkyfish (Member) commented Jan 31, 2025

Could you provide some examples? Preferably real world use cases.

The purpose of the feature is not clear to me from the PR description.

alpgul (Author) commented Jan 31, 2025

It searches HTML pages for media URLs using a regex pattern.
Example usage: suppose an HTML page contains an m3u8 link. Define a regex that matches the media URL, then add the page URL and the regex to the playlist as shown in the example usage above.
Example HTML page:

<html>
...
url:"https://localhost:3000/index.m3u8"
...
</html>

Example regex: #WEBREGEX:url:"(https?://[^"]+\.m3u8)"
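As a rough illustration of the extraction step, the sketch below applies a pattern to page content with `std::regex` and returns the last capture group that matched, which is the behaviour one of the commit messages describes. The function name `ExtractStreamUrl` and the fallback to the whole match are assumptions made for this example, not the PR's actual code.

```cpp
#include <regex>
#include <string>

// Hypothetical sketch (not the PR's implementation): searches `html` for
// `pattern` and returns the last capture group that took part in the match.
// Returns an empty string if nothing matches or the pattern is invalid.
std::string ExtractStreamUrl(const std::string& html, const std::string& pattern)
{
  try
  {
    const std::regex re(pattern);
    std::smatch match;
    if (!std::regex_search(html, match, re))
      return "";

    // Walk backwards so the last matched group wins; index 0 is the whole match.
    for (std::size_t i = match.size(); i-- > 1;)
    {
      if (match[i].matched)
        return match[i].str();
    }

    // Assumption: with no capture group, fall back to the whole match.
    return match[0].str();
  }
  catch (const std::regex_error&)
  {
    return "";
  }
}
```

With the example page and regex above, calling `ExtractStreamUrl(page, "url:\"(https?://[^\"]+\\.m3u8)\"")` would yield the URL inside the quotes, without the surrounding `url:"..."` text.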

The current issue is that I can't send requests using custom headers. Which library should I use that works on all platforms? I'm considering using CURL, but does it work on all platforms?

Add support for #WEBHEADERS: marker to specify custom HTTP headers during web stream URL extraction. Modify WebStreamExtractor and PlaylistLoader to handle and apply custom headers when fetching stream URLs.
alpgul (Author) commented Feb 1, 2025

  • Added support for headers in web URL requests.

Example Usage

#EXTM3U
#EXTINF:-1 tvg-name="Test1", Test1
#WEBPROP:web-regex="([^"]+\.mp4)"
#WEBPROP:web-headers=user-agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36&referer:https://google.com/
@https://test-videos.co.uk/bigbuckbunny/mp4-h264
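The `web-headers` value in the example above appears to pack several headers into one string, separated by '&', with ':' between name and value. A minimal sketch of parsing that format follows; `ParseWebHeaders` is a hypothetical name, and splitting each pair at the first ':' is an assumption chosen so that values such as `https://google.com/` keep their own colons.

```cpp
#include <map>
#include <string>

// Hypothetical sketch (not the PR's implementation): parses
// "name:value&name:value" into a map. Each pair is split at its first ':'
// so that colons inside the value (for example in a URL) are preserved.
std::map<std::string, std::string> ParseWebHeaders(const std::string& spec)
{
  std::map<std::string, std::string> headers;
  std::size_t start = 0;
  while (start <= spec.size())
  {
    std::size_t end = spec.find('&', start);
    if (end == std::string::npos)
      end = spec.size();

    const std::string pair = spec.substr(start, end - start);
    const std::size_t colon = pair.find(':');
    if (colon != std::string::npos && colon > 0)
      headers[pair.substr(0, colon)] = pair.substr(colon + 1);

    start = end + 1;
  }
  return headers;
}
```

One limitation of this format is that a header value containing '&' cannot be expressed without some escaping scheme, which this sketch does not attempt.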

Enhance WebStreamExtractor to handle relative URLs by constructing full URLs when a relative path is detected. This ensures proper URL resolution for web stream sources with partial paths.
- Replace separate #WEBREGEX: and #WEBHEADERS: markers with a unified #WEBPROP: marker
- Update PlaylistLoader to store web-related properties directly on the channel
- Modify StreamUtils to extract web stream URLs using channel properties
- Change header parsing delimiter from '=' to ':' for more flexible header specification
- Add 'isWebUrl' and 'isMediaEntry' properties to improve stream URL extraction logic
- Add WebStreamExtractor overload for MediaEntry to support stream URL extraction
- Modify URL extraction to preserve original URL parameters
- Update logging to include more context during URL extraction
- Improve error handling by returning empty string for invalid URLs
- Remove redundant 'isMediaEntry' property parsing
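The relative-URL handling and the empty-string error case mentioned in the commit messages above could look roughly like the following. This is a simplified sketch under stated assumptions: `ResolveUrl` is a hypothetical name, only `http://` and `https://` are treated as absolute, and `..` segments and protocol-relative URLs are not handled. It is not a full RFC 3986 resolver.

```cpp
#include <string>

// Hypothetical sketch (not the PR's implementation): builds a full URL from
// the page URL and a URL found in the page. Returns an empty string when the
// input is unusable.
std::string ResolveUrl(const std::string& pageUrl, const std::string& found)
{
  if (found.empty())
    return "";

  // Already absolute: return unchanged.
  if (found.rfind("http://", 0) == 0 || found.rfind("https://", 0) == 0)
    return found;

  const std::size_t schemeEnd = pageUrl.find("://");
  if (schemeEnd == std::string::npos)
    return ""; // the page URL itself is not valid

  // Drop the query string and fragment of the page URL before resolving.
  const std::string base = pageUrl.substr(0, pageUrl.find_first_of("?#"));
  const std::size_t pathStart = base.find('/', schemeEnd + 3);
  const std::string origin =
      pathStart == std::string::npos ? base : base.substr(0, pathStart);

  if (found[0] == '/')
    return origin + found; // root-relative path
  if (pathStart == std::string::npos)
    return origin + "/" + found; // page URL has no path component

  // Path-relative: replace the last path segment of the page URL.
  return base.substr(0, base.rfind('/') + 1) + found;
}
```

For instance, resolving `stream.m3u8` against `http://127.0.0.1:3000/dir/index.html?x=1` in this sketch gives `http://127.0.0.1:3000/dir/stream.m3u8`.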