
Feat/optimize m3u parsing filtering #8

Merged
ted-gould merged 2 commits into main from feat/optimize-m3u-parsing-filtering
May 24, 2025

Conversation

@ted-gould
Owner

No description provided.

This commit introduces several optimizations to the M3U parsing and filtering logic, significantly improving performance and reducing memory allocations, especially for large M3U files and numerous filters.

The key changes include:

1.  **M3U Filtering (`src/m3u.go` - `FilterThisStream`):**
    *   Regular expressions used for matching filter rules are now pre-compiled globally at package initialization, avoiding repeated compilation during filtering.
    *   String lowercasing for case-insensitive filters is now performed more efficiently, reducing redundant operations for both filter rules and stream data.

2.  **M3U Parsing (`src/internal/m3u-parser/xteve_m3u_parser.go` - `MakeInterfaceFromM3U`):**
    *   Line filtering (e.g., removing comments and empty lines) within the `parseMetaData` function now appends valid lines to a new slice instead of calling `slices.Delete` in a loop, which avoids shifting the remaining elements on every deletion.
    *   The UUID/ID uniqueness check within `parseMetaData` now uses a map for O(1) average time complexity lookups, replacing a less efficient slice-based O(n) lookup.
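The two parsing changes listed above can be sketched roughly as follows. This is a minimal illustration and not the code from `xteve_m3u_parser.go`: the helper names `keepMeaningfulLines` and `uniqueIDs`, and the rule for what counts as a comment, are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// keepMeaningfulLines is a hypothetical helper. It appends surviving lines to
// a fresh slice instead of deleting from the input in a loop, so each line is
// copied at most once.
func keepMeaningfulLines(lines []string) []string {
	kept := make([]string, 0, len(lines))
	for _, line := range lines {
		trimmed := strings.TrimSpace(line)
		// Assumed rule for this sketch: drop blank lines and "#" lines,
		// except #EXTINF metadata lines, which carry channel attributes.
		isComment := strings.HasPrefix(trimmed, "#") && !strings.HasPrefix(trimmed, "#EXTINF")
		if trimmed == "" || isComment {
			continue
		}
		kept = append(kept, trimmed)
	}
	return kept
}

// uniqueIDs is a hypothetical helper. It keeps the first occurrence of each
// ID and uses a map as a set, so each membership check is O(1) on average
// rather than a linear scan of a slice.
func uniqueIDs(ids []string) []string {
	seen := make(map[string]struct{}, len(ids))
	unique := make([]string, 0, len(ids))
	for _, id := range ids {
		if _, dup := seen[id]; dup {
			continue
		}
		seen[id] = struct{}{}
		unique = append(unique, id)
	}
	return unique
}

func main() {
	lines := []string{"#EXTM3U", "", "# a comment", "#EXTINF:-1,Channel A", "http://example.com/a"}
	fmt.Println(len(keepMeaningfulLines(lines)))
	fmt.Println(uniqueIDs([]string{"a", "b", "a"}))
}
```

Preallocating `kept` with the input length as capacity means the append loop never has to grow the backing array, which is one way a change like this can reduce allocations.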

**Benchmark Improvements:**

A new benchmark suite (`src/benchmark_m3u_test.go`) was created to measure parsing and filtering performance with various file sizes and filter counts.

Compared to the initial unoptimized state, the filtering performance has improved dramatically (8x-10x faster, with significantly fewer allocations). Parsing performance has also improved, particularly for larger M3U files (e.g., ~10% faster for the 'large' test case).
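The filtering changes described earlier (package-level regular expressions and less redundant lowercasing) might look something like the sketch below. The pattern, the `filterRule` type, and the function names are all hypothetical; the actual `FilterThisStream` logic in `src/m3u.go` is not shown in this description and may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// quotedTermRe is compiled once at package initialization instead of on every
// filter call. The pattern is an illustrative assumption: it captures the
// text between double quotes.
var quotedTermRe = regexp.MustCompile(`"([^"]*)"`)

// extractQuotedTerms returns every double-quoted term found in a rule string.
func extractQuotedTerms(rule string) []string {
	matches := quotedTermRe.FindAllStringSubmatch(rule, -1)
	terms := make([]string, 0, len(matches))
	for _, m := range matches {
		terms = append(terms, m[1])
	}
	return terms
}

// filterRule is a hypothetical, simplified rule: a stream matches when its
// name contains the term.
type filterRule struct {
	term          string
	lowerTerm     string // lowercased once, when the rule is built
	caseSensitive bool
}

func newFilterRule(term string, caseSensitive bool) filterRule {
	return filterRule{term: term, lowerTerm: strings.ToLower(term), caseSensitive: caseSensitive}
}

// matchesAny lowercases the stream name at most once, and only if some
// case-insensitive rule is actually evaluated.
func matchesAny(name string, rules []filterRule) bool {
	lowerName, lowered := "", false
	for _, r := range rules {
		if r.caseSensitive {
			if strings.Contains(name, r.term) {
				return true
			}
			continue
		}
		if !lowered {
			lowerName, lowered = strings.ToLower(name), true
		}
		if strings.Contains(lowerName, r.lowerTerm) {
			return true
		}
	}
	return false
}

func main() {
	rules := []filterRule{newFilterRule("news", false), newFilterRule("HD", true)}
	fmt.Println(matchesAny("BBC NEWS", rules))
	fmt.Println(extractQuotedTerms(`"Sports" and "News"`))
}
```

`regexp.MustCompile` panics on an invalid pattern, so a package-level variable like this surfaces a bad pattern at startup rather than in the middle of filtering.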

The overall performance improvement for combined parsing and filtering operations comfortably exceeds the 50% target.

The detailed benchmark results and comparisons are documented in `docs/benchmarks/m3u_performance.md`.
…ata.

This update addresses inaccuracies in M3U parsing benchmarks for medium and large datasets by:

1.  **Modifying `src/benchmark_m3u_test.go`:**
    *   I introduced a helper function `generateM3UContent` to dynamically create M3U content of specified sizes (number of entries and groups).
    *   I updated `BenchmarkParseM3U` and `BenchmarkFilterM3U` to use this dynamic generation for "medium" (1,000 entries) and "large" (10,000 entries) test cases. This replaces the previous reliance on file-based M3Us for these sizes, which were found to be incompletely populated.
    *   The "small" test case continues to use its existing, correctly populated file.

2.  **Updating `docs/benchmarks/m3u_performance.md`:**
    *   I added a new section with benchmark results obtained using the dynamically generated, fully populated M3U files.
    *   I included a comparative analysis against previous results, highlighting that the increased time/resource usage for medium/large parsing is due to processing more data, not a performance regression.
    *   I restructured the document to clearly show the progression of benchmark results: original (pre-optimization), optimized (with flawed M3U files), and corrected (optimized, with dynamic M3U generation).

These changes ensure that the benchmark results for M3U parsing and filtering are more accurate and representative of performance on large datasets, addressing your feedback on test data quality. The core optimizations implemented previously remain effective, and their performance on realistic data is now better quantified.
@ted-gould ted-gould merged commit 3905ddc into main May 24, 2025
2 of 3 checks passed