Better parsing of srt subtitles to remove double newlines/breaks #31

Open
shubhank008 opened this issue Jun 2, 2020 · 4 comments

@shubhank008

Some of my srt files raise a Malformed exception, I think because they contain stray double line breaks that break your parser.
I tried working around it by replacing runs of 2 or 3 line breaks with a single line break, but that is not as accurate as a regex or a proper parsing approach would be. I would appreciate it if you could add better handling.

Example subtitle (part of it)

00:01:10.733 --> 00:01:12.272
Aren't you excited?

00:01:14.143 --> 00:01:17.942
Let's find another place 

to hide out this year,

and play video 
games until it blows over.

00:01:17.943 --> 00:01:19.942

That'll get us through half a day, no problem.
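
A minimal sketch of why a blanket replacement falls short, assuming the attempt resembled collapsing every run of line breaks (the exact replacement used is not shown in this thread). The `sample` string is a shortened copy of the excerpt above.

```python
import re

# Shortened copy of the excerpt: two cues, the second one containing
# a stray blank line in the middle of its text.
sample = (
    "00:01:10.733 --> 00:01:12.272\n"
    "Aren't you excited?\n"
    "\n"
    "00:01:14.143 --> 00:01:17.942\n"
    "Let's find another place\n"
    "\n"
    "to hide out this year,\n"
)

# Naive fix: collapse any run of two or more newlines into one.
collapsed = re.sub(r"\n{2,}", "\n", sample)

# The stray blank line inside the second cue is gone, but so is the
# blank line that separated the two cues from each other.
print(collapsed)
```

After this replacement nothing separates one cue from the next, so a parser that relies on blank lines to delimit cues is likely to still reject the file. Any fix has to tell a blank line between cues apart from a blank line inside cue text.
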
@shubhank008

Another example

10
00:02:05,988 --> 00:02:10,987
CHAPITRE 12

BAPTÊME ET PARADIS DES DIEUX

11
00:02:13,278 --> 00:02:14,367
Je vois…

12
00:02:14,488 --> 00:02:17,747
Tu vas arrêter de travailler
pour M. Benno ?

13
00:02:19,368 --> 00:02:21,497
Oui. J’en ai parlé à Otto.

@arqtiq

arqtiq commented Jun 8, 2020

I'm also having this issue right now, torn between writing my own converter and pre-patching the srt file to get rid of these line breaks.

@shubhank008

> I'm also having this issue right now, torn between writing my own converter and pre-patching the srt file to get rid of these line breaks.

I ended up writing a pre-patch that sanitizes my srt files before reading them with webvtt. It uses a mix of plain replace and regex to remove the extra line breaks, and I keep expanding the regex whenever I run into another formatting mess.
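
The actual pre-patch was not posted in this thread. As a rough sketch of one way such a sanitizer could look, the hypothetical `sanitize_cues` helper below treats every timing line as the start of a cue, drops blank lines inside cue text, and writes exactly one blank line between cues. It assumes timing lines use the `HH:MM:SS,mmm --> HH:MM:SS,mmm` shape (comma or dot before the milliseconds) and that a digits-only line directly above a timing line is a cue index.

```python
import re

# Timing line with "," (SRT) or "." (WebVTT) before the milliseconds.
# The hours field is optional to tolerate short WebVTT timestamps.
TIMING = re.compile(
    r"^(?:\d+:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}[.,]\d{3}"
)


def sanitize_cues(text: str) -> str:
    """Hypothetical helper: rebuild cue blocks so that blank lines
    appear only between cues, never inside one."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [ln.rstrip() for ln in normalized.split("\n")]

    blocks = []   # finished blocks, each a list of lines
    current = []  # header lines first, then the cue being built

    for i, line in enumerate(lines):
        if TIMING.match(line):
            # A digits-only line directly above the timing line is taken
            # to be this cue's index, not text of the previous cue.
            index = None
            if i > 0 and lines[i - 1].isdigit() and current and current[-1] == lines[i - 1]:
                index = current.pop()
            if current:
                blocks.append(current)
            current = [index, line] if index is not None else [line]
        elif line.strip():
            # Keep non-blank lines; stray blank lines are dropped.
            current.append(line)

    if current:
        blocks.append(current)

    return "\n\n".join("\n".join(block) for block in blocks) + "\n"


broken = (
    "10\n"
    "00:02:05,988 --> 00:02:10,987\n"
    "CHAPITRE 12\n"
    "\n"
    "BAPTEME ET PARADIS DES DIEUX\n"
    "\n"
    "11\n"
    "00:02:13,278 --> 00:02:14,367\n"
    "Je vois\n"
)
print(sanitize_cues(broken))
```

Anchoring on timing lines rather than on blank lines is what lets this approach keep cue boundaries intact. The trade-offs are that intentionally blank lines inside cue text are lost, and that non-cue blocks after the first cue (for example WebVTT `NOTE` blocks) would be folded into the preceding cue, so this is a starting point rather than a complete parser.
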

@kicks66

kicks66 commented Apr 17, 2024

Hi @shubhank008, could you share the replace / regex that you used? I'm running into the same issues!
