how do i deal with SRT files containing HTML? #163

Open
keredson opened this issue Feb 15, 2018 · 2 comments

@keredson

example:

1
00:00:00,970 --> 00:00:03,000
<font face="Serif" size="18">Jellyfish at the Monterey Aquarium</font>

2
00:00:04,080 --> 00:00:06,080
<font face="Serif" size="18">Dude - get out of the way!</font>

3
00:00:09,350 --> 00:00:13,350
<font face="Serif" size="18">Shaky Hands...</font>

4
00:00:17,000 --> 00:00:22,000
<font face="Serif" size="18">Ah yes, this is better...</font>

5
00:00:24,825 --> 00:00:27,825
<font face="Serif" size="18">Pro Tip: Turn off the camera flash!</font>

6
00:00:33,000 --> 00:00:45,446
<font face="Serif" size="18">Thanks for watching and I hope you'll have fun with the VideoSub library!</font>

If I convert it to WebVTT, the markup comes out HTML-escaped:

WEBVTT

00:00.970 --> 00:03.000
&lt;font face="Serif" size="18">Jellyfish at the Monterey Aquarium&lt;/font>

00:04.080 --> 00:06.080
&lt;font face="Serif" size="18">Dude - get out of the way!&lt;/font>

00:09.350 --> 00:13.350
&lt;font face="Serif" size="18">Shaky Hands...&lt;/font>

00:17.000 --> 00:22.000
&lt;font face="Serif" size="18">Ah yes, this is better...&lt;/font>

00:24.825 --> 00:27.825
&lt;font face="Serif" size="18">Pro Tip: Turn off the camera flash!&lt;/font>

00:33.000 --> 00:45.446
&lt;font face="Serif" size="18">Thanks for watching and I hope you'll have fun with the VideoSub library!&lt;/font>

I'm converting like this:

      import pycaption

      # detect_format returns a reader class, which is instantiated before reading
      converter = pycaption.CaptionConverter()
      converter.read(srt, pycaption.detect_format(srt)())
      subtitles = converter.write(pycaption.WebVTTWriter())

thanks!

@kdHub

kdHub commented Feb 15, 2018

This has been my workaround so far, applied after the conversion. I would also be interested in a way to solve this within pycaption itself.


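try:
    # Python 3.4+ (HTMLParser.unescape was removed in Python 3.9)
    from html import unescape
except ImportError:
    # Python 2
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

from pycaption import DFXPReader, WebVTTWriter

# Convert to WebVTT; `vtt` holds the DFXP source text before this line
vtt = WebVTTWriter().write(DFXPReader().read(vtt))

# Undo the HTML escaping that the writer applied to the cue text
vtt = unescape(vtt)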
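
To make the effect of this post-conversion step concrete, here is a minimal sketch that applies it to two cues from the escaped output quoted in the issue. It uses only the standard library and does not call pycaption. The helper name `unescape_vtt` is made up for illustration, and the sketch assumes Python 3.4 or later, where `html.unescape` is available.

```python
import html

# Two cues copied from the escaped WebVTT output shown in the issue.
ESCAPED_VTT = (
    "WEBVTT\n"
    "\n"
    "00:00.970 --> 00:03.000\n"
    '&lt;font face="Serif" size="18">Jellyfish at the Monterey Aquarium&lt;/font>\n'
    "\n"
    "00:09.350 --> 00:13.350\n"
    '&lt;font face="Serif" size="18">Shaky Hands...&lt;/font>\n'
)


def unescape_vtt(vtt_text):
    """Hypothetical helper: turn HTML entities such as &lt; back into characters."""
    return html.unescape(vtt_text)


restored = unescape_vtt(ESCAPED_VTT)
print(restored)
```

Note that this unescapes every entity in the file, so any `&amp;` or `&lt;` that was meant literally in the cue text is converted as well.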

@keredson
Author

I did a similar workaround, but from the other end (preventing the escaping to begin with):
https://github.com/keredson/gnomecast/blob/9bbb32ef3028dda480d893204aa71be7ea38ccaf/gnomecast.py#L19
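
For readers who only want the plain cue text, another way to work "from the other end" is to strip the `<font>` tags from the SRT text before it reaches the converter, so there is nothing left to escape. This is a different technique from the one in the linked code, and only a rough sketch: the helper name `strip_font_tags` is hypothetical, and the regex assumes simple, well-formed `<font ...>` and `</font>` tags like those in the example above.

```python
import re

# Matches opening and closing font tags, e.g. <font face="Serif" size="18"> and </font>.
# The "-->" in SRT timing lines is not affected because it contains no "<font".
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(srt_text):
    """Hypothetical helper: remove <font> markup from SRT text, keeping the caption text."""
    return FONT_TAG.sub("", srt_text)


# One cue copied from the SRT example in the issue.
srt = (
    "1\n"
    "00:00:00,970 --> 00:00:03,000\n"
    '<font face="Serif" size="18">Jellyfish at the Monterey Aquarium</font>\n'
)

cleaned = strip_font_tags(srt)
print(cleaned)
```

The cleaned text can then be passed to the converter in place of the original `srt` string. The font face and size are lost with this approach, so it only fits when the styling is not needed.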
