Transcript file metadata missing #55

Closed
peter-boucher opened this issue Jan 25, 2024 · 1 comment

@peter-boucher

When a Teams conversation transcript is exported as a .vtt file, each caption includes 'voice' metadata containing the speaker's screen name.

e.g.

WEBVTT

00:00:00.000 --> 00:00:00.800
<v Lisa Simpson>Knock knock</v>

00:00:02.100 --> 00:00:06.500
<v Homer Simpson>Who's there?</v>

00:00:10.530 --> 00:00:11.090
<v Lisa Simpson>Atish</v>
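In WebVTT this metadata is a voice span: the speaker name follows `<v` inside the opening tag. The spec also allows classes (as in `<v.loud Esme>`) and lets the closing `</v>` be omitted at the end of the cue. As a rough illustration of what extracting the speaker involves, here is a minimal standard-library sketch. `split_voice` is a hypothetical helper, not part of webvtt, and it only handles a single voice span at the start of the cue text.

```python
import re
from typing import Optional, Tuple

# "<v", optional ".class" names, whitespace, then the speaker name up to ">".
# The closing "</v>" is optional because WebVTT allows omitting it at the end of a cue.
VOICE_SPAN = re.compile(r"^<v(?:\.[^\s>]+)*\s+([^>]+)>(.*?)(?:</v>)?$", re.DOTALL)


def split_voice(cue_text: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (speaker, text); speaker is None without a voice span."""
    match = VOICE_SPAN.match(cue_text.strip())
    if match is None:
        return None, cue_text.strip()
    return match.group(1).strip(), match.group(2).strip()
```

For example, `split_voice("<v Lisa Simpson>Knock knock</v>")` gives `("Lisa Simpson", "Knock knock")`, while a cue without a voice span comes back with `None` as the speaker.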

When I use webvtt to convert these captions to JSONL for analysis, I'd like to preserve this metadata for context.

Current output:

{"start": "00:00:00.000", "end": "00:00:00.800", "text": "Knock knock"}
{"start": "00:00:02.100", "end": "00:00:06.500", "text": "Who's there?"}
{"start": "00:00:10.530", "end": "00:00:11.090", "text": "Atish"}

Desired output:

{"start": "00:00:00.000", "end": "00:00:00.800", "text": "Knock knock", "sender_name": "Lisa Simpson"}
{"start": "00:00:02.100", "end": "00:00:06.500", "text": "Who's there?", "sender_name": "Homer Simpson"}
{"start": "00:00:10.530", "end": "00:00:11.090", "text": "Atish", "sender_name": "Lisa Simpson"}
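For a file shaped like the Teams sample, this desired output can be produced without library support. The sketch below is a simplified standard-library parser; `vtt_to_records` and `to_jsonl` are hypothetical helpers, not webvtt APIs. It assumes cues are separated by blank lines, skips blocks without a timing line (such as NOTE or STYLE), and leaves any inline tags other than a leading voice span in the text.

```python
import json
import re
from typing import Dict, List

TIMING = re.compile(r"^(\S+)\s+-->\s+(\S+)")
VOICE_SPAN = re.compile(r"^<v(?:\.[^\s>]+)*\s+([^>]+)>(.*?)(?:</v>)?$", re.DOTALL)


def vtt_to_records(vtt_text: str) -> List[Dict[str, str]]:
    """Turn WebVTT text into dicts shaped like the desired JSONL lines."""
    records = []
    # Blocks are separated by blank lines; the first block is the "WEBVTT" header.
    for block in re.split(r"\n\s*\n", vtt_text.strip())[1:]:
        lines = block.splitlines()
        # An optional cue identifier may precede the timing line.
        idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if idx is None:
            continue  # NOTE, STYLE or other non-cue blocks
        timing = TIMING.match(lines[idx])
        if timing is None:
            continue
        start, end = timing.groups()
        text = "\n".join(lines[idx + 1:]).strip()
        record = {"start": start, "end": end, "text": text}
        voice = VOICE_SPAN.match(text)
        if voice:
            record["text"] = voice.group(2).strip()
            record["sender_name"] = voice.group(1).strip()
        records.append(record)
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    # json.dumps keeps dict insertion order: start, end, text, sender_name.
    return "".join(json.dumps(r) + "\n" for r in records)
```

Run on the three cues shown earlier, `to_jsonl(vtt_to_records(...))` should yield the three desired lines, and a cue without a voice span simply has no `sender_name` key.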

Sample code:

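import json

import webvtt  # provided by the webvtt-py package


def vtt_to_jsonl(vtt_file, jsonl_file):
    captions = webvtt.read(vtt_file)

    with open(jsonl_file, 'w', encoding='utf-8') as f:
        for caption in captions:
            # One JSON object per caption, one per line (JSON Lines).
            caption_json = {
                'start': caption.start,
                'end': caption.end,
                'text': caption.text,
                # 'sender_name': caption.voice,  # wanted: speaker from the <v> voice span
            }
            json.dump(caption_json, f)
            f.write('\n')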
glut23 added a commit that referenced this issue May 27, 2024
@glut23
Owner

glut23 commented May 30, 2024

Hi @peter-boucher, version 0.5.1 adds support for this. Closing the issue. Thanks for raising it.
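With that release, the sample function can fill in `sender_name` directly. The sketch below assumes webvtt-py 0.5.1 or later exposes the speaker as `caption.voice` (the attribute name the sample code guessed; check the library's changelog to confirm) and that it is empty or None when a cue has no voice span. `write_jsonl` is a hypothetical helper split out from the file handling so it works with any objects that have `start`, `end` and `text` attributes.

```python
import json
from typing import IO, Iterable


def write_jsonl(captions: Iterable, out: IO[str]) -> None:
    """Write one JSON object per caption, adding sender_name when a voice is present."""
    for caption in captions:
        record = {"start": caption.start, "end": caption.end, "text": caption.text}
        # Assumption: webvtt-py >= 0.5.1 exposes the voice span as `caption.voice`.
        # getattr() means older versions simply omit sender_name instead of failing.
        voice = getattr(caption, "voice", None)
        if voice:
            record["sender_name"] = voice
        out.write(json.dumps(record) + "\n")


def vtt_to_jsonl(vtt_file: str, jsonl_file: str) -> None:
    import webvtt  # webvtt-py; imported lazily so write_jsonl has no hard dependency

    with open(jsonl_file, "w", encoding="utf-8") as f:
        write_jsonl(webvtt.read(vtt_file), f)
```

Using `getattr` with a default keeps the converter usable on versions without voice support, at the cost of silently dropping the speaker there.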

glut23 closed this as completed May 30, 2024