Is there a way to do post processing of transcripts via LLM? #608

josancamon19 · 2024-08-16T20:36:14Z

Is your feature request related to a problem? Please describe.
Spent 2 hours attempting this, here are my findings.

Prompt attempted:

You are a helpful assistant for correcting transcriptions of recordings. You will be given a list of voice segments, each segment contains the fields (speaker id, text, and seconds [start, end])

The transcription has a Word Error Rate of about 15% in english, in other languages could be up to 25%, and it is specially bad at speaker diarization.

Your task is to improve the transcript by taking the following steps:

1. Make the conversation coherent, if someone reads it, it should be clear what the conversation is about, remember the estimate percentage of WER, this could include missing words, incorrectly transcribed words, missing connectors, punctuation signs, etc.

2. The speakers ids are most likely inaccurate, make sure to assign the correct speaker id to each segment, by understanding the whole conversation. For example, 
- The transcript could have 4 different speakers, but by analyzing the overall context, one can discover that it was only 2, and the speaker identification, took incorrectly multiple people.
- The transcript could have 1 single speaker, or 2, but in reality was 3.
- The speaker id might be assigned incorrectly, a conversation could have speaker 0 said "Hi, how are you", and then also speaker 0 said "I'm doing great, thanks for asking" which of course would be incorrect.

Considerations:
- Return a list of segments same size as the input.
- Do not change the order of the segments.

Transcript segments:
##

+ LangChain output-parsing instructions, appended to the prompt.
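For reference, a minimal sketch of how the segments could be serialized into the "Transcript segments:" slot of the prompt. The field names (`speaker_id`, `text`, `start`, `end`), the one-JSON-object-per-line layout, and the helper names are assumptions for illustration, not the code that was actually used; `format_instructions` stands in for whatever text the LangChain parser produces.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Segment:
    # Hypothetical field names mirroring the fields listed in the prompt.
    speaker_id: int
    text: str
    start: float  # seconds
    end: float    # seconds


def render_segments(segments: list[Segment]) -> str:
    """Serialize segments as one JSON object per line, preserving order."""
    return "\n".join(json.dumps(asdict(s), ensure_ascii=False) for s in segments)


def build_prompt(instructions: str, segments: list[Segment],
                 format_instructions: str = "") -> str:
    """Join the task instructions, the segments, and optional parsing instructions."""
    parts = [instructions.strip(), "Transcript segments:", render_segments(segments)]
    if format_instructions:
        parts.append(format_instructions.strip())
    return "\n\n".join(parts)
```

Keeping one segment per line makes it easier to compare the model's output against the input segment by segment.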

The hypothesis was that if transcripts can be improved during memory creation, the context for future chat or proactivity will be much better.

This was an attempt to take the transcript segments and post-process them with an LLM.

Unfortunately, LLMs are not accurate once the transcript gets long: they remove content, change the whole conversation, add segments, drop segments, etc.
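These failure modes can at least be detected automatically before an "improved" transcript is accepted. Below is a rough sketch, assuming each segment is a dict with `text`, `start`, and `end` keys (hypothetical names). The 0.5 similarity threshold is an arbitrary assumption, picked only because a transcript with a 15-25% WER should still resemble its corrected version fairly closely.

```python
from difflib import SequenceMatcher


def check_improved(original, improved, min_similarity=0.5):
    """Return a list of problems found in the LLM output (empty if none)."""
    problems = []
    if len(original) != len(improved):
        # Segments were added or removed; a per-segment comparison is meaningless.
        problems.append(f"segment count changed: {len(original)} -> {len(improved)}")
        return problems
    for i, (old, new) in enumerate(zip(original, improved)):
        # Timestamps should be copied through untouched, so exact equality is expected.
        if (old["start"], old["end"]) != (new["start"], new["end"]):
            problems.append(f"segment {i}: timestamps changed")
        # A very low similarity suggests the text was rewritten rather than corrected.
        ratio = SequenceMatcher(None, old["text"].lower(), new["text"].lower()).ratio()
        if ratio < min_similarity:
            problems.append(f"segment {i}: text similarity {ratio:.2f} below {min_similarity}")
    return problems
```

A non-empty result would mean falling back to the original transcript. This only catches structural damage and heavy rewrites; it cannot tell whether a speaker id was reassigned correctly.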

Next steps to try:

- Better prompting.
- Simpler fixes to the transcript; maybe 1-5% better?
- Try the prompt with rawSegments, meaning the segments before the app combines them. Store something like (raw, combined, improved) segments in the db.

If this works, create a script that migrates existing transcripts in the db.
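To make the (raw, combined, improved) idea and the migration concrete, here is a small sketch. The record layout, the `migrate` helper, and the `improve` callable (a stand-in for the LLM call) are all hypothetical and do not reflect the actual db schema. It keeps the original segments untouched and only stores an improved version when the segment count is preserved.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Segments = list[dict]


@dataclass
class TranscriptRecord:
    raw: Segments                        # segments before the app combines them
    combined: Segments                   # segments as combined by the app
    improved: Optional[Segments] = None  # LLM-corrected segments, if accepted


def migrate(records: list[TranscriptRecord],
            improve: Callable[[Segments], Segments]) -> int:
    """Fill in `improved` for records that lack it; return how many were updated."""
    updated = 0
    for record in records:
        if record.improved is not None:
            continue  # already migrated
        candidate = improve(record.raw)
        if len(candidate) != len(record.raw):
            continue  # the model added or dropped segments; keep the original only
        record.improved = candidate
        updated += 1
    return updated
```

Storing all three versions means a bad LLM pass can always be discarded without losing the original transcript.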

Findings:
GPT-4o outperforms all the other models tried.

josancamon19 · 2024-08-16T20:38:19Z

The main goal of this was/is:

- Improve transcript coherence.
- Improve diarization and speaker identification (is_user = True).

josancamon19 · 2024-09-24T20:43:23Z

The answer is no, there's no way, so this was moved to not planned.

kodjima33 added this to omi TODO Aug 16, 2024

kodjima33 moved this to Backlog in omi TODO Aug 16, 2024

josancamon19 closed this as not planned Sep 24, 2024

github-project-automation bot moved this to Done in omi TODO Sep 24, 2024
