Is your feature request related to a problem? Please describe.
I spent 2 hours attempting this; here are my findings.
Prompt attempted:
You are a helpful assistant for correcting transcriptions of recordings. You will be given a list of voice segments, each segment contains the fields (speaker id, text, and seconds [start, end])
The transcription has a Word Error Rate of about 15% in english, in other languages could be up to 25%, and it is specially bad at speaker diarization.
Your task is to improve the transcript by taking the following steps:
1. Make the conversation coherent, if someone reads it, it should be clear what the conversation is about, remember the estimate percentage of WER, this could include missing words, incorrectly transcribed words, missing connectors, punctuation signs, etc.
2. The speakers ids are most likely inaccurate, make sure to assign the correct speaker id to each segment, by understanding the whole conversation. For example,
- The transcript could have 4 different speakers, but by analyzing the overall context, one can discover that it was only 2, and the speaker identification, took incorrectly multiple people.
- The transcript could have 1 single speaker, or 2, but in reality was 3.
- The speaker id might be assigned incorrectly, a conversation could have speaker 0 said "Hi, how are you", and then also speaker 0 said "I'm doing great, thanks for asking" which of course would be incorrect.
Considerations:
- Return a list of segments same size as the input.
- Do not change the order of the segments.
Transcript segments:
##
+ LangChain parsing instructions.
The hypothesis was that if transcripts can be improved during memory creation, the context for future chat or proactivity will be much better.
This was an attempt to take the transcript segments and post-process them.
Unfortunately, LLMs are not accurate when the transcript gets large: they drop content, rewrite the whole conversation, or add and remove segments.
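One way to catch these failures before saving anything is to check the model's output against the two constraints in the prompt (same number of segments, same order). The following is only a minimal sketch: the `Segment` class and `validate_improved` function are hypothetical names, not code from the app, and it assumes the model is asked to echo each segment's timestamps so they can serve as an order check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical shape, mirroring the fields named in the prompt.
    speaker_id: int
    text: str
    start: float
    end: float


def validate_improved(original, improved):
    """Return a list of problems; an empty list means the structure is intact."""
    if len(improved) != len(original):
        # The model added or removed segments; positions can no longer be compared.
        return [f"segment count changed: {len(original)} -> {len(improved)}"]
    problems = []
    for i, (before, after) in enumerate(zip(original, improved)):
        # Text and speaker id may change; timestamps must not.
        if (before.start, before.end) != (after.start, after.end):
            problems.append(f"segment {i}: timestamps changed")
    return problems
```

If the returned list is not empty, the improved transcript can be discarded and the original kept, so a bad LLM response never replaces good data.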
Next steps to try:
Better prompting
Simpler fixes to the transcript (maybe 1-5% better?)
Try the prompt with rawSegments, meaning the segments before the app combines them. Store (raw, combined, improved) segments in the db.
If this works, create a script that migrates existing transcripts in the db.
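The (raw, combined, improved) idea could look roughly like the sketch below. The `TranscriptRecord` class, its field names, and the `best` method are assumptions for illustration only; the actual db schema is not described in this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptRecord:
    # Hypothetical record; each list holds segment dicts
    # such as {"speaker_id": 0, "text": "...", "start": 0.0, "end": 1.5}.
    raw: list                       # segments before the app combines them
    combined: list                  # segments after combining
    improved: Optional[list] = None  # LLM-corrected segments, if any

    def best(self) -> list:
        """Prefer improved segments, falling back to combined ones."""
        return self.improved if self.improved is not None else self.combined
```

Keeping all three versions means an improved transcript can be regenerated or discarded later without losing the original data, which would also make a migration script safe to re-run.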
Findings:
GPT-4o outperformed all the other models tried.