We crop the transcript badly (by decimating) #800
Comments
Related: #679
I believe there is a bug in how the model's context window length is retrieved and in some of the calculations based on it. For example, I used the web version of ChatGPT-4o, which probably has an 8k context window, but I received a maxLength of only 900. I'm trying to understand how it works, but it's quite complex.
chatGPTBox/src/utils/crop-text.mjs Lines 31 to 101 in 0ee357d
@josStorer Perhaps we should simplify it by creating a new key, such as chatGPTBox/src/config/index.mjs Lines 166 to 268 in 0ee357d
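To make the suggestion concrete, here is a minimal sketch of what a per-model context-length key and the resulting crop budget could look like. Everything in it is an assumption made for illustration: the table `MODEL_LIMITS`, the key `contextTokens`, the helper `getMaxInputTokens`, the model names, and the numbers are hypothetical and are not taken from the existing config or from `crop-text.mjs`.

```javascript
// Hypothetical table: key names and numbers are illustrative only and are
// not taken from chatGPTBox/src/config/index.mjs.
const MODEL_LIMITS = {
  'example-model-8k': { contextTokens: 8192 },
  'example-model-128k': { contextTokens: 128000 },
}

// Fallback used when a model has no entry in the table (assumed value).
const DEFAULT_CONTEXT_TOKENS = 4096

/**
 * Tokens left for the cropped text once the reply reservation and the
 * fixed prompt have been subtracted from the model's context window.
 */
function getMaxInputTokens(modelKey, { reservedForReply = 1000, promptTokens = 0 } = {}) {
  const contextTokens = MODEL_LIMITS[modelKey]?.contextTokens ?? DEFAULT_CONTEXT_TOKENS
  return Math.max(0, contextTokens - reservedForReply - promptTokens)
}

console.log(getMaxInputTokens('example-model-8k', { promptTokens: 192 }))
```

With an explicit key like this, a budget as small as 900 for an 8k model would be easy to trace back to either a missing table entry or an oversized reservation.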
Side note: I think there are two separate bugs? There's the one where it uses the transcript (I have a small example of it above; the length AND contents are different), and the one you are investigating, where it wrongly clips the transcript.
The function
By the way, if you want a temporary fix for YouTube only, without considering the consequences, you can simply return the prompt here without the chatGPTBox/src/content-script/site-adapters/youtube/index.mjs Lines 56 to 59 in 0ee357d
Hmm, let me look in the debugger... yes, I see what you mean. An extract from
and croppedText (after applying croptext)
If anyone would like to reproduce this, here are the full arguments to croptext
I guess the cropping is a wider issue. The ideal way to crop cannot be to skip random parts of sentences, since that leads to incoherent text. It would be better to chunk the text (https://js.langchain.com/v0.1/docs/modules/data_connection/document_transformers/), perhaps keeping the beginning and end, and to tell the model that the text is incomplete so it doesn't misrepresent it to the user.
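As a rough sketch of the "keep the beginning and end" idea, the helper below keeps one contiguous block from the start and one from the end, cuts only at word boundaries, and inserts a notice so the model knows the text is incomplete. `cropHeadAndTail`, `OMISSION_NOTICE`, and the `headRatio` parameter are hypothetical names, not part of ChatGPTBox, and lengths are counted in characters rather than tokens to keep the example self-contained.

```javascript
// Hypothetical helper, not part of ChatGPTBox. Lengths are measured in
// characters for simplicity; a real implementation would count tokens.
const OMISSION_NOTICE =
  '\n[... part of the transcript was omitted to fit the context window ...]\n'

function cropHeadAndTail(text, maxLength, headRatio = 0.5) {
  if (text.length <= maxLength) return text

  const budget = maxLength - OMISSION_NOTICE.length
  // Too small to fit the notice at all: fall back to a plain prefix.
  if (budget <= 0) return text.slice(0, maxLength)

  // Words with their trailing whitespace, so pieces can be re-joined as-is.
  const words = text.match(/\S+\s*/g) ?? []
  const headBudget = Math.floor(budget * headRatio)
  const tailBudget = budget - headBudget

  // Take whole words from the start until the head budget is used up.
  let head = ''
  let i = 0
  while (i < words.length && head.length + words[i].length <= headBudget) {
    head += words[i]
    i++
  }

  // Take whole words from the end, never overlapping the head.
  let tail = ''
  let j = words.length - 1
  while (j >= i && tail.length + words[j].length <= tailBudget) {
    tail = words[j] + tail
    j--
  }

  return head.trimEnd() + OMISSION_NOTICE + tail.trimStart()
}
```

Unlike decimation, the two kept blocks stay readable on their own, and the notice gives the model a reason to qualify its summary. Chunking the middle and summarizing each chunk would be a natural next step, but that is beyond this sketch.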
Describe the bug
On YouTube, this extension uses the subtitles, not the transcript. The subtitles are terrible and lead the LLM to give poor output.
To Reproduce
It seems that this extension is using the subtitles, not the transcript. But the subtitles often come from a much poorer transcription model, and uncommon words are missed entirely.
For example, for this video
Expected behavior
This is part of the transcript available in the UI
And this is the subtitle information received in ChatGPTBox
As you can see, it's a poor source of information.
Please complete the following information: