Caption Translator

A caption translation package built on the OpenAI API.
Streamlit demo: https://miblue119-caption-translator-streamlit-app-7s6eqn.streamlit.app/

Usage

Install the dependencies: pip install -r requirements.txt
Set your OpenAI API key: export OPENAI_API_KEY="sxxxxxxxx"
Arguments
- Required
  - --file_path: Path to the transcript file. Currently only the .vtt and .srt formats are supported.
- Optional
  - --text_engine: The OpenAI text engine to use. Defaults to gpt-3.5-turbo.
  - --language: The target language for the translation. Defaults to japanese.
    - Other supported languages: korean / german / traditional chinese / simplified chinese / french / dutch
      - See the LANGUAGES definition in caption_translator/utils.py.
  - --test: Process only part of the content, as a trial run.
  - --test_num: Number of content entries to process when --test is set.
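
The flags above could be declared with argparse roughly as follows. This is only a sketch based on the argument list in this README: the function name build_parser is hypothetical, the real parser in caption_translator/app.py may differ, and no default is documented for --test_num, so it is left unset here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line flags described in the Arguments list."""
    parser = argparse.ArgumentParser(prog="caption_translator.app")
    # Required: path to a .vtt or .srt transcript.
    parser.add_argument("--file_path", required=True,
                        help="Transcript file path (.vtt or .srt)")
    # Optional flags, with the defaults stated in the README.
    parser.add_argument("--text_engine", default="gpt-3.5-turbo",
                        help="OpenAI text engine")
    parser.add_argument("--language", default="japanese",
                        help="Target language for the translation")
    parser.add_argument("--test", action="store_true",
                        help="Process only part of the content")
    # Assumption: the README gives no default, so None is used here.
    parser.add_argument("--test_num", type=int, default=None,
                        help="Number of entries to process with --test")
    return parser
```

Calling build_parser().parse_args(["--file_path", "a.vtt", "--test"]) yields a namespace with test set to True and the other options at their defaults.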

Example

$ python -m caption_translator.app --file_path ./examples/EP108_humanosis_Podcast.vtt --test --language ja
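
For orientation, an .srt file is a sequence of blocks separated by blank lines, each holding an index, a timestamp line with "-->", and one or more text lines. The sketch below shows one way such a file might be split into caption entries before translation. The names Cue and parse_srt are hypothetical and are not taken from this package, whose own parser may work differently.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: str
    end: str
    text: str


def parse_srt(content: str) -> List[Cue]:
    """Split .srt content into cues; malformed blocks are skipped."""
    # Drop a leading byte-order mark and normalize Windows newlines.
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content):
        lines = block.split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        if not lines[0].strip().isdigit():
            continue
        start, end = (part.strip() for part in lines[1].split("-->", 1))
        cues.append(Cue(int(lines[0].strip()), start, end, "\n".join(lines[2:])))
    return cues
```

Only the text field would need to be sent for translation; the index and timestamps can be written back unchanged.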

Resources

https://blog.devgenius.io/creating-meeting-minutes-using-openai-gpt-3-api-f79e5fc15eb1
https://blog.devgenius.io/counting-tokens-for-openai-gpt-3-api-59c8e0812eeb
OpenAI's open-source tokenizer, tiktoken: https://github.com/openai/tiktoken
- Tokenization algorithm: Byte Pair Encoding (1994, "A New Algorithm for Data Compression"): https://zhuanlan.zhihu.com/p/424631681
  - data compression
- OpenAI Cookbook example of using tiktoken: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
Hugging Face tokenizers
- BPE tokenization introduction: https://huggingface.co/course/chapter6/5?fw=pt
Process long input
Others
- https://help.openai.com/en/articles/5072263-how-do-i-use-stop-sequences

Prompt

source: https://github.com/openai/openai-cookbook/blob/main/text_explanation_examples.md

Summarize the following text.

Text:
"""
Two independent experiments reported their results this morning at CERN, Europe's high-energy physics laboratory near Geneva in Switzerland. Both show convincing evidence of a new boson particle weighing around 125 gigaelectronvolts, which so far fits predictions of the Higgs previously made by theoretical physicists.

"As a layman I would say: 'I think we have it'. Would you agree?" Rolf-Dieter Heuer, CERN's director-general, asked the packed auditorium. The physicists assembled there burst into applause.
"""

Summary:
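
The prompt above follows a fixed layout: an instruction, the input text wrapped in triple quotes, and a trailing "Summary:" cue for the model to complete. A minimal sketch of filling that layout from Python is shown below. The names PROMPT_TEMPLATE and build_prompt are hypothetical, and how this package actually assembles its prompts is not shown in this README.

```python
# Layout copied from the prompt shown above; {text} marks the input slot.
PROMPT_TEMPLATE = (
    "Summarize the following text.\n"
    "\n"
    "Text:\n"
    '"""\n'
    "{text}\n"
    '"""\n'
    "\n"
    "Summary:"
)


def build_prompt(text: str) -> str:
    """Insert the input text between the triple-quote delimiters."""
    # Surrounding whitespace is trimmed so the delimiters sit on their own lines.
    return PROMPT_TEMPLATE.format(text=text.strip())
```

Keeping the template as a single constant makes it easy to swap the instruction line for a different task without touching the delimiter structure.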
