A Python program to retrieve the text transcript of YouTube videos. This tool uses the youtube_transcript_api
library to fetch available transcripts (both manually provided and auto-generated) for a given YouTube video URL.
- Prerequisites
- Python Script: Retrieve YouTube Video Transcript
- How to Use the Program
- Examples
- Handling Errors and Exceptions
- Additional Notes
- Contributions
Ensure you have Python 3.6 or later installed on your system. You can download Python from the official website.
Open your terminal or command prompt and install the necessary packages using pip:
pip install youtube_transcript_api pytube
If you need transcripts in languages other than English, ensure that the video has transcripts available in those languages. The script can be adjusted to fetch transcripts in specific languages.
git clone https://github.com/ChristianE00/Youtube-Transcript-Fetcher.git
pip install youtube_transcript_api pytube
Open your terminal or command prompt, navigate to the directory containing youtube_transcript_fetcher.py, and execute the script using the following syntax:
python youtube_transcript_fetcher.py "YOUTUBE_VIDEO_URL" -o "output_filename.txt" -l "language_code"
"YOUTUBE_VIDEO_URL"
: Replace with the actual URL of the YouTube video.
"output_filename.txt"
: (Optional) Replace with your desired output filename. Defaults to transcript.txt if not specified.
"language_code"
: (Optional) Replace with the desired language code (e.g., 'en' for English, 'es' for Spanish). Defaults to 'en' if not specified.
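For readers who want to see how the arguments above might be wired up, here is a minimal sketch using the standard-library argparse module. It is not taken from youtube_transcript_fetcher.py: the function name build_parser and the long option names --output and --language are assumptions made for illustration. Only the short flags and the defaults come from the description above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Fetch the transcript of a YouTube video."
    )
    # Required positional argument: the video URL.
    parser.add_argument("url", help="URL of the YouTube video")
    # Optional output filename (the long name --output is an assumption).
    parser.add_argument(
        "-o", "--output", default="transcript.txt",
        help="file to write the transcript to",
    )
    # Optional language code (the long name --language is an assumption).
    parser.add_argument(
        "-l", "--language", default="en",
        help="language code of the transcript, e.g. 'en' or 'es'",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run anywhere without command-line input.
args = build_parser().parse_args(
    ["https://www.youtube.com/watch?v=dQw4w9WgXcQ", "-l", "es"]
)
print(args.url, args.output, args.language)
```

With only the URL supplied, args.output and args.language fall back to transcript.txt and en, matching the defaults described above.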
python youtube_transcript_fetcher.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
Description: This command fetches the English transcript of the provided video and saves it to transcript.txt.
python youtube_transcript_fetcher.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -o "rickroll_transcript.txt" -l "en"
Description: This command fetches the English transcript and saves it to rickroll_transcript.txt.
Description: This command fetches the Spanish transcript of the specified video and saves it to transcript_es.txt.
- Scenario: If a transcript isn't available in the specified language, the script will attempt to fetch the English transcript. If no transcript is found, it will notify you accordingly.
- Scenario: If transcripts are disabled for the video, the script will inform you that transcripts are disabled.
- Scenario: If the provided URL is invalid or the video ID cannot be extracted, the script will display an error message.
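The invalid-URL scenario depends on being able to pull a video ID out of the URL. The sketch below shows one way to do that with the standard library only. It is an illustration, not the script's actual code: the function name extract_video_id and the set of accepted hosts and path forms are assumptions, and the real script may handle more or fewer URL shapes.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None

    if host in {"youtube.com", "www.youtube.com", "m.youtube.com"}:
        # Standard watch links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # Embed and Shorts links: /embed/<id> or /shorts/<id>
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                candidate = parsed.path[len(prefix):].split("/")[0]
                return candidate or None

    # Unknown host or unsupported path: treat the URL as invalid.
    return None


for url in (
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "https://example.com/not-a-video",
):
    video_id = extract_video_id(url)
    print(url, "->", video_id if video_id else "error: could not extract a video ID")
```

Returning None rather than raising keeps the caller free to decide how to report the problem, for example by printing an error message and exiting.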
- Description: Some YouTube videos have multiple transcripts in different languages or both auto-generated and manually provided captions. The script lists all available transcripts before attempting to fetch the desired one. This helps you choose the correct language code.
- Description: Auto-generated transcripts might be less accurate compared to manually provided ones. This is indicated in the transcript listing output.
- Description: The youtube_transcript_api library handles relatively long transcripts efficiently. However, if you encounter issues with exceptionally long videos, consider modifying the script to process the transcript in segments or to use a different retrieval method.
- Description: Ensure that you have the right to access and use the transcripts, especially for copyrighted content. Always respect YouTube's Terms of Service when accessing and using their data.
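As a rough illustration of the note on long transcripts, the sketch below groups transcript entries into blocks of bounded size so they can be written or processed piece by piece. It assumes each entry is a dict with a "text" key (and typically a "start" time), which may not match the shape returned by the version of youtube_transcript_api you have installed. The helper name chunk_entries and the max_chars parameter are hypothetical.

```python
from typing import Dict, Iterator, List, Union

# Assumed entry shape: {"text": "...", "start": 12.3}
Entry = Dict[str, Union[str, float]]


def chunk_entries(entries: List[Entry], max_chars: int = 1000) -> Iterator[str]:
    """Yield blocks of joined transcript text, each at most max_chars long.

    A single entry longer than max_chars is emitted on its own, unsplit.
    """
    current: List[str] = []
    size = 0  # length of " ".join(current)
    for entry in entries:
        text = str(entry["text"]).strip()
        if not text:
            continue
        # Account for the joining space when the chunk is not empty.
        extra = len(text) + (1 if current else 0)
        # Start a new chunk when adding this entry would exceed the limit.
        if current and size + extra > max_chars:
            yield " ".join(current)
            current, size = [], 0
            extra = len(text)
        current.append(text)
        size += extra
    if current:
        yield " ".join(current)


sample = [
    {"text": "aaaa", "start": 0.0},
    {"text": "bbbb", "start": 1.0},
    {"text": "cccc", "start": 2.0},
]
for block in chunk_entries(sample, max_chars=10):
    print(block)
```

Because the function is a generator, blocks can be written to the output file one at a time instead of building the whole transcript in memory first.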
Feel free to contribute! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.