Automated news podcast consisting of AI-powered updates from various Zambian 🇿🇲 sources.
This is a tool that gathers news from various Zambian 🇿🇲 sources, summarizes the news items and presents the news as a podcast.
It consists of two main components:
- core -- this is primarily Python code, which handles the following tasks:
  - gather the news using requests, feedparser and beautifulsoup4,
  - summarize the news using LLMs,
  - create the podcast transcript,
  - convert text to speech using AWS Polly,
  - process the audio using ffmpeg, and
  - generate content for the website.
The illustration below summarises this:
- web -- this is an 11ty project containing the logic to build a static site for the podcast, including an RSS feed.
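As a rough illustration of the news-gathering step in core, the sketch below pulls headlines and links out of an RSS 2.0 feed. It is not the project's code: core uses requests and feedparser (with feedparser, the equivalent loop would iterate over feedparser.parse(url).entries), whereas this stand-in uses only the standard library's xml.etree.ElementTree so that it runs without extra dependencies. NewsItem and parse_rss are hypothetical names, and Atom feeds are not handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class NewsItem:
    # Hypothetical container for one gathered headline.
    title: str
    link: str
    source: str


def parse_rss(xml_text: str, source: str) -> list[NewsItem]:
    """Return the title/link pairs found in an RSS 2.0 document.

    Stdlib stand-in for feedparser: handles plain RSS 2.0 <item> elements only.
    """
    root = ET.fromstring(xml_text)
    items: list[NewsItem] = []
    for item in root.iter("item"):
        # findtext returns "" for an empty element and None if it is missing
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title and link:  # skip entries without a headline or URL
            items.append(NewsItem(title=title, link=link, source=source))
    return items


if __name__ == "__main__":
    # In practice the XML would come from something like requests.get(feed_url).text.
    sample = """<rss version="2.0"><channel><title>Example</title>
    <item><title>Example headline</title><link>https://example.com/a</link></item>
    </channel></rss>"""
    for news in parse_rss(sample, source="example"):
        print(news.source, "|", news.title, "|", news.link)
```

Keeping each headline in a small dataclass makes the later steps (summarizing, deduplicating, building the transcript) independent of whichever parser produced the items.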
- I'm generally terrible at keeping up with current affairs
- I wanted to learn how to work with AI tools while solving a real problem
- I was inspired by Hackercast
- clone / fork the project
- cd into the project directory
Note
You need to have docker and docker-compose installed on your machine.
On your machine:
- make sure you have poetry installed
- create a Python virtual environment
- upgrade pip to the latest version:
  pip install --upgrade pip
- install dependencies:
  poetry install
- update environment variables:
  # copy .env.sample to .env
  cp -v .env.sample .env
  # Now you can update the relevant values in the .env file
- build images and spin up the docker containers:
  inv up --build
- access the app container:
  inv exec app bash
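If you want to sanity-check the values you put in .env, the sketch below shows how simple KEY=VALUE lines can be read into os.environ using only the standard library. It is a hypothetical helper (load_env is not part of this project) and assumes one plain assignment per line; how the project itself loads its .env file (for example through docker-compose or a library such as python-dotenv) is not covered here.

```python
import os


def load_env(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Hypothetical helper: read KEY=VALUE lines from `path` into os.environ.

    Assumes one assignment per line: no multi-line values, no inline
    comments, no variable expansion and no `export` prefix.
    """
    loaded: dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # skip blank lines, comments and anything that isn't KEY=VALUE
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip()
            # drop one pair of matching surrounding quotes, e.g. 'abc' or "abc"
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            loaded[key] = value
            # by default, variables already set in the environment win
            if override or key not in os.environ:
                os.environ[key] = value
    return loaded
```

Calling load_env() after editing .env returns the parsed values, so a missing or misspelled key is easy to spot; pass override=True if the file should take precedence over variables that are already set.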
In the container:
- you can run tests:
  inv test
- you can run the program:
  inv toolchain
See the available invoke tasks with invoke -l.
The project uses pgweb to help visualize database changes. You can access it in your browser at http://127.0.0.1:8081.
This project uses Node v18. I recommend using fnm or volta to simplify managing Node.js versions on your machine.
- install frontend dependencies:
  npm install
- start the dev server, accessible at http://127.0.0.1:8080/:
  npm start
See other available scripts in package.json.
The final outputs of this project are:
- mp3 files, hosted on AWS S3 (or similar platforms like Backblaze).
- a static site, which can be hosted anywhere. I use Cloudflare Pages, but you have various options such as GitHub Pages, Netlify, Vercel, Render, etc. You can even choose to host it on your own server.
Warning
Ensure that environment variables are updated accordingly for both core and web.
For a smooth, unattended setup, please follow these steps:
- Set up a *nix machine (it can be your laptop, a VPS, etc.) with a Python virtual environment for the project, and make sure docker and docker-compose are installed.
- Configure a cron job on the machine to run the cron.sh script located in the repository root. This script handles the automated generation and deployment process.
- Ensure that git is properly configured on the machine. The cron.sh script needs this to push the generated content to the repository, which triggers the build and deployment.
By following these steps, you can automate the deployment process and keep your project up to date without manual intervention.
Note
The cron.sh script uses apprise to notify the owner when a new episode is ready. Check the apprise docs on how to configure ntfy.sh or whichever apprise backend you choose.
Feel free to adapt the deployment setup to your specific requirements and preferred hosting platforms.
This project follows the all-contributors specification. Contributions, issues and feature requests are most welcome! A good place to start is by helping out with the unchecked items in the TODO section of this README!
Feel free to check the issues page and take a look at the contributing guide before you get started.
To maintain code quality and formatting consistency, we use pre-commit hooks. These hooks automatically check and format your code before each commit, keeping the codebase clean and consistent throughout development. Set up the Git pre-commit hooks by running the following:
pre-commit install && pre-commit install --hook-type commit-msg
See pre-commit-config.yaml for more details. In addition, please note the following:
- if you're making code contributions, please try to write some tests to accompany your code, and ensure that the tests pass. Also, where necessary, update the docs so that they reflect your changes.
- your commit messages should follow the conventions described here. Write your commit message in the imperative: "Fix bug" and not "Fixed bug" or "Fixes bug". Once you are done, please create a pull request.
- Switch to Poetry
- Replace flake8, pycodestyle and isort with ruff
- Improve test coverage
- Create a More ways to listen button with a popup/modal so that people can choose from multiple services
- Create a dynamic og:image with episode number & date
- Keep things DRY. For example, the More ways to listen modal on the home and about pages, the header and footer icons.
- Toggle Dark/Light mode
- Improve the mobile UI. For example, the audio player controls
- Improve a11y. For instance, learn more about using the aria-current attribute
- Implement search on the web app
- Add a separate module for summarization backends so we can choose which one to work with
- Add more robust error handling to the requests and feedparser jobs, as well as all other operations, such as connecting to AWS Polly.
- Add a task to perform substitution so that, for instance, K400 is written as 400 Kwacha. The AWS Polly voices fail to read Zambian money correctly.
- Cleanup the news by consolidating similar articles from different sources. In other words, let's make this DRY.
- Connect with social media platforms and automagically tweet and post to Facebook when a new episode is out.
- Keep the background music running throughout the show
- Different background music for each day of the week
- Mention the weather in Lusaka, Livingstone, Kabwe, etc. Perhaps the weather forecast for the following day?
- Mention exchange rates
- Find a way to make a closing statement based on the news. Something like, "Don't forget to register your sim card before the ZICTA deadline ..."
- Possibly allow passing the voice as an argument, or dynamically choose a voice from a list, just like the random intros and outros.
- Find a way of training the voice to learn how to pronounce Zambian words.
- Find a way to summarize for free, without relying on OpenAI's API. Perhaps train your own model, learn how to leverage tools like NLTK, spaCy, etc.
- Incorporate a newsletter version where the news is sent to your mailbox in a nice, clean format. People can subscribe / unsubscribe.
- Add Diamond TV as a news source. It might be a good idea to replace Muvi TV with Diamond TV because Muvi TV seems to have infrequent updates. Also, we don't want too many news items -- it kills the whole point of this project, which is to get the latest updates delivered in a concise manner.
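The K400 to 400 Kwacha substitution from the TODO list could start as a single regular-expression pass over the transcript before it is sent to AWS Polly. This is a minimal sketch under assumptions: spell_out_kwacha is a hypothetical name, and it only covers amounts written as K followed by digits (optionally with a space, commas, decimals and a thousand/million/billion suffix).

```python
import re

# Hypothetical pattern: matches "K400", "K 1,500", "K2.5 million".
# The \b before K keeps it from firing inside words such as "OK400".
_KWACHA = re.compile(
    r"\bK\s?(\d[\d,]*(?:\.\d+)?)(\s(?:thousand|million|billion))?\b"
)


def spell_out_kwacha(text: str) -> str:
    """Rewrite amounts like 'K400' as '400 Kwacha' so TTS reads them naturally."""
    # \1 = the number, \2 = optional magnitude word (an unmatched group
    # is substituted as an empty string in Python 3.5+)
    return _KWACHA.sub(r"\1\2 Kwacha", text)


print(spell_out_kwacha("Mealie meal now costs K400 per bag."))
# Mealie meal now costs 400 Kwacha per bag.
```

Moving the magnitude word in front of "Kwacha" means "K2.5 million" becomes "2.5 million Kwacha" rather than "2.5 Kwacha million".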
- https://pixabay.com/music/beats-sweet-breeze-167504/
- https://pixabay.com/music/beats-aesthetic-beat-royalty-free-music-215851/
- https://pixabay.com/music/beats-digital-technology-131644/
- https://pixabay.com/music/beats-stellar-echoes-202315/
- https://pixabay.com/music/afrobeat-it-afrobeat-149308/
- logo adapted from https://www.pngrepo.com/svg/227923/news-reporter-woman