The context required for a successful implementation of the solution would be:
- A warehouse to store raw and processed data -> BlobStorage.
- Event streaming to queue events -> Kafka.
- An ETL to pull the raw events into the warehouse.
- An ETL to process the data and load it into the database.
- A REST API that can query the database.
- A frontend that can query the API.
- AWS EC2: "cheap" virtual machine. Billed by time online.
- AWS Lambda: "cheap" function run when certain conditions are met. Billed per invocation and by resource usage.
- AWS EKS + Fargate: expensive solution that:
  - Can facilitate the creation of replicas to distribute the load...
  - ...through a load balancer (e.g. an Ingress).
  - No downtime, thanks to rolling releases.
  - Always enough resources, thanks to Fargate.
- EC2 is the lowest initial investment, since I could repeat my steps and focus on security/network issues.
- For a low and sparse number of requests, AWS Lambda could be cheaper, but with a higher initial investment (because I don't know how to do it).
- EKS + Fargate is the highest initial investment and could require multiple engineers. High continuous expense.
All of these options would also need a database, which could be deployed on EC2 (cheaper) or on RDS (more expensive).
This is the easiest way I found to do it:
- Build the API image.
- Create an ECR repository for the project.
- Push the image to ECR.
- Create 2 EC2 instances (`A` and `B`), and install Docker on both.
- Pull the API image from ECR into `A`.
- Pull Postgres into EC2 instance `B`.
- Spin up the containers. The API should be able to reach the DB, so the choice of host is VERY important; it worked for me with the public IP (see the sketch below).
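A minimal sketch of those steps, assuming the `myapi` image name from the build step; `<ACCOUNT_ID>`, `<REGION>`, `<DB_PASSWORD>`, `<B_PUBLIC_IP>`, and the `DATABASE_HOST` variable are placeholders, not the project's actual values:

```bash
# Push the local image to ECR (<ACCOUNT_ID> and <REGION> are placeholders).
aws ecr create-repository --repository-name myapi --region <REGION>
aws ecr get-login-password --region <REGION> \
  | docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com
docker tag myapi:latest <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/myapi:latest
docker push <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/myapi:latest

# On instance B: run Postgres on the default port (credentials illustrative).
docker run -d --name db -e POSTGRES_PASSWORD=<DB_PASSWORD> -p 5432:5432 postgres

# On instance A: run the API, pointing it at B's public IP.
# DATABASE_HOST is an assumed variable name; use whatever the API actually reads.
docker run -d --name api -p 8000:8000 -e DATABASE_HOST=<B_PUBLIC_IP> \
  <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/myapi:latest
```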
The easiest solution would be API keys. To manage users and keys, it might be possible to use AWS Cognito.
- An isolated network for the API and database, with a single point of access to the API.
- Proper database management to protect and encrypt sensitive data.
- An endpoint to generate an OAuth token from user credentials (sketched below).
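For illustration, here is how a client might authenticate under the two schemes; the header name, token path, and response field are assumptions, not this project's actual API:

```bash
# API-key scheme: send the key on every request.
# The X-API-Key header name is an assumption, not this project's actual API.
curl -H "X-API-Key: <YOUR_KEY>" http://<API_HOST>:8000/parks

# OAuth-style scheme: exchange user credentials for a token, then send it.
# The /token path and access_token field are assumptions; jq parses the JSON.
TOKEN=$(curl -s -X POST http://<API_HOST>:8000/token \
  -d "username=<USER>" -d "password=<PASSWORD>" | jq -r .access_token)
curl -H "Authorization: Bearer $TOKEN" http://<API_HOST>:8000/parks
```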
Poetry to install ALL deps in the virtualenv, since the requirements.txt file is limited to PRD libraries. Docker is also needed, because the tests will spin up a Postgres container.
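A minimal sketch of the local setup, assuming the tests run through `pytest` (use the project's actual test command if it differs):

```bash
poetry install      # installs ALL deps (dev + PRD) into the virtualenv
poetry run pytest   # the test suite spins up a Postgres container via Docker
```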
Docker to build the image: use `make build`, and an image named `myapi` will be created.
`terraform apply` spins up the API and database. It will ask for variables.
`terraform destroy` destroys everything.
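Putting it together, the whole flow looks roughly like this (which variables are prompted depends on your setup):

```bash
make build          # builds the myapi image
terraform apply     # asks for variables, then spins up the API and database
# ...use the API on port 8000...
terraform destroy   # tears everything down when you are done
```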
Once deployed, access the endpoints at port 8000. A low-effort frontend is the OpenAPI page itself. Since the database will be empty, you will need to load the data using the endpoints tagged as ADMIN (a sketch follows these steps):
1. Load the parks info.
2. Select a park name and load that park's energy readings.
3. Repeat step 2 for as many parks as you want.
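A hypothetical sketch of that flow with `curl`; the real paths, payloads, and auth are whatever the OpenAPI page documents, not these:

```bash
# 1. Load parks info (illustrative ADMIN call; payload fields are made up).
curl -X POST http://<API_HOST>:8000/parks \
  -H "Content-Type: application/json" \
  -d '{"name": "park_1", "energy_type": "wind"}'

# 2. Load energy readings for that park (again, purely illustrative).
curl -X POST "http://<API_HOST>:8000/parks/energy_readings?park_name=park_1" \
  -H "Content-Type: application/json" \
  -d '[{"timestamp": "2020-01-01T00:00:00", "megawatts": 1.5}]'
```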
- `/parks`: list park info.
- `/parks/energy_readings`: list park info with readings. It's a join between the two tables.
- `/stats/parks`: show stats by park by date.
- `/stats/energy_types`: show stats by energy type by date.
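For example, with the stack running (`<API_HOST>` is a placeholder for the instance address):

```bash
curl http://<API_HOST>:8000/parks                  # park info
curl http://<API_HOST>:8000/parks/energy_readings  # parks joined with readings
curl http://<API_HOST>:8000/stats/parks            # stats by park by date
curl http://<API_HOST>:8000/stats/energy_types     # stats by energy type by date
```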
This is just used to load the data. I did it for myself, not for you :D. You're supposed to use the deployed AWS app!