Skip to content

Commit

Permalink
docs: improve databases documentation (#7732)
Browse files Browse the repository at this point in the history
Signed-off-by: knqyf263 <[email protected]>
Co-authored-by: simar7 <[email protected]>
Co-authored-by: knqyf263 <[email protected]>
Co-authored-by: wkoot <[email protected]>
  • Loading branch information
4 people authored Nov 27, 2024
1 parent f5bdc79 commit 745be1a
Show file tree
Hide file tree
Showing 6 changed files with 281 additions and 260 deletions.
175 changes: 45 additions & 130 deletions docs/docs/advanced/air-gap.md
Original file line number Diff line number Diff line change
@@ -1,162 +1,77 @@
# Advanced Network Scenarios
# Connectivity and Network considerations

Trivy needs to connect to the internet occasionally in order to download relevant content. This document explains the network connectivity requirements of Trivy and setting up Trivy in particular scenarios.
Trivy requires internet connectivity in order to function normally. If your organizations blocks or restricts network traffic, that could prevent Trivy from working correctly.
This document explains Trivy's network connectivity requirements, and how to configure Trivy to work in restricted networks environments, including completely air-gapped environments.

## Network requirements
The following table lists all external resources that are required by Trivy:

Trivy's databases are distributed as OCI images via GitHub Container registry (GHCR):
External Resource | Feature | Details
--- | --- | ---
Vulnerability Database | Vulnerability scanning | [Trivy DB](../scanner/vulnerability.md)
Java Vulnerability Database | Java vulnerability scanning | [Trivy Java DB](../coverage/language/java.md)
Checks Bundle | Misconfigurations scanning | [Trivy Checks](../scanner/misconfiguration/check/builtin.md)
VEX Hub | VEX Hub | [VEX Hub](../supply-chain/vex/repo/#vex-hub)
Maven Central / Remote Repositories | Java vulnerability scanning | [Java Scanner/Remote Repositories](../coverage/language/java.md#remote-repositories)

- <https://ghcr.io/aquasecurity/trivy-db>
- <https://ghcr.io/aquasecurity/trivy-java-db>
- <https://ghcr.io/aquasecurity/trivy-checks>
!!! note
Trivy is an open source project that relies on public free infrastructure. In case of extreme load, you may encounter rate limiting when Trivy attempts to connect to external resources.

The following hosts are required in order to fetch them:
The rest of this document details each resource's connectivity requirements and network related considerations.

- `ghcr.io`
- `pkg-containers.githubusercontent.com`
## OCI Databases

The databases are pulled by Trivy using the [OCI Distribution](https://github.com/opencontainers/distribution-spec) specification, which is a simple HTTPS-based protocol.
Trivy's Vulnerability, Java, and Checks Bundle are packaged as OCI images and stored in public container registries.

[VEX Hub](https://github.com/aquasecurity/vexhub) is distributed from GitHub over HTTPS.
The following hosts are required in order to fetch it:
### Connectivity requirements

- `api.github.com`
- `codeload.github.com`

## Running Trivy in air-gapped environment

An air-gapped environment refers to situations where the network connectivity from the machine Trivy runs on is blocked or restricted.

In an air-gapped environment it is your responsibility to update the Trivy databases on a regular basis.

## Offline Mode

By default, Trivy will attempt to download latest databases. If it fails, the scan might fail. To avoid this behavior, you can tell Trivy to not attempt to download database files:

- `--skip-db-update` to skip updating the main vulnerability database.
- `--skip-java-db-update` to skip updating the Java vulnerability database.
- `--skip-check-update` to skip updating the misconfiguration database.

```shell
trivy image --skip-db-update --skip-java-db-update --offline-scan --skip-check-update myimage
```

## Self-Hosting

### OCI Databases

You can host the databases on your own local OCI registry.

First, make a copy of the databases in a container registry that is accessible to Trivy. The databases are in:
The specific registries and locations are detailed in the [databases document](../configuration/db.md).

- `ghcr.io/aquasecurity/trivy-db:2`
- `ghcr.io/aquasecurity/trivy-java-db:1`
- `ghcr.io/aquasecurity/trivy-checks:0`
Communication with OCI Registries follows the [OCI Distribution](https://github.com/opencontainers/distribution-spec) spec.

Then, tell Trivy to use the local registry:
The following hosts are known to be used by the default container registries:

```shell
trivy image \
--db-repository myregistry.local/trivy-db \
--java-db-repository myregistry.local/trivy-java-db \
--checks-bundle-repository myregistry.local/trivy-checks \
myimage
```
Registry | Hosts | Additional info
--- | --- | ---
Google Artifact Registry | <ul><li>`mirror.gcr.io`</li><li>`googlecode.l.googleusercontent.com`</li></ul> | [Google's IP addresses](https://support.google.com/a/answer/10026322?hl=en)
GitHub Container Registry | <ul><li>`ghcr.io`</li><li>`pkg-containers.githubusercontent.com`</li></ul> | [GitHub's IP addresses](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/about-githubs-ip-addresses)

#### Authentication
### Self-hosting

If the registry requires authentication, you can configure it as described in the [private registry authentication document](../advanced/private-registries/index.md).
You can host Trivy's databases in your own container registry. Please refer to [Self-hosting document](./self-hosting.md#oci-databases) for a detailed guide.

### VEX Hub
## Embedded Checks

You can host a copy of VEX Hub on your own internal server.
Checks Bundle is embedded in the Trivy binary (at build time), and will be used as a fallback if the external database is not available. This means that you can still scan for misconfigurations in an air-gapped environment using the database from the time of the Trivy release you are using.

First, make a copy of VEX Hub in a location that is accessible to Trivy.
## VEX Hub

1. Download the [VEX Hub](https://github.com/aquasecurity/vexhub) archive from: <https://github.com/aquasecurity/vexhub/archive/refs/heads/main.zip>.
1. Download the [VEX Hub Repository Manifest](https://github.com/aquasecurity/vex-repo-spec#2-repository-manifest) file from: <https://github.com/aquasecurity/vexhub/blob/main/vex-repository.json>.
1. Create or identify an internal HTTP server that can serve the VEX Hub repository in your environment (e.g `https://server.local`).
1. Make the downloaded archive file available for serving from your server (e.g `https://server.local/main.zip`).
1. Modify the downloaded manifest file's [Location URL](https://github.com/aquasecurity/vex-repo-spec?tab=readme-ov-file#locations-subfields) field to the URL of the archive file on your server (e.g `url: https://server.local/main.zip`).
1. Make the manifest file available for serving from your server under the `/.well-known` path (e.g `https://server.local/.well-known/vex-repository.json`).
### Connectivity Requirements

Then, tell Trivy to use the local VEX Repository:
VEX Hub is hosted as at <https://github.com/aquasecurity/vexhub>.

1. Locate your [Trivy VEX configuration file](../supply-chain/vex/repo/#configuration-file) by running `trivy vex repo init`. Make the following changes to the file.
1. Disable the default VEX Hub repo (`enabled: false`)
1. Add your internal VEX Hub repository as a [custom repository](../supply-chain/vex/repo/#custom-repositories) with the URL pointing to your local server (e.g `url: https://server.local`).
Trivy is fetching VEX Hub GitHub Repository directly using simple HTTPS requests.

#### Authentication
The following hosts are known to be used by GitHub's services:

If your server requires authentication, you can configure it as described in the [VEX Repository Authentication document](../supply-chain/vex/repo/#authentication).

## Manual cache population

You can also download the databases files manually and surgically populate the Trivy cache directory with them.

### Downloading the DB files

On a machine with internet access, pull the database container archive from the public registry into your local workspace:

Note that these examples operate in the current working directory.

=== "Using ORAS"
This example uses [ORAS](https://oras.land), but you can use any other container registry manipulation tool.

```shell
oras pull ghcr.io/aquasecurity/trivy-db:2
```

You should now have a file called `db.tar.gz`. Next, extract it to reveal the db files:

```shell
tar -xzf db.tar.gz
```

You should now have 2 new files, `metadata.json` and `trivy.db`. These are the Trivy DB files.

=== "Using Trivy"
This example uses Trivy to pull the database container archive. The `--cache-dir` flag makes Trivy download the database files into our current working directory. The `--download-db-only` flag tells Trivy to only download the database files, not to scan any images.

```shell
trivy image --cache-dir . --download-db-only
```

You should now have 2 new files, `metadata.json` and `trivy.db`. These are the Trivy DB files, copy them over to the air-gapped environment.

### Populating the Trivy Cache

In order to populate the cache, you need to identify the location of the cache directory. If it is under the default location, you can run the following command to find it:

```shell
trivy -h | grep cache
```
- `api.github.com`
- `codeload.github.com`

For the example, we will assume the `TRIVY_CACHE_DIR` variable holds the cache location:
For more information about GitHub connectivity (including specific IP addresses), please refer to [GitHub's connectivity troubleshooting guide](https://docs.github.com/en/get-started/using-github/troubleshooting-connectivity-problems).

```shell
TRIVY_CACHE_DIR=/home/user/.cache/trivy
```
### Self-hosting

Put the Trivy DB files in the Trivy cache directory under a `db` subdirectory:
You can host a copy of VEX Hub on your own internal server. Please refer to the [self-hosting document](./self-hosting.md#vex-hub) for a detailed guide.

```shell
# ensure cache db directory exists
mkdir -p ${TRIVY_CACHE_DIR}/db
# copy the db files
cp /path/to/trivy.db /path/to/metadata.json ${TRIVY_CACHE_DIR}/db/
```
## Maven Central / Remote Repositories

### Java DB
Trivy might call out to Maven central or other remote repositories to fetch in order to correctly identify Java packages during a vulnerability scan.

For Java DB the process is the same, except for the following:
### Connectivity requirements

1. Image location is `ghcr.io/aquasecurity/trivy-java-db:1`
2. Archive file name is `javadb.tar.gz`
3. DB file name is `trivy-java.db`
Trivy might attempt to connect (over HTTPS) to the following URLs:

## Misconfigurations scanning
- `https://repo.maven.apache.org/maven2`

Note that the misconfigurations checks bundle is also embedded in the Trivy binary (at build time), and will be used as a fallback if the external database is not available. This means that you can still scan for misconfigurations in an air-gapped environment using the Checks from the time of the Trivy release you are using.
### Offline mode

The misconfiguration scanner can be configured to load checks from a local directory, using the `--config-check` flag. In an air-gapped scenario you can copy the checks library from [Trivy checks repository](https://github.com/aquasecurity/trivy-checks) into a local directory, and load it with this flag. See more in the [Misconfiguration scanner documentation](../scanner/misconfiguration/index.md).
There's no way to leverage Maven Central in a network-restricted environment, but you can prevent Trivy from trying to connect to it by using the `--offline-scan` flag.
132 changes: 132 additions & 0 deletions docs/docs/advanced/self-hosting.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
# Self-Hosting Trivy's Databases

This document explains how to host Trivy's [external dependencies](./air-gap.md) in your own infrastructure to prevent external network access. If you haven't already, please familiarize yourself with the [Databases document](../configuration/db.md) that explains about the different databases used by Trivy and the different configuration options that control them. This guide assumes you are already familiar with the concepts explained there.

## OCI databases

The following [Trivy Databases](../configuration/db.md) are packaged as OCI images:

- `trivy-db`
- `trivy-java-db`
- `trivy-checks`

To host these databases in your own infrastructure:

### Make a local copy

Use any container registry manipulation tool (e.g , [crane](https://github.com/google/go-containerregistry/blob/main/cmd/crane/doc/crane.md, [ORAS](https://oras.land), [regclient](https://github.com/regclient/regclient/tree/main)) to copy the images to your destination registry.

!!! note
You will need to keep the databases updated in order to maintain relevant scanning results over time.

### Configure Trivy

Use the appropriate [database location flags](../configuration/db.md#database-locations) to change the db-repository location:

- `--db-repository`
- `--java-db-repository`
- `--checks-bundle-repository`

### Authentication

If the registry requires authentication, you can configure it as described in the [private registry authentication document](../advanced/private-registries/index.md).

### OCI Media Types

When serving, proxying, or manipulating Trivy's databases, note that the media type of the OCI layer is not a standard container image type:

DB | Media Type | Reference
--- | --- | ---
`trivy-db` | `application/vnd.aquasec.trivy.db.layer.v1.tar+gzip` | <https://github.com/aquasecurity/trivy-db/pkgs/container/trivy-db>
`trivy-java-db` | `application/vnd.aquasec.trivy.javadb.layer.v1.tar+gzip` | https://github.com/aquasecurity/trivy-java-db/pkgs/container/trivy-java-db
`trivy-checks` | `application/vnd.oci.image.manifest.v1+json` | https://github.com/aquasecurity/trivy-checks/pkgs/container/trivy-checks

## Manual cache population

Trivy uses a local cache directory to store the database files, as described in the [cache](../configuration/cache.md) document.
You can download the databases files and surgically populate the Trivy cache directory with them.

### Downloading the DB files

On a machine with internet access, pull the database container archive from the public registry into your local workspace:

Note that these examples operate in the current working directory.

=== "Using ORAS"
This example uses [ORAS](https://oras.land), but you can use any other container registry manipulation tool.

```shell
oras pull ghcr.io/aquasecurity/trivy-db:2
```

You should now have a file called `db.tar.gz`. Next, extract it to reveal the db files:

```shell
tar -xzf db.tar.gz
```


=== "Using Trivy"
This example uses Trivy to pull the database container archive. The `--cache-dir` flag makes Trivy download the database files into our current working directory. The `--download-db-only` flag tells Trivy to only download the database files, not to scan any images.

```shell
trivy image --cache-dir . --download-db-only
```

You should now have 2 new files, `metadata.json` and `trivy.db`. These are the Trivy DB files, copy them over to the air-gapped environment.

### Populating the Trivy Cache

In order to populate the cache, you need to identify the location of the cache directory. If it is under the default location, you can run the following command to find it:

```shell
trivy -h | grep cache
```

For the example, we will assume the `TRIVY_CACHE_DIR` variable holds the cache location:

```shell
TRIVY_CACHE_DIR=/home/user/.cache/trivy
```

Put the Trivy DB files in the Trivy cache directory under a `db` subdirectory:

```shell
# ensure cache db directory exists
mkdir -p ${TRIVY_CACHE_DIR}/db
# copy the db files
cp /path/to/trivy.db /path/to/metadata.json ${TRIVY_CACHE_DIR}/db/
```

### Java DB adaptations

For Java DB the process is the same, except for the following:

1. Image location is `ghcr.io/aquasecurity/trivy-java-db:1`
2. Archive file name is `javadb.tar.gz`
3. DB file name is `trivy-java.db`

## VEX Hub

### Make a local copy

To make a copy of VEX Hub in a location that is accessible to Trivy.

1. Download the [VEX Hub](https://github.com/aquasecurity/vexhub) archive from: <https://github.com/aquasecurity/vexhub/archive/refs/heads/main.zip>.
1. Download the [VEX Hub Repository Manifest](https://github.com/aquasecurity/vex-repo-spec#2-repository-manifest) file from: <https://github.com/aquasecurity/vexhub/blob/main/vex-repository.json>.
1. Create or identify an internal HTTP server that can serve the VEX Hub repository in your environment (e.g `https://server.local`).
1. Make the downloaded archive file available for serving from your server (e.g `https://server.local/main.zip`).
1. Modify the downloaded manifest file's [Location URL](https://github.com/aquasecurity/vex-repo-spec?tab=readme-ov-file#locations-subfields) field to the URL of the archive file on your server (e.g `url: https://server.local/main.zip`).
1. Make the manifest file available for serving from your server under the `/.well-known` path (e.g `https://server.local/.well-known/vex-repository.json`).

### Configure Trivy

To configure Trivy to use the local VEX Repository:

1. Locate your [Trivy VEX configuration file](../supply-chain/vex/repo/#configuration-file) by running `trivy vex repo init`. Make the following changes to the file.
1. Disable the default VEX Hub repo (`enabled: false`)
1. Add your internal VEX Hub repository as a [custom repository](../supply-chain/vex/repo/#custom-repositories) with the URL pointing to your local server (e.g `url: https://server.local`).

### Authentication

If your server requires authentication, you can configure it as described in the [VEX Repository Authentication document](../supply-chain/vex/repo/#authentication).
Loading

0 comments on commit 745be1a

Please sign in to comment.