Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch CPEs from Repology #5352

Open
captn3m0 opened this issue Jun 25, 2024 · 7 comments
Open

Fetch CPEs from Repology #5352

captn3m0 opened this issue Jun 25, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@captn3m0
Copy link
Member

captn3m0 commented Jun 25, 2024

  1. Many products at endoflife.date already link to repology (see this search for results)
  2. Repology does a lot of the hard work, by using CVEs to link a repology product to its CPE. Recent CPEs are published, and while they aren't available on the API, they should be present in the data dumps. I've filed Expose CPE information in more places. repology/repology-webapp#249 to see if this is feasible upstream.

Here's a simpler TODO:

  1. Setup the repology dataset locally as per https://dumps.repology.org/
  2. Use the repology mapper CSV from below to find the repology identifier for each product.
  3. For each such product, write a query to fetch the corresponding CVE.
  4. Add the CPE back to the product as an identifier. From the CPE definition, we only really want cpe:<cpe_version>:<part>:<vendor>:<product>, since we are linking to the product, and not a specific version or variant.
@captn3m0 captn3m0 added the enhancement New feature or request label Jun 25, 2024
@endoflife-date endoflife-date deleted a comment from welcome bot Jun 25, 2024
@flo7000
Copy link
Contributor

flo7000 commented Jun 30, 2024

So I've looked more into it and i have some questions:

  • I've started with a ascript to get CPE's from the NVD API (https://services.nvd.nist.gov/rest/json/cpes), so that I don't have to do everything manually (and for 100% automating this "process" in the future). But when I do an API request with a keyword (python in that case) it gives back many results. Do you think, that implementing something that checks if the keyword is in the first three parts of the CPE and else ignore them would help?

Request URL: https://services.nvd.nist.gov/rest/json/cpes/2.0//?keywordSearch=python&keywordExactMatch

Count of CPEs found: 9121

  • For the repology identifiers, where can I find the "official" ones? because as I've seen Expose CPE information in more places. repology/repology-webapp#249 . That means that there isn't like a list yet? and is it correct that repology identifiers are just like the product names? (example: bun's repology is bun)

  • Same question as above, but for PURL; where do I find the official PURLs?

  • If I have many results, how do I know which is the right one? Or should I just everthing that I think is correct?

  • I have one last question in this comment: Is it possible to request only one field of "data" from the EOL.date API?

Example of what i mean (this is the infoblox WAPI)

$ddi_call "${ddi_api_url}/record:host?name~=ds&_return_fields=ipv4addrs,ipv6addrs,name,aliases,view,zone,ddns_protected,ttl,use_ttl,disable,comment,device_description&$pfix"

flo7000 added a commit to flo7000/endoflife.date that referenced this issue Jun 30, 2024
check endoflife-date#5352 for more information
@flo7000
Copy link
Contributor

flo7000 commented Jun 30, 2024

Added macOS CPE's for a test and to discuss with my colleague tomorrow how the KQL Queries will look like and other things

@captn3m0
Copy link
Member Author

captn3m0 commented Jul 1, 2024

Using the NVD dataset is a dead-end - I wouldn't go via that route. There's some attempts at making a PURL/CPE mapping, but none of them are as accurate as the approach the repology already takes: Use NVD CVEs (which link to CPEs), and if the same CVE is noticed on a product from another source (such as a distro CVE), link the CPE.

repology identifiers are just like the product names? (example: bun's repology is bun)

for many products on endoflife.date, we have a repology identifier in the metadata. Here's a complete mapping that you can use. the first column is the filename, and the second column is the corresponding repology project.

So apache-http-server.md links to https://repology.org/project/apache/information as an example.

product,repology
alibaba-dragonwell,jdk11-dragonwell-extended
alibaba-dragonwell,jdk11-dragonwell-standard
alibaba-dragonwell,jdk17-dragonwell
alibaba-dragonwell,jdk17-dragonwell-standard
alibaba-dragonwell,jdk8-dragonwell-extended
alibaba-dragonwell,jdk8-dragonwell-standard
amazon-cdk,aws-cdk
angularjs,angular.js
ansible,ansible
apache-airflow,apache-airflow
apache-flink,flink
apache-http-server,apache
argocd,argocd
bellsoft-liberica,liberica-jdk-11
bellsoft-liberica,liberica-jdk-11-full
bellsoft-liberica,liberica-jdk-11-lite
bellsoft-liberica,liberica-jdk-17-full
bellsoft-liberica,liberica-jdk-21-full
bellsoft-liberica,liberica-jdk-8-bin
bellsoft-liberica,liberica-jdk-8-full
bellsoft-liberica,liberica-jdk-full
bellsoft-liberica,liberica-jre-11
bootstrap,bootstrap
bun,bun
cert-manager,cert-manager
cfengine,cfengine
chef-infra-server,chef-infra-server
composer,php:composer
consul,consul
couchbase-server,couchbase-server-community
django,python:django
docker-engine,docker
docker-engine,docker-ce
drupal,php:drupal
drush,drush
elasticsearch,elasticsearch
envoy,envoy-proxy
etcd,etcd
exim,exim
fluent-bit,fluent-bit
go,go
goaccess,goaccess
gorilla,go:github-gorilla-context
gorilla,go:github-gorilla-csrf
gorilla,go:github-gorilla-css
gorilla,go:github-gorilla-handlers
gorilla,go:github-gorilla-mux
gorilla,go:github-gorilla-pat
gorilla,go:github-gorilla-rpc
gorilla,go:github-gorilla-schema
gorilla,go:github-gorilla-securecookie
gorilla,go:github-gorilla-sessions
gorilla,go:github-gorilla-websocket
gorilla,go:gorilla-context
gorilla,go:gorilla-mux
haproxy,haproxy
icinga-web,icingaweb2
isc-dhcp,dhcp
istio,go:istio-api
istio,go:istio-pkg
istio,istio
istio,istioctl
kotlin,kotlin
laravel,php:laravel-framework
mariadb,mariadb
matomo,matomo
meilisearch,meilisearch
memcached,memcached
mongodb,mongodb
mysql,mysql
nginx,nginx
nodejs,nodejs
phpmyadmin,phpmyadmin
postgresql,postgresql
prometheus,prometheus
puppet,puppet
python,python
redis,redis
ruby,ruby
salt,salt
squid,squid
tomcat,tomcat
traefik,traefik
typo3,typo3
unrealircd,unrealircd
varnish,varnish
vue,vue.js
wagtail,python:wagtail
wordpress,wordpress
yarn,yarn
zabbix,zabbix
zookeeper,zookeeper

Same question as above, but for PURL; where do I find the official PURLs?

We have them in the same markdown files under the identifiers key. See endoflife.date/contribute for what goes. I would however say PURLs are out-of-scope for this issue.

That said, there is no such thing as an official PURL. PURL are just an identifier that anyone can generate. In our case, we add them to our pages as and when we find them. See #5302 for an example, where I went through various installation methods of the product, generated PURLs for them, and added them to the product.

If I have many results, how do I know which is the right one? Or should I just everthing that I think is correct?

Assuming you use the repology dataset from https://dumps.repology.org/ (which is the point of this issue), this should not matter.

I have one last question in this comment: Is it possible to request only one field of "data" from the EOL.date API?

The API is light enough that I don't see the use for this. If you want to get data for multiple products in a single call, we don't support that in the current API, but the next version of the API has an endpoint for this: https://deploy-preview-2080--endoflife-date.netlify.app/api/v1/products/full/. See docs at https://deploy-preview-2080--endoflife-date.netlify.app/docs/api/v1/#/default/get_products_full_ and PR at #2080

@flo7000
Copy link
Contributor

flo7000 commented Jul 1, 2024

Using the NVD dataset is a dead-end - I wouldn't go via that route. There's some attempts at making a PURL/CPE mapping, but none of them are as accurate as the approach the repology already takes: Use NVD CVEs (which link to CPEs), and if the same CVE is noticed on a product from another source (such as a distro CVE), link the CPE.

This means I shouldn't use the API with a regex and instead use the API to look for CVE's and their linked CPE's? (see https://documenter.getpostman.com/view/16438573/UzXKWe99 )

for many products on endoflife.date, we have a repology identifier in the metadata. Here's a complete mapping that you can use. the first column is the filename, and the second column is the corresponding repology project.

So basically the repology identifier is just how the project is called on repology (if it exists)?

I would however say PURLs are out-of-scope for this issue.

So for now I should just focus on the repology and CPE identifiers?

Assuming you use the repology dataset from https://dumps.repology.org/ (which is the point of this issue), this should not matter.

This means that for the repology identifiers i can use this DB that I can import right? Do you know by any chance which DBMS it is?

The API is light enough that I don't see the use for this. If you want to get data for multiple products in a single call, we don't support that in the current API, but the next version of the API has an endpoint for this: https://deploy-preview-2080--endoflife-date.netlify.app/api/v1/products/full/. See docs at https://deploy-preview-2080--endoflife-date.netlify.app/docs/api/v1/#/default/get_products_full_ and PR at #2080

I meant like getting only one single field with an API call (like just get all identifiers), but thanks for the update about the API, this will be great !

@captn3m0
Copy link
Member Author

captn3m0 commented Jul 1, 2024

This means I shouldn't use the API with a regex and instead use the API to look for CVE's and their linked CPE's?

Yes, but repology has done this hard work, so we need not reproduce.

So basically the repology identifier is just how the project is called on repology (if it exists)?

Yes

So for now I should just focus on the repology and CPE identifiers?

Yes

Do you know by any chance which DBMS it is?

postgres, with a few extensions.

There's a README: https://dumps.repology.org/README.txt which links to https://github.com/repology/repology-updater#creating-the-database

I have a WIP dockerfile which might be of help:

FROM opensuse/leap:15.6

RUN zypper --non-interactive install postgresql14 postgresql14-libversion postgresql14-contrib zstd
RUN mkdir -p /var/run/postgresql && chown -R postgres /var/run/postgresql

USER postgres
ADD repology-database-dump-latest.sql.zst /tmp/repology-database-dump-latest.sql.zst
RUN /usr/lib/postgresql14/bin/initdb -D /var/lib/pgsql/data
RUN pg_ctl --wait --mode immediate -D /var/lib/pgsql/data start && \
	psql -c "CREATE DATABASE repology" && \
	psql -c "CREATE USER repology WITH PASSWORD 'repology'" && \
	psql -c "GRANT ALL ON DATABASE repology TO repology" && \
	psql --dbname repology -c "GRANT CREATE ON SCHEMA public TO PUBLIC" && \
	psql --dbname repology -c "CREATE EXTENSION pg_trgm" && \
	psql --dbname repology -c "CREATE EXTENSION libversion" && \
	echo "host    all             all             0.0.0.0/0            trust" >> /var/lib/pgsql/data/pg_hba.conf && \
	zstd -d /tmp/repology-database-dump-latest.sql.zst | psql --dbname repology -v ON_ERROR_STOP=1

CMD postgres -D /var/lib/pgsql/data
EXPOSE 5432

@captn3m0
Copy link
Member Author

captn3m0 commented Jul 1, 2024

The table we need is manual_cpes. I queried against all the known identifiers we had:

SELECT   effname as repology_identifier,  cpe_vendor ,  cpe_product  FROM manual_cpes wHERE effname IN ('kotlin', 'nginx', 'python:django', 'typo3', 'envoy-proxy', 'meilisearch', 'php:laravel-framework', 'prometheus', 'couchbase-server-community', 'php:drupal', 'php:composer', 'mongodb', 'postgresql', 'flink', 'chef-infra-server', 'puppet', 'python:wagtail', 'apache-airflow', 'jdk8-dragonwell-extended', 'jdk8-dragonwell-standard', 'jdk17-dragonwell', 'jdk17-dragonwell-standard', 'jdk11-dragonwell-extended', 'jdk11-dragonwell-standard', 'fluent-bit', 'liberica-jdk-11-full', 'liberica-jdk-11', 'liberica-jdk-11-lite', 'liberica-jdk-17-full', 'liberica-jdk-21-full', 'liberica-jdk-8-full', 'liberica-jdk-8-bin', 'liberica-jdk-full', 'liberica-jre-11', 'etcd', 'aws-cdk', 'dhcp', 'cfengine', 'varnish', 'zabbix', 'mysql', 'wordpress', 'elasticsearch', 'traefik', 'salt', 'matomo', 'ansible', 'exim', 'squid', 'goaccess', 'nodejs', 'istio', 'istioctl', 'go:istio-pkg', 'go:istio-api', 'redis', 'mariadb', 'docker', 'docker-ce', 'bun', 'vue.js', 'unrealircd', 'tomcat', 'cert-manager', 'python', 'go', 'ruby', 'drush', 'argocd', 'bootstrap', 'angular.js', 'consul', 'haproxy', 'yarn', 'icingaweb2', 'phpmyadmin', 'memcached', 'zookeeper', 'go:github-gorilla-context', 'go:github-gorilla-csrf', 'go:github-gorilla-css', 'go:github-gorilla-handlers', 'go:github-gorilla-mux', 'go:github-gorilla-pat', 'go:github-gorilla-rpc', 'go:github-gorilla-schema', 'go:github-gorilla-securecookie', 'go:github-gorilla-sessions', 'go:github-gorilla-websocket', 'go:gorilla-context', 'go:gorilla-mux', 'apache')

Which results in this table (with a filename column in case the repology_identifier is not the same as the filename:

repology_identifier cpe_vendor cpe_product filename
angular.js angularjs angular.js angular
ansible ansible ansible
ansible redhat ansible
ansible redhat ansible_engine
apache apache apache_http_server apache
apache apache http_server
apache-airflow apache airflow
argocd linuxfoundation argo-cd
bootstrap getbootstrap bootstrap
cfengine gnu cfengine
cfengine northern cfengine
cfengine northern.tech cfengine
consul hashicorp consul
couchbase-server-community couchbase couchbase_server couchbase-server
dhcp isc dhcp isc-dhcp
docker docker docker docker-engine
elasticsearch elastic elasticsearch
elasticsearch elasticsearch elasticsearch
envoy-proxy cncf envoy envoy-proxy
envoy-proxy envoyproxy envoy envoy-proxy
etcd etcd etcd
etcd redhat etcd
exim exim exim
exim university_of_cambridge exim
flink apache flink apache-flink
fluent-bit treasuredata fluent_bit
go golang go
go:github-gorilla-websocket gorillatoolkit websocket gorilla
haproxy haproxy haproxy
icingaweb2 icinga icinga_web_2 icinga-web
istio istio istio
kotlin jetbrains kotlin
mariadb mariadb mariadb
matomo matomo matomo
memcached memcached memcached
memcached memcachedb memcached
mongodb mongodb mongodb
mysql mysql mysql
mysql oracle mysql
mysql oracle mysql_server
nginx f5 nginx
nginx igor_sysoev nginx
nginx nginx nginx
nodejs nodejs node.js
nodejs nodejs nodejs
php:composer getcomposer composer composer
php:laravel-framework laravel laravel laravel
phpmyadmin phpmyadmin phpmyadmin
postgresql postgresql postgresql
prometheus prometheus prometheus
puppet puppet puppet
puppet puppetlabs puppet
python python python
python python_software_foundation python
python:django djangoproject django django
redis redis redis
redis redislabs redis
ruby ruby-lang ruby
ruby yukihiro_matsumoto ruby
salt saltstack salt
squid squid squid
squid squid-cache squid
tomcat apache tomcat
tomcat apache_software_foundation tomcat
tomcat apache_tomcat apache_tomcat
traefik containous traefik
traefik traefik traefik
typo3 typo3 typo3
varnish varnish-cache varnish
varnish varnish-cache varnish_cache
varnish varnish-software varnich_cache
varnish varnish-software varnich_cache
varnish varnish-software varnish_cache
varnish varnish_cache_project varnish_cache
wordpress wordpress wordpress
yarn yarnpkg yarn
zabbix zabbix zabbix
zookeeper apache zookeeper

Hopefully this should be good enough for you to file some more PRs. I've also updated the issue's description with a TODO.

@flo7000
Copy link
Contributor

flo7000 commented Jul 1, 2024

thanks for the examples, the files and the TODO's! i'm currently setting up the VM for the docker instance and i'll create the PR's soon :)

captn3m0 pushed a commit that referenced this issue Jul 2, 2024
check #5352 for more information
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants