Github as a Growth Monitor

As part of my Insight project, I built a data pipeline that monitors the GitHub code ecosystem and detects projects that depend on vulnerable packages. DevOps or SRE teams could use it to find out when the packages they depend on have security vulnerabilities.

Introduction

Git-Monitor is a platform that enables organizations to leverage GitHub data and track internal package versions along with their dependencies. Recent events such as the Equifax data breach, in which attackers exploited a vulnerable library used by Equifax to gain access to the critical financial information of millions of people, underscore how important it is for organizations to monitor the third-party libraries their internal systems depend on.
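
The core idea, finding every project that depends on a vulnerable package either directly or through another package, can be illustrated with a small in-memory sketch. This is not the pipeline's actual implementation (which runs on Spark and Neo4j). The function name `find_affected_projects` and the sample project names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def find_affected_projects(dependencies, vulnerable):
    """Return every project that depends, directly or transitively,
    on at least one vulnerable package.

    dependencies: dict mapping a project name to the packages it depends on.
    vulnerable: iterable of package names known to be vulnerable.
    """
    # Invert the graph: package -> projects that depend on it directly.
    dependents = defaultdict(set)
    for project, packages in dependencies.items():
        for package in packages:
            dependents[package].add(project)

    # Breadth-first walk "upwards" from each vulnerable package.
    affected = set()
    queue = deque(vulnerable)
    while queue:
        package = queue.popleft()
        for project in dependents.get(package, ()):
            if project not in affected:
                affected.add(project)
                queue.append(project)
    return affected


# Hypothetical sample data, for illustration only.
sample_dependencies = {
    "billing-service": ["web-framework"],
    "web-framework": ["xml-parser"],
    "reporting-tool": ["csv-reader"],
}

print(sorted(find_affected_projects(sample_dependencies, ["xml-parser"])))
```

In a graph database the same question would typically be asked as a variable-length path query rather than an explicit loop; the sketch only shows the traversal logic.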

Google Slides
Project Link

Approach

Project Structure

The repository's directory structure is as follows:

      ├── README.md
      ├── execute.sh
      ├── Makefile
      ├── requirements.txt
      ├── src
      │   ├── main.py
      │   ├── credentials.py
      │   └── jobs
      │       ├── create_project_nodes.py
      │       ├── create_version_nodes.py
      │       ├── create_dependencies.py
      │       └── database_operations.py
      ├── models
      │   ├── project.py
      │   ├── language.py
      │   ├── license.py
      │   ├── platform.py
      │   ├── status.py
      │   └── version.py
      ├── tests
      ├── libs
      └── utils
          └── util.py

src/main.py is the main driver of the application.
src/credentials.py defines the Neo4j and AWS access credentials.
All Spark jobs live in the src/jobs folder.
The models folder hosts the data models for Neo4j.
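
To make the split between data models and jobs more concrete, here is a minimal sketch of how a model and the Cypher statements that create its nodes might look. The field names, the `DEPENDS_ON` relationship type, and the helper functions are all assumptions made for illustration and are not taken from the repository. The repository keeps language, license, platform, and status as separate models, whereas this sketch flattens them into plain string properties for brevity.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Project:
    # Hypothetical fields; the real models/project.py may differ.
    name: str
    platform: str
    language: str
    license: str
    status: str


@dataclass(frozen=True)
class Version:
    # Hypothetical fields; the real models/version.py may differ.
    project_name: str
    number: str


def merge_project_query(project):
    """Build a parameterized Cypher statement that upserts a Project node."""
    query = (
        "MERGE (p:Project {name: $name}) "
        "SET p.platform = $platform, p.language = $language, "
        "p.license = $license, p.status = $status"
    )
    return query, asdict(project)


def merge_dependency_query(version, dependency_name):
    """Build a parameterized Cypher statement linking a version to a dependency."""
    query = (
        "MATCH (v:Version {project_name: $project_name, number: $number}) "
        "MERGE (d:Project {name: $dependency_name}) "
        "MERGE (v)-[:DEPENDS_ON]->(d)"
    )
    params = {**asdict(version), "dependency_name": dependency_name}
    return query, params


# Hypothetical sample values, for illustration only.
query, params = merge_project_query(
    Project("example-lib", "PyPI", "Python", "MIT", "Active")
)
print(query)
print(params)
```

Using `MERGE` with parameters rather than string-formatted `CREATE` statements keeps repeated job runs idempotent and avoids quoting problems in package names. The returned query and parameter pair could then be handed to whichever Neo4j client the jobs use.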
