Github as a Growth Monitor

As part of my Insight Project, I built a data pipeline that monitors the GitHub code ecosystem and detects projects that depend on vulnerable packages. A DevOps or SRE team could use it to get notified when a package they depend on has a security vulnerability.

Table of Contents

  1. Introduction
  2. Approach
  3. Project Structure
  4. Environment
  5. Instructions to run the code
  6. Future Work

Introduction

Git-Monitor is a platform that enables organizations to leverage GitHub data and track internal package versions along with their dependencies. Recent events such as the Equifax data breach, in which attackers exploited a vulnerable library used by Equifax to gain access to the critical financial information of millions of people, underscore how important it is for organizations to monitor the third-party libraries their internal systems depend on.

Google Slides
Project Link

Approach

Project Structure

The repository has the following directory structure:

      ├── README.md
      ├── execute.sh
      ├── Makefile
      ├── requirements.txt
      ├── src
      │   ├── main.py
      │   ├── credentials.py
      │   └── jobs
      │       ├── create_project_nodes.py
      │       ├── create_version_nodes.py
      │       ├── create_dependencies.py
      │       └── database_operations.py
      ├── models
      │   ├── project.py
      │   ├── language.py
      │   ├── license.py
      │   ├── platform.py
      │   ├── status.py
      │   └── version.py
      ├── tests
      ├── libs
      └── utils
          └── util.py

src/main.py is the main driver of the application.
src/credentials.py defines the Neo4j and AWS access credentials.
All Spark jobs live in the src/jobs folder.
The models folder holds the data models for Neo4j.
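
To make the idea behind the project, version, and dependency jobs more concrete, here is a minimal sketch of the underlying graph question: given a vulnerable package, which projects depend on it, directly or transitively? This is not the repository's code. It uses a plain in-memory dictionary as a stand-in for Neo4j and does not involve Spark, and the function names (`add_dependency`, `affected_projects`) and the project and package names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-in for the dependency graph.
# Each key is a package; its value is the set of projects that depend on it.
dependents = defaultdict(set)


def add_dependency(project, package):
    """Record that `project` depends on `package`."""
    dependents[package].add(project)


def affected_projects(vulnerable_package):
    """Return every project that depends on the package, directly or transitively."""
    seen = set()
    queue = deque([vulnerable_package])
    while queue:
        current = queue.popleft()
        # .get avoids creating empty entries for names with no dependents
        for project in dependents.get(current, ()):
            if project not in seen:
                seen.add(project)
                queue.append(project)
    return seen


# Illustrative data only: all names are made up.
add_dependency("web-portal", "lib-parser")
add_dependency("billing-api", "web-portal")
add_dependency("reporting", "lib-charts")

print(sorted(affected_projects("lib-parser")))  # ['billing-api', 'web-portal']
```

In a graph database the same question would typically be a variable-length path query over dependency relationships. The breadth-first search above only illustrates the traversal.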

Environment

Instructions to run the code

Future Work