This project cleans and processes data extracted from the USCIS website, covering the details of US visa applications from 2011 to 2016. A data model built from the dataset is then used to create a database in Neo4j, a graph database well suited to analyzing connected data. The…

Prathamesh-Verlekar/US-Permanent-Visas-Analysis

Big-Data-Architecture-and-Governance

Project Objective

  1. Clean and validate the data extracted from the USCIS website
  2. Create a data model based on the dataset
  3. Create a database in Neo4j and load the data using Cypher queries
  4. Build a data pipeline connecting Neo4j to Python
  5. Build an interactive dashboard for better insights
  6. Extract metadata from the Neo4j database and load it into a SQL Server database
  7. Run integration and acceptance tests for data validation
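
Objectives 3 and 4 can be illustrated with a minimal sketch of how cleaned rows might be batched and paired with a parameterized Cypher statement before being sent from Python. The node labels (`Employer`, `Application`), the `FILED` relationship, and the property names are assumptions made for illustration, not the project's actual data model; the helper names `batched` and `load_plan` are likewise hypothetical.

```python
from itertools import islice

# Hypothetical labels, relationship type, and property names. The real ones
# are defined by the project's graph data model, which is not reproduced here.
LOAD_APPLICATIONS = """
UNWIND $rows AS row
MERGE (e:Employer {name: row.employer_name})
MERGE (a:Application {case_number: row.case_number})
SET a.status = row.case_status
MERGE (e)-[:FILED]->(a)
"""


def batched(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_plan(rows, size=1000):
    """Pair the Cypher statement with one parameter dict per batch."""
    return [(LOAD_APPLICATIONS, {"rows": batch}) for batch in batched(rows, size)]
```

With the official `neo4j` Python driver, each `(query, params)` pair could then be passed to `session.run(query, params)`. Using `MERGE` rather than `CREATE` keeps a reload from duplicating nodes, and `UNWIND` sends a whole batch in one round trip.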

Data Overview using Pandas Profiling

  1. The dataset gives detailed information on around 374K visa applications and their decisions.
  2. The data covers 2011-2016 and includes information on the employer, position, wage offered, job posting history, employee education and past visa history, and the final decision.
  3. The dataset has 374362 observations, of which 373025 are unique. It has 154 variables, of which only 21 have more than 330000 non-missing observations.
  4. By type, the dataset has 116 categorical, 2 datetime, 10 numerical, and 26 Boolean variables.
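
The counts in this overview (observations, unique observations, and variables above a non-missing threshold) are the kind of summary a profiling report produces. As a rough sketch of what those numbers mean, the function below recomputes them in plain Python, used here instead of pandas only so the example runs without dependencies. The function name `summarize`, the column names in the usage example, and the rule that empty strings count as missing are all assumptions.

```python
def summarize(rows, columns, threshold):
    """Return basic profiling counts for a list of dict rows.

    A value is treated as missing when it is None or an empty string
    (an assumption; the real dataset may encode missing values differently).
    """
    # Two rows are duplicates when they agree on every column.
    unique_rows = {tuple(row.get(col) for col in columns) for row in rows}

    non_missing = {
        col: sum(1 for row in rows if row.get(col) not in (None, ""))
        for col in columns
    }

    # Columns with strictly more than `threshold` non-missing observations.
    well_populated = [col for col in columns if non_missing[col] > threshold]

    return {
        "observations": len(rows),
        "unique_observations": len(unique_rows),
        "variables": len(columns),
        "well_populated": well_populated,
    }
```

Called on the full dataset with `threshold=330000`, the length of `well_populated` would correspond to the 21 variables mentioned above, provided missing values are encoded as assumed.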

Technical Vision Diagram

[Image: technical vision diagram]

Graph Data Model

[Image: US permanent visa graph data model]

Database Schema in Neo4j

[Image: database schema in Neo4j]

Interactive Dashboard

[Image: full dashboard]

Target Audience

  1. US Citizenship and Immigration Services
  2. Companies across different sectors
  3. Immigrants applying for a US visa

Dashboard Insights

  1. H-1B is the most common visa type in applications filed by companies and accounts for the most approved visas.
  2. Amazon is among the top 5 companies by number of visa applications filed.
  3. Computer Engineering is the most common job for which companies file visa applications, and it has the highest approval rate.
  4. Among all countries, India has the most visa applications filed and the most approved cases.
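
Rankings and approval rates like the ones above come down to grouped counts. The sketch below shows one way such figures could be computed from application rows. It is not the dashboard's actual implementation, and the field names (`visa_class`, `case_status`) and the `"Certified"` status value are assumptions.

```python
from collections import Counter


def top_n(rows, field, n=5):
    """Return the n most frequent values of `field` with their counts."""
    return Counter(row[field] for row in rows).most_common(n)


def approval_rates(rows, field, approved_statuses=frozenset({"Certified"})):
    """Return the share of approved applications per value of `field`.

    `approved_statuses` is a hypothetical default; the statuses that count
    as approved depend on how the dataset labels its decisions.
    """
    totals = Counter()
    approved = Counter()
    for row in rows:
        totals[row[field]] += 1
        if row["case_status"] in approved_statuses:
            approved[row[field]] += 1
    return {key: approved[key] / totals[key] for key in totals}
```

The same two functions cover each insight by changing `field`, for example to an employer, job title, or country column.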
