IP Flow Analysis

Project Overview

The project aims to analyze IP network traffic flows and predict the application-layer protocol of each flow, that is, the specific application that generated it, such as Facebook, YouTube, or Instagram.

The dataset can be found here. It contains 87 features. Each instance holds the information of one IP flow generated by a network device, e.g. source and destination IP addresses, ports, interarrival times, and the layer 7 protocol (application) used on the flow, which is the class we want to predict.
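As a first look at the prediction target, the number of flows per application can be counted straight from the CSV. The following is a minimal sketch using only the standard library, not the code from the notebook. It assumes the label column is named `ProtocolName`; check the header of the downloaded file, since that name is an assumption here. The helper `count_applications` is hypothetical.

```python
import csv
from collections import Counter

# Assumed name of the label column; verify it against the CSV header.
LABEL_COLUMN = "ProtocolName"


def count_applications(csv_path, label_column=LABEL_COLUMN):
    """Return a Counter mapping each application label to its number of flows."""
    counts = Counter()
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # Fail early with a clear message if the assumed column is missing.
        if label_column not in (reader.fieldnames or []):
            raise KeyError(f"column {label_column!r} not found in {csv_path}")
        for row in reader:
            counts[row[label_column]] += 1
    return counts
```

Calling `count_applications("Dataset-Unicauca-Version2-87Atts.csv").most_common(10)` would then list the ten most frequent applications, which helps show how imbalanced the classes are before any model is trained.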

For more details, see the project blog post.

Motivation

Most network traffic classification datasets aim only at identifying the type of application an IP flow carries (WWW, DNS, FTP, P2P, Telnet, etc.). This dataset goes a step further: it supports building machine learning models that detect specific applications such as Facebook, YouTube, and Instagram from IP flow statistics (currently 75 applications).

Libraries used

  • keras 2.2.4+
  • scikit-learn (sklearn) 0.21.2+
  • numpy 1.16.4+
  • seaborn 0.9.0+
  • pandas 0.25.0+
  • matplotlib 3.1.0+

Files

  • The /docs folder contains the project blog post document and images
  • ip-flow-analysis.ipynb is the notebook where the analysis happens
  • model.h5 is the deep learning model, which can be generated from the notebook
  • Dataset-Unicauca-Version2-87Atts.csv is the dataset, which should be downloaded from here

Analysis Summary

Our analysis concludes that we can identify the application behind an IP flow with 66% accuracy. For more details, see the project blog post.

Future Improvement

We can improve the model by:

  • using more of the features that we dropped
  • extracting new features (for example: is the flow incoming or outgoing traffic? Is the port privileged or not?)
  • aggregating flows by connection
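The feature ideas above could be sketched as follows. This is one possible interpretation, not code from the notebook. The field names (`src_ip`, `src_port`, `dst_ip`, `dst_port`, `protocol`, `packets`, `bytes`) are hypothetical and would need to be mapped to the dataset's actual columns. The direction check also assumes the capture point sits inside a private network, so a private source address means outgoing traffic.

```python
import ipaddress
from collections import defaultdict


def is_privileged_port(port):
    """True for ports in the well-known (privileged) range 0-1023."""
    return 0 <= int(port) <= 1023


def is_outgoing(source_ip):
    """Guess the direction from the source address.

    Assumption: the monitored network uses private addresses, so a flow
    whose source is private (as classified by `ipaddress`) is outgoing.
    """
    return ipaddress.ip_address(source_ip).is_private


def connection_key(flow):
    """Build a direction-independent key for a flow.

    Both directions of the same connection map to the same key because
    the two endpoints are sorted before being combined.
    """
    a = (flow["src_ip"], int(flow["src_port"]))
    b = (flow["dst_ip"], int(flow["dst_port"]))
    return (min(a, b), max(a, b), flow["protocol"])


def aggregate_by_connection(flows):
    """Sum flow, packet, and byte counts over all flows of each connection."""
    totals = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0})
    for flow in flows:
        entry = totals[connection_key(flow)]
        entry["flows"] += 1
        entry["packets"] += int(flow["packets"])
        entry["bytes"] += int(flow["bytes"])
    return dict(totals)
```

Sorting the endpoints is a deliberate choice: it merges the request and response halves of a conversation into a single record. If direction should be preserved instead, the key can simply keep source and destination in their original order.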

Acknowledgements

I would like to thank Juan Sebastián Rojas and Universidad del Cauca for providing this dataset.