Manishchhava/DataDeduplication

Introduction

In this problem, we have to train a model to identify unique patients in the given sample dataset. Removing redundant records matters because carrying extra data increases both the space and time complexity of programs. Since a person's data can come from various sources, it is quite possible to receive several records for the same person that differ only slightly in the first name.
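
As a rough illustration of the idea (not the model trained in this repository), the sketch below treats two records as the same patient when their other fields match exactly and their first names are nearly identical. It uses `difflib.SequenceMatcher` from the Python standard library. The field names `first_name`, `last_name` and `dob`, the sample records, and the `0.8` threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def names_similar(a, b, threshold=0.8):
    """Return True when two first names are close enough to count as a match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def deduplicate(records, threshold=0.8):
    """Keep the first record of each patient; drop later near-duplicates.

    Two records count as the same patient when last name and date of birth
    match exactly and the first names are similar (hypothetical rule).
    """
    unique = []
    for rec in records:
        is_duplicate = any(
            rec["last_name"].lower() == kept["last_name"].lower()
            and rec["dob"] == kept["dob"]
            and names_similar(rec["first_name"], kept["first_name"], threshold)
            for kept in unique
        )
        if not is_duplicate:
            unique.append(rec)
    return unique


# Made-up sample data, for illustration only.
patients = [
    {"first_name": "Jonathan", "last_name": "Smith", "dob": "1990-01-01"},
    {"first_name": "Jonathon", "last_name": "Smith", "dob": "1990-01-01"},
    {"first_name": "Maria", "last_name": "Smith", "dob": "1990-01-01"},
]

unique_patients = deduplicate(patients)
print([p["first_name"] for p in unique_patients])
```

Here "Jonathan" and "Jonathon" are merged into one patient, while "Maria" is kept as a separate record. This pairwise comparison is quadratic in the number of records, so it only suits small datasets.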

Programming Language

Python

Libraries used

  • numpy
  • sklearn
  • pandas

How to run the code:

  1. Clone this complete project
  2. Unzip and navigate to the project folder
  3. Run NameClassification.py in any Python environment. An output file named type.csv will be generated.

Alternative: using Jupyter Notebook

Open NameClassification.ipynb and run all cells.

NB.pdf describes the different parts of the code.

NB.ppt describes the approach to the solution and the algorithm used.
