Introduction

In this problem, we have to train a model to identify unique patients in the given sample dataset. Removing redundant records matters because carrying extra data increases both the space and time complexity of programs. Since a person's data can come from various sources, it is quite possible to receive records for the same person that differ only slightly in the first name.
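
To make the idea of "the same person with a small difference in first name" concrete, here is a minimal sketch. It is not the model trained in NameClassification.py: it swaps in plain string similarity from Python's standard `difflib` module so the example stays self-contained. The field names (`first_name`, `last_name`, `dob`), the sample records, and the 0.8 threshold are all assumptions made for illustration; the real dataset may look different.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1], ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def deduplicate(records, threshold=0.8):
    """Keep the first record from each group of near-duplicate patients.

    Two records count as the same patient when the last name and date of
    birth match exactly and the first names are at least `threshold` similar.
    The matching rule and threshold are illustrative assumptions.
    """
    unique = []
    for rec in records:
        is_duplicate = any(
            rec["last_name"].lower() == kept["last_name"].lower()
            and rec["dob"] == kept["dob"]
            and name_similarity(rec["first_name"], kept["first_name"]) >= threshold
            for kept in unique
        )
        if not is_duplicate:
            unique.append(rec)
    return unique


# Hypothetical sample data; the real dataset's columns may differ.
patients = [
    {"first_name": "Jonathan", "last_name": "Smith", "dob": "1990-01-15"},
    {"first_name": "Jonathon", "last_name": "Smith", "dob": "1990-01-15"},
    {"first_name": "William", "last_name": "Shields", "dob": "1962-11-03"},
    {"first_name": "Maria", "last_name": "Smith", "dob": "1990-01-15"},
]

unique_patients = deduplicate(patients)
for patient in unique_patients:
    print(patient["first_name"], patient["last_name"], patient["dob"])
```

Comparing every record against every kept record is quadratic in the number of rows, which is fine for a small sample but would likely need blocking (for example, grouping by last name first) on a larger dataset.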

Programming Language

Python

Libraries used

numpy
sklearn
pandas

How to run the code:

Clone this complete project
Unzip and navigate to the project folder
Run NameClassification.py using any Python environment
An output file named type.csv will be generated.

Alternative using `jupyter notebook`

Open NameClassification.ipynb and run all cells

**NB.pdf file describes the different parts of the code.
**NB.ppt file describes the approach to the solution and the algorithm used.

Manishchhava/DataDeduplication
