arabic-wordcount

Using Arabic WordCount on a Dataset in Python

Main Objective

This repo shows how the 'WordCount' module in Python can be used to represent the most repeated words in Arabic datasets, and in Arabic text in general, while working around the errors that occur because the module is not designed to handle Arabic directly. Several preprocessing steps were applied so that the dataset could be used properly.

Dataset

The dataset consists of 2386 product reviews, written mainly in Arabic, with some reviews written in English or Arabizi. Reviews are classified into 3 categories: Positive, Negative and Neutral.

Illustration of Steps

The needed modules were installed and the dataset was imported.
The dataset was split into two columns.
The sentences were tokenized into words, which were added to a list.
To avoid interference from English, Arabizi and special characters, these were removed as a partial cleaning of the dataset.
We picked the first 99 words, then created an instance of WordCount, passing Arabic stopwords (imported from the get_stop_words module) and the Shorooq font as arguments.
To be displayed properly in the plot, the words were reshaped and their letters reversed; they were then added to a list, which was finally converted to a Pandas series.
We generated the WordCount from the Pandas series and plotted the figure.
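
The tokenizing, cleaning and word-selection steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the notebook's actual code: the names `ARABIC_TOKEN`, `arabic_tokens` and `top_words` are hypothetical, the character range used to detect Arabic letters is an assumption, and stopwords are filtered directly here rather than being passed to the plotting module as described above.

```python
import re
from collections import Counter

# Assumption: a token counts as Arabic if it consists only of letters in the
# basic Arabic Unicode range (hamza U+0621 through yeh U+064A). Diacritics
# fall outside this range, so vocalized words would be split at them.
ARABIC_TOKEN = re.compile(r"[\u0621-\u064A]+")


def arabic_tokens(sentences):
    """Tokenize sentences, keeping only runs of Arabic letters.

    Latin text (English, Arabizi), digits and special characters are
    dropped because they never match ARABIC_TOKEN.
    """
    tokens = []
    for sentence in sentences:
        tokens.extend(ARABIC_TOKEN.findall(sentence))
    return tokens


def top_words(tokens, stopwords=frozenset(), limit=99):
    """Take the first `limit` tokens, drop stopwords, and count the rest."""
    selected = [tok for tok in tokens[:limit] if tok not in stopwords]
    return Counter(selected)
```

Calling `arabic_tokens` on a list of review strings and passing the result to `top_words` gives a `Counter` of word frequencies. Reshaping the words for display and rendering the figure with the font file are separate steps that rely on third-party packages and are not shown here.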

Repository Files

README.md
Shorooq_N1.ttf
Trial Arabic dataset for WordCount.csv
arabic_WordCount.ipynb
The code can be accessed through Google Colaboratory from here.