diff --git a/BERT Sentiment Analysis for SE/Readme.md b/BERT Sentiment Analysis for SE/Readme.md
new file mode 100644
index 000000000..18f2aad10
--- /dev/null
+++ b/BERT Sentiment Analysis for SE/Readme.md
@@ -0,0 +1,86 @@
+# BERT Sentiment Analysis model for Software Engineering Comments
+
+Sentiment analysis models already exist for user reviews, chats, messages, comments, and product reviews. Generally, this kind of analysis focuses on the sentiment of movie reviews, people's opinions, or reviews of a product or service. So far, there are few production models for sentiment analysis of technical language. In this project, I created a BERT model to perform sentiment analysis on software engineering comments. It can help coders, developers, and site admins gauge the sentiment behind the questions asked and the ground truth underneath them.
+
+## What is BERT?
+
+BERT (Bidirectional Encoder Representations from Transformers) is a language representation model introduced in a 2018 paper by researchers at Google AI Language. It drew wide attention in the machine learning community by presenting state-of-the-art results in a wide variety of NLP tasks, including Question Answering (SQuAD v1.1), Natural Language Inference (MNLI), and others.
+BERT’s key technical innovation is applying the bidirectional training of the Transformer, a popular attention model, to language modelling. Previous efforts instead looked at a text sequence either from left to right, or combined left-to-right and right-to-left training. The paper’s results show that a bidirectionally trained language model can have a deeper sense of language context and flow than single-direction language models. To make this possible, the researchers introduce a technique named Masked LM (MLM). Instead of predicting the next word, the model predicts words that have been hidden (masked) in the input, which allows bidirectional training in models where it was previously impossible.
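+
+The Masked LM idea can be sketched in a few lines of Python. The BERT paper chooses 15% of token positions as prediction targets. Of those, 80% are replaced by `[MASK]`, 10% by a random token, and 10% are left unchanged. This sketch simplifies the procedure in three ways: it selects each token independently with probability 15%, it uses whole words instead of WordPiece tokens, and it uses a toy vocabulary. The function `mask_tokens` is a hypothetical helper, not BERT's or ktrain's code.
+
+```python
+import random
+
+def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
+    """Apply BERT-style masked-LM corruption to a list of tokens.
+
+    Returns (inputs, labels): labels[i] is the original token at each
+    position selected for prediction and None everywhere else.
+    """
+    rng = random.Random(seed)
+    inputs, labels = [], []
+    for tok in tokens:
+        if rng.random() < mask_prob:
+            labels.append(tok)                    # the model must recover this token
+            roll = rng.random()
+            if roll < 0.8:
+                inputs.append("[MASK]")           # 80%: replace with the mask token
+            elif roll < 0.9:
+                inputs.append(rng.choice(vocab))  # 10%: replace with a random token
+            else:
+                inputs.append(tok)                # 10%: keep the original token
+        else:
+            labels.append(None)                   # not selected: no prediction target
+            inputs.append(tok)
+    return inputs, labels
+
+tokens = "the build fails after the merge".split()
+inputs, labels = mask_tokens(tokens, vocab=tokens, seed=0)
+print(inputs)
+print(labels)
+```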
+
+## Dataset:
+I collected data from Stack Overflow, GitHub, and JIRA, using resources from Kaggle and GitHub to gather the CSV files and raw text data. I then merged everything into one structured, labelled format and saved it in the ./data folder. The data is split into two files, ./data/Train.csv and ./data/Test.csv. Each row holds a developer comment together with its sentiment label (negative, neutral, or positive). The files are UTF-16 encoded and semicolon-separated, with no header row. The three columns are an ID (e.g. t1), the label, and the comment text.
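+
+The loaders in model.py read each file as UTF-16, semicolon-separated text with no header. They take column 1 as the label and column 2 as the comment; column 0 is an ID such as `t1`. Below is a minimal sketch of the same parsing with the standard-library `csv` module, assuming exactly that three-column layout. The helper name `load_split` is hypothetical.
+
+```python
+import csv
+
+def load_split(path):
+    """Return (texts, labels) from a UTF-16, ';'-separated file with rows id;label;text."""
+    texts, labels = [], []
+    with open(path, encoding="utf-16", newline="") as f:
+        for row in csv.reader(f, delimiter=";"):
+            if len(row) < 3:
+                continue  # skip blank or malformed lines
+            labels.append(row[1])  # column 1: sentiment label
+            texts.append(row[2])   # column 2: comment text
+    return texts, labels
+```
+
+For example, `texts, labels = load_split("data/Train.csv")` would return the training comments and their labels as two parallel lists.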
+
+## Special Features of the Model:
+
+The main motivations for this project are:
+
+a) Technical keywords are difficult to analyze and feed into AI models for sentiment analysis.
+
+b) If sites like GitHub, JIRA, and Stack Overflow had sentiment analysis from an advanced model such as BERT, they could more easily filter out spam and plagiarism and identify which content is of high quality. It would also help a great deal in evaluating technologies and tech stacks based on the responses.
+
+## How the Model Works:
+
+To train the BERT model I use the [ktrain library](https://pypi.org/project/ktrain/), a fastai-like interface to Keras that helps build and train Keras models with less time and code. ktrain is open source and available on GitHub [here](https://github.com/amaiya/ktrain/tree/master/ktrain).
+
+To install ktrain, simply type the following:
+```
+pip install ktrain
+```
+To begin, let’s import the ktrain and ktrain.text modules:
+```python
+import ktrain
+from ktrain import text
+```
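+### Load the data into the BERT model:
+```python
+import pathlib
+import pandas as pd
+
+# Colab paths; point these at data/Train.csv and data/Test.csv when running locally
+train_path = "/content/Train.csv"
+test_path = "/content/Test.csv"
+
+# Stop early if either data file is missing
+if pathlib.Path(train_path).exists():
+    print("Train data path set.")
+else:
+    raise SystemExit("Train data path does not exist.")
+
+if pathlib.Path(test_path).exists():
+    print("Test data path set.")
+else:
+    raise SystemExit("Test data path does not exist.")
+
+# UTF-16, ';'-separated, no header: column 1 is the label, column 2 the comment text
+train_df = pd.read_csv(train_path, encoding='utf-16', sep=';', header=None).values
+test_df = pd.read_csv(test_path, encoding='utf-16', sep=';', header=None).values
+
+(x_train, y_train), (x_test, y_test), preproc = text.texts_from_array(
+    train_df[:, 2], train_df[:, 1],
+    x_test=test_df[:, 2], y_test=test_df[:, 1],
+    maxlen=500, preprocess_mode='bert')
+```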
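+
+With `preprocess_mode='bert'`, ktrain tokenizes each comment with BERT's WordPiece tokenizer and builds fixed-length inputs of `maxlen` tokens. The simplified sketch below shows the general shape of that encoding: `[CLS]` and `[SEP]` markers, truncation, zero padding, and all-zero segment IDs for single-sentence input. The toy vocabulary and whitespace tokenization are assumptions standing in for the real WordPiece tokenizer, and `encode_for_bert` is a hypothetical helper, not ktrain's code.
+
+```python
+def encode_for_bert(text, vocab, maxlen):
+    """Turn one comment into (token_ids, segment_ids), each of length maxlen."""
+    tokens = text.lower().split()            # stand-in for WordPiece tokenization
+    tokens = tokens[: maxlen - 2]            # leave room for [CLS] and [SEP]
+    tokens = ["[CLS]"] + tokens + ["[SEP]"]
+    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
+    ids += [vocab["[PAD]"]] * (maxlen - len(ids))  # pad to a fixed length
+    segments = [0] * maxlen                   # single-sentence input: all segment 0
+    return ids, segments
+
+vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
+         "this": 4, "fix": 5, "works": 6}
+print(encode_for_bert("This fix works great", vocab, maxlen=8))
+# ([2, 4, 5, 6, 1, 3, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0])
+```
+
+Comments longer than `maxlen - 2` tokens lose their tail, which is why a fairly large `maxlen` such as 500 is useful for long developer comments.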
+### Load BERT and wrap it in a Learner object:
+The first argument to `get_learner` is the model returned by ktrain's `text_classifier` function, which loads the pre-trained BERT model with a randomly initialized final Dense layer on top. The `train_data` and `val_data` arguments are the training and validation data, respectively; here the test split also serves as the validation data. The last argument is the batch size. We use a small batch size of 6, which keeps GPU memory use manageable with 500-token inputs.
+```
+model = text.text_classifier('bert', (x_train, y_train) , preproc=preproc)
+learner = ktrain.get_learner(model,
+ train_data=(x_train, y_train),
+ val_data=(x_test, y_test),
+ batch_size=6)
+```
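+
+The final Dense layer maps a fixed-size summary vector from BERT to one score per sentiment class, and a softmax turns those scores into probabilities. This sketch assumes the summary vector is BERT's pooled `[CLS]` representation. It covers the three labels in this dataset, uses made-up weights, and uses a 4-dimensional vector in place of BERT's much larger output. The helper `dense_softmax` is hypothetical.
+
+```python
+import math
+
+LABELS = ["negative", "neutral", "positive"]
+
+def dense_softmax(features, weights, biases):
+    """Dense layer (one weight row per class) followed by a softmax."""
+    logits = [sum(w * x for w, x in zip(row, features)) + b
+              for row, b in zip(weights, biases)]
+    top = max(logits)                          # subtract the max for numerical stability
+    exps = [math.exp(z - top) for z in logits]
+    total = sum(exps)
+    return [e / total for e in exps]
+
+cls_vector = [0.2, -1.0, 0.5, 0.3]             # stand-in for BERT's pooled [CLS] output
+weights = [[0.1, 0.4, -0.2, 0.0],              # negative
+           [0.0, 0.1, 0.1, 0.1],               # neutral
+           [0.3, -0.5, 0.6, 0.2]]              # positive
+biases = [0.0, 0.0, 0.0]
+probs = dense_softmax(cls_vector, weights, biases)
+print(dict(zip(LABELS, probs)), LABELS[probs.index(max(probs))])
+```
+
+During fine-tuning, both these head weights and BERT's own weights are updated.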
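+### Train the model:
+To train the model, we use ktrain's `autofit` method, which applies a triangular learning-rate policy with a maximum learning rate of 2e-5. Because no number of epochs is given, training continues until the validation loss has not improved for 5 epochs (`early_stopping=5`). The alternative, `fit_onecycle(lr, epochs)`, uses a 1cycle policy that linearly increases the learning rate for the first half of training and then decreases it for the second half.
+```
+learner.autofit(2e-5, early_stopping=5)
+```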
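+
+The 1cycle policy (increase the learning rate linearly over the first half of training, then decrease it over the second half) can be sketched as a function of the training step. This is an illustration under assumptions, not ktrain's code. The starting and final rate (`max_lr * min_ratio`) and the purely linear ramps are choices made here for clarity. ktrain's own implementation may use different endpoints and also varies momentum.
+
+```python
+def one_cycle_lr(step, total_steps, max_lr, min_ratio=0.1):
+    """Learning rate at `step` for a linear 1cycle schedule over total_steps."""
+    half = total_steps / 2
+    min_lr = max_lr * min_ratio
+    if step <= half:
+        # first half: ramp linearly from min_lr up to max_lr
+        return min_lr + (max_lr - min_lr) * step / half
+    # second half: ramp linearly from max_lr back down to min_lr
+    return max_lr - (max_lr - min_lr) * (step - half) / half
+
+schedule = [one_cycle_lr(s, 10, 2e-5) for s in range(11)]
+print(["%.1e" % lr for lr in schedule])
+```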
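+### Plot the learning rate:
+`learner.lr_plot()` plots loss against learning rate using the output of ktrain's learning-rate finder. It therefore requires `learner.lr_find()` to be run first, ideally before training, to help choose the rate passed to `autofit`. To view the learning-rate schedule that training actually followed, use `learner.plot('lr')` instead.
+```
+learner.lr_find()   # briefly train with increasing learning rates, recording the loss
+learner.lr_plot()   # plot loss vs. learning rate from lr_find
+```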
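+
+A learning-rate finder typically trains for a short time while multiplying the learning rate by a constant factor after every batch. It records the loss at each step and stops once the loss blows up. Below is a minimal sketch of that loop. It assumes a caller-supplied `train_step(lr)` that runs one batch and returns its loss. The defaults (start at 1e-7, multiply by 1.01, stop when the loss exceeds 4x the best loss so far) are assumptions modeled on ktrain's `lr_find` parameters. `lr_range_test` and `toy_loss` are hypothetical names, and the loss curve is synthetic.
+
+```python
+def lr_range_test(train_step, start_lr=1e-7, lr_mult=1.01,
+                  stop_factor=4.0, max_steps=2000):
+    """Return (lrs, losses) from an exponentially increasing learning-rate sweep."""
+    lrs, losses = [], []
+    lr, best = start_lr, float("inf")
+    for _ in range(max_steps):
+        loss = train_step(lr)          # one batch at this learning rate
+        lrs.append(lr)
+        losses.append(loss)
+        best = min(best, loss)
+        if loss > stop_factor * best:  # loss is diverging: stop the sweep
+            break
+        lr *= lr_mult                  # grow the learning rate geometrically
+    return lrs, losses
+
+def toy_loss(lr):
+    """Synthetic loss: falls as lr grows, then rises sharply above roughly 1e-3."""
+    return 1.0 / (1.0 + 1e4 * lr) + (lr / 1e-3) ** 2
+
+lrs, losses = lr_range_test(toy_loss)
+print(len(lrs), "steps, last lr = %.2e" % lrs[-1])
+```
+
+Plotting `losses` against `lrs` on a logarithmic x-axis gives the same kind of curve that `learner.lr_plot()` draws. A common heuristic is to pick a rate where the loss is still falling steeply, somewhat below the minimum.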
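+### Storing the model:
+`model.save` writes only the Keras model. `predictor.save` also stores the BERT preprocessing settings, so the saved folder can be reloaded later with `ktrain.load_predictor` and used directly with `predictor.predict(...)`. The folder name below is arbitrary.
+```
+model.save("model.h5")
+predictor = ktrain.get_predictor(learner.model, preproc)
+predictor.save("bert_se_predictor")  # model + preprocessing in one folder
+```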
+
+## How to Run the Script:
+Set the data paths (`train_path` and `test_path` in model.py) before running, then:
+```
+pip install -r requirements.txt
+pip install ktrain
+python model.py
+```
+
+## Final Conclusion:
+
+The model reaches about 86.7% accuracy on the test data.
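+
+Accuracy here is the fraction of test comments whose predicted label matches the gold label. `classification_report` in model.py additionally breaks the results down by class. Below is a dependency-free sketch of both calculations for the three labels. It uses a handful of made-up predictions, not the project's results, and `accuracy` and `per_class_prf` are hypothetical helper names.
+
+```python
+def accuracy(gold, pred):
+    """Fraction of positions where the prediction equals the gold label."""
+    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
+
+def per_class_prf(gold, pred, label):
+    """Precision, recall, and F1 for one label, treating it as the positive class."""
+    tp = sum(g == label and p == label for g, p in zip(gold, pred))
+    fp = sum(g != label and p == label for g, p in zip(gold, pred))
+    fn = sum(g == label and p != label for g, p in zip(gold, pred))
+    precision = tp / (tp + fp) if tp + fp else 0.0
+    recall = tp / (tp + fn) if tp + fn else 0.0
+    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
+    return precision, recall, f1
+
+gold = ["negative", "neutral", "positive", "positive", "neutral"]
+pred = ["negative", "positive", "positive", "positive", "neutral"]
+print("accuracy:", accuracy(gold, pred))  # 4 of 5 correct -> 0.8
+for label in ["negative", "neutral", "positive"]:
+    print(label, per_class_prf(gold, pred, label))
+```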
diff --git a/BERT Sentiment Analysis for SE/data/Test.csv b/BERT Sentiment Analysis for SE/data/Test.csv
new file mode 100644
index 000000000..6585ae97b
Binary files /dev/null and b/BERT Sentiment Analysis for SE/data/Test.csv differ
diff --git a/BERT Sentiment Analysis for SE/data/Train.csv b/BERT Sentiment Analysis for SE/data/Train.csv
new file mode 100644
index 000000000..34c749f38
Binary files /dev/null and b/BERT Sentiment Analysis for SE/data/Train.csv differ
diff --git a/BERT Sentiment Analysis for SE/model.py b/BERT Sentiment Analysis for SE/model.py
new file mode 100644
index 000000000..376866a24
--- /dev/null
+++ b/BERT Sentiment Analysis for SE/model.py
@@ -0,0 +1,83 @@
+# Install requirements before running this script (in a shell, not in Python):
+#   pip install -r requirements.txt
+#   pip install ktrain
+# Importing packages
+import os
+import pathlib
+import pandas as pd
+from sklearn.metrics import classification_report
+# Use only the first GPU, numbered in PCI bus order
+os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
+os.environ["CUDA_VISIBLE_DEVICES"] = "0"
+#Importing K Train
+import ktrain
+from ktrain import text
+# Data loader: stop early if either data file is missing
+train_path = "/content/Train.csv"
+test_path = "/content/Test.csv"
+tr_path = pathlib.Path(train_path)
+te_path = pathlib.Path(test_path)
+if tr_path.exists():
+    print("Train data path set.")
+else:
+    raise SystemExit("Train data path does not exist.")
+
+if te_path.exists():
+    print("Test data path set.")
+else:
+    raise SystemExit("Test data path does not exist.")
+
+train_df=pd.read_csv(train_path, encoding='utf-16', sep=';', header=None).values
+#train_df.head()
+test_df=pd.read_csv(test_path, encoding='utf-16', sep=';', header=None).values
+#test_df.head()
+(x_train, y_train), (x_test, y_test), preproc = text.texts_from_array(train_df[:,2], train_df[:,1], x_test=test_df[:,2], y_test=test_df[:,1],maxlen=500, preprocess_mode='bert')
+#Model Training
+model = text.text_classifier('bert', (x_train, y_train) , preproc=preproc)
+learner = ktrain.get_learner(model,
+ train_data=(x_train, y_train),
+ val_data=(x_test, y_test),
+ batch_size=6)
+# lr_plot() needs the learning-rate finder to have been run first
+learner.lr_find()
+learner.lr_plot()
+learner.autofit(2e-5, early_stopping=5)
+#model Evaluation
+model.save("model.h5")
+predictor = ktrain.get_predictor(learner.model, preproc)
+# Predict all test comments in one batched call
+data = test_df[:, 2].tolist()
+label = test_df[:, 1].tolist()
+preds = predictor.predict(data)
+
+# Collect the misclassified comments. (Naming this list `text` would
+# shadow the ktrain `text` module imported above.)
+wrong_texts = []
+true_lab = []
+pred_lab = []
+for dt, gold, pred in zip(data, label, preds):
+    if pred != gold:
+        wrong_texts.append(dt)
+        true_lab.append(gold)
+        pred_lab.append(pred)
+
+total = len(data)
+wrong = len(wrong_texts)
+correct = total - wrong
+
+wrong_data = pd.DataFrame({
+    'Text': wrong_texts,
+    'Gold Label': true_lab,
+    'Predicted Label': pred_lab,
+})
+wrong_data.to_csv("wrong_results.csv", sep=';')
+
+# Gold labels come from column 1 of the test array (test_df[1] would be the second row)
+names = ['negative', 'neutral', 'positive']
+print(classification_report(label, preds, target_names=names))
+
+print("Correct: ", correct, "/", total, "\nWrong: ", wrong, "/", total)
diff --git a/BERT Sentiment Analysis for SE/notebook/model.ipynb b/BERT Sentiment Analysis for SE/notebook/model.ipynb
new file mode 100644
index 000000000..b666f12af
--- /dev/null
+++ b/BERT Sentiment Analysis for SE/notebook/model.ipynb
@@ -0,0 +1,8848 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import pathlib\n",
+ "import pandas as pd\n",
+ "from sklearn.metrics import classification_report\n",
+ "os.environ[\"CUDA_DEVICE_ORDER\"]=\"PCI_BUS_ID\";\n",
+ "os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\"; "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Collecting ktrain\r\n",
+ " Downloading ktrain-0.14.7.tar.gz (25.2 MB)\r\n",
+ "\u001b[K |████████████████████████████████| 25.2 MB 5.0 MB/s \r\n",
+ "\u001b[?25hRequirement already satisfied: tensorflow==2.1.0 in /opt/conda/lib/python3.7/site-packages (from ktrain) (2.1.0)\r\n",
+ "Requirement already satisfied: scikit-learn>=0.21.3 in /opt/conda/lib/python3.7/site-packages (from ktrain) (0.22.2.post1)\r\n",
+ "Requirement already satisfied: matplotlib>=3.0.0 in /opt/conda/lib/python3.7/site-packages (from ktrain) (3.1.3)\r\n",
+ "Requirement already satisfied: pandas>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from ktrain) (1.0.1)\r\n",
+ "Requirement already satisfied: fastprogress>=0.1.21 in /opt/conda/lib/python3.7/site-packages (from ktrain) (0.2.3)\r\n",
+ "Collecting keras_bert>=0.81.0\r\n",
+ " Downloading keras-bert-0.81.0.tar.gz (29 kB)\r\n",
+ "Requirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from ktrain) (2.23.0)\r\n",
+ "Requirement already satisfied: joblib in /opt/conda/lib/python3.7/site-packages (from ktrain) (0.14.1)\r\n",
+ "Collecting langdetect\r\n",
+ " Downloading langdetect-1.0.8.tar.gz (981 kB)\r\n",
+ "\u001b[K |████████████████████████████████| 981 kB 44.5 MB/s \r\n",
+ "\u001b[?25hRequirement already satisfied: jieba in /opt/conda/lib/python3.7/site-packages (from ktrain) (0.42.1)\r\n",
+ "Collecting cchardet==2.1.5\r\n",
+ " Downloading cchardet-2.1.5-cp37-cp37m-manylinux1_x86_64.whl (241 kB)\r\n",
+ "\u001b[K |████████████████████████████████| 241 kB 46.9 MB/s \r\n",
+ "\u001b[?25hRequirement already satisfied: networkx>=2.3 in /opt/conda/lib/python3.7/site-packages (from ktrain) (2.4)\r\n",
+ "Requirement already satisfied: bokeh in /opt/conda/lib/python3.7/site-packages (from ktrain) (2.0.1)\r\n",
+ "Collecting seqeval\r\n",
+ " Downloading seqeval-0.0.12.tar.gz (21 kB)\r\n",
+ "Requirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from ktrain) (20.1)\r\n",
+ "Collecting tensorflow_datasets\r\n",
+ " Downloading tensorflow_datasets-3.1.0-py3-none-any.whl (3.3 MB)\r\n",
+ "\u001b[K |████████████████████████████████| 3.3 MB 38.1 MB/s \r\n",
+ "\u001b[?25hRequirement already satisfied: transformers>=2.7.0 in /opt/conda/lib/python3.7/site-packages (from ktrain) (2.8.0)\r\n",
+ "Requirement already satisfied: ipython in /opt/conda/lib/python3.7/site-packages (from ktrain) (7.12.0)\r\n",
+ "Collecting syntok\r\n",
+ " Downloading syntok-1.3.1.tar.gz (23 kB)\r\n",
+ "Collecting whoosh\r\n",
+ " Downloading Whoosh-2.7.4-py2.py3-none-any.whl (468 kB)\r\n",
+ "\u001b[K |████████████████████████████████| 468 kB 47.2 MB/s \r\n",
+ "\u001b[?25hRequirement already satisfied: google-pasta>=0.1.6 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (0.2.0)\r\n",
+ "Requirement already satisfied: scipy==1.4.1; python_version >= \"3\" in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (1.4.1)\r\n",
+ "Requirement already satisfied: numpy<2.0,>=1.16.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (1.18.1)\r\n",
+ "Requirement already satisfied: termcolor>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (1.1.0)\r\n",
+ "Requirement already satisfied: wrapt>=1.11.1 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (1.11.2)\r\n",
+ "Requirement already satisfied: six>=1.12.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (1.14.0)\r\n",
+ "Requirement already satisfied: keras-applications>=1.0.8 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (1.0.8)\r\n",
+ "Requirement already satisfied: astor>=0.6.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (0.8.1)\r\n",
+ "Requirement already satisfied: tensorboard<2.2.0,>=2.1.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (2.1.1)\r\n",
+ "Requirement already satisfied: grpcio>=1.8.6 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (1.28.1)\r\n",
+ "Requirement already satisfied: wheel>=0.26; python_version >= \"3\" in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (0.34.2)\r\n",
+ "Requirement already satisfied: protobuf>=3.8.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (3.11.3)\r\n",
+ "Requirement already satisfied: keras-preprocessing>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (1.1.0)\r\n",
+ "Requirement already satisfied: opt-einsum>=2.3.2 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (3.2.1)\r\n",
+ "Requirement already satisfied: gast==0.2.2 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (0.2.2)\r\n",
+ "Requirement already satisfied: absl-py>=0.7.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (0.9.0)\r\n",
+ "Requirement already satisfied: tensorflow-estimator<2.2.0,>=2.1.0rc0 in /opt/conda/lib/python3.7/site-packages (from tensorflow==2.1.0->ktrain) (2.1.0)\r\n",
+ "Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.0.0->ktrain) (2.8.1)\r\n",
+ "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.0.0->ktrain) (2.4.6)\r\n",
+ "Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.0.0->ktrain) (1.1.0)\r\n",
+ "Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.0.0->ktrain) (0.10.0)\r\n",
+ "Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.7/site-packages (from pandas>=1.0.1->ktrain) (2019.3)\r\n",
+ "Requirement already satisfied: Keras in /opt/conda/lib/python3.7/site-packages (from keras_bert>=0.81.0->ktrain) (2.3.1)\r\n",
+ "Collecting keras-transformer>=0.30.0\r\n",
+ " Downloading keras-transformer-0.33.0.tar.gz (11 kB)\r\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->ktrain) (2020.4.5.1)\r\n",
+ "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->ktrain) (1.25.7)\r\n",
+ "Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests->ktrain) (3.0.4)\r\n",
+ "Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->ktrain) (2.9)\r\n",
+ "Requirement already satisfied: decorator>=4.3.0 in /opt/conda/lib/python3.7/site-packages (from networkx>=2.3->ktrain) (4.4.1)\r\n",
+ "Requirement already satisfied: PyYAML>=3.10 in /opt/conda/lib/python3.7/site-packages (from bokeh->ktrain) (5.3)\r\n",
+ "Requirement already satisfied: typing-extensions>=3.7.4 in /opt/conda/lib/python3.7/site-packages (from bokeh->ktrain) (3.7.4.2)\r\n",
+ "Requirement already satisfied: pillow>=4.0 in /opt/conda/lib/python3.7/site-packages (from bokeh->ktrain) (5.4.1)\r\n",
+ "Requirement already satisfied: tornado>=5 in /opt/conda/lib/python3.7/site-packages (from bokeh->ktrain) (5.0.2)\r\n",
+ "Requirement already satisfied: Jinja2>=2.7 in /opt/conda/lib/python3.7/site-packages (from bokeh->ktrain) (2.11.1)\r\n",
+ "Requirement already satisfied: attrs>=18.1.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets->ktrain) (19.3.0)\r\n",
+ "Requirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets->ktrain) (4.43.0)\r\n",
+ "Requirement already satisfied: future in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets->ktrain) (0.18.2)\r\n",
+ "Collecting tensorflow-metadata\r\n",
+ " Downloading tensorflow_metadata-0.21.2-py2.py3-none-any.whl (31 kB)\r\n",
+ "Requirement already satisfied: dill in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets->ktrain) (0.3.1.1)\r\n",
+ "Requirement already satisfied: promise in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets->ktrain) (2.3)\r\n",
+ "Requirement already satisfied: boto3 in /opt/conda/lib/python3.7/site-packages (from transformers>=2.7.0->ktrain) (1.12.41)\r\n",
+ "Requirement already satisfied: sentencepiece in /opt/conda/lib/python3.7/site-packages (from transformers>=2.7.0->ktrain) (0.1.85)\r\n",
+ "Requirement already satisfied: sacremoses in /opt/conda/lib/python3.7/site-packages (from transformers>=2.7.0->ktrain) (0.0.41)\r\n",
+ "Requirement already satisfied: tokenizers==0.5.2 in /opt/conda/lib/python3.7/site-packages (from transformers>=2.7.0->ktrain) (0.5.2)\r\n",
+ "Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.7/site-packages (from transformers>=2.7.0->ktrain) (2020.4.4)\r\n",
+ "Requirement already satisfied: filelock in /opt/conda/lib/python3.7/site-packages (from transformers>=2.7.0->ktrain) (3.0.10)\r\n",
+ "Requirement already satisfied: backcall in /opt/conda/lib/python3.7/site-packages (from ipython->ktrain) (0.1.0)\r\n",
+ "Requirement already satisfied: setuptools>=18.5 in /opt/conda/lib/python3.7/site-packages (from ipython->ktrain) (45.2.0.post20200209)\r\n",
+ "Requirement already satisfied: pygments in /opt/conda/lib/python3.7/site-packages (from ipython->ktrain) (2.5.2)\r\n",
+ "Requirement already satisfied: pickleshare in /opt/conda/lib/python3.7/site-packages (from ipython->ktrain) (0.7.5)\r\n",
+ "Requirement already satisfied: pexpect; sys_platform != \"win32\" in /opt/conda/lib/python3.7/site-packages (from ipython->ktrain) (4.8.0)\r\n",
+ "Requirement already satisfied: jedi>=0.10 in /opt/conda/lib/python3.7/site-packages (from ipython->ktrain) (0.14.1)\r\n",
+ "Requirement already satisfied: traitlets>=4.2 in /opt/conda/lib/python3.7/site-packages (from ipython->ktrain) (4.3.3)\r\n",
+ "Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from ipython->ktrain) (2.0.10)\r\n",
+ "Requirement already satisfied: h5py in /opt/conda/lib/python3.7/site-packages (from keras-applications>=1.0.8->tensorflow==2.1.0->ktrain) (2.10.0)\r\n",
+ "Requirement already satisfied: markdown>=2.6.8 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (3.2.1)\r\n",
+ "Requirement already satisfied: google-auth<2,>=1.6.3 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (1.11.2)\r\n",
+ "Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (0.4.1)\r\n",
+ "Requirement already satisfied: werkzeug>=0.11.15 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (1.0.0)\r\n",
+ "Collecting keras-pos-embd>=0.10.0\r\n",
+ " Downloading keras-pos-embd-0.11.0.tar.gz (5.9 kB)\r\n",
+ "Collecting keras-multi-head>=0.22.0\r\n",
+ " Downloading keras-multi-head-0.22.0.tar.gz (12 kB)\r\n",
+ "Collecting keras-layer-normalization>=0.12.0\r\n",
+ " Downloading keras-layer-normalization-0.14.0.tar.gz (4.3 kB)\r\n",
+ "Collecting keras-position-wise-feed-forward>=0.5.0\r\n",
+ " Downloading keras-position-wise-feed-forward-0.6.0.tar.gz (4.4 kB)\r\n",
+ "Collecting keras-embed-sim>=0.7.0\r\n",
+ " Downloading keras-embed-sim-0.7.0.tar.gz (4.1 kB)\r\n",
+ "Requirement already satisfied: MarkupSafe>=0.23 in /opt/conda/lib/python3.7/site-packages (from Jinja2>=2.7->bokeh->ktrain) (1.1.1)\r\n",
+ "Requirement already satisfied: googleapis-common-protos in /opt/conda/lib/python3.7/site-packages (from tensorflow-metadata->tensorflow_datasets->ktrain) (1.51.0)\r\n",
+ "Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /opt/conda/lib/python3.7/site-packages (from boto3->transformers>=2.7.0->ktrain) (0.9.5)\r\n",
+ "Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /opt/conda/lib/python3.7/site-packages (from boto3->transformers>=2.7.0->ktrain) (0.3.3)\r\n",
+ "Requirement already satisfied: botocore<1.16.0,>=1.15.41 in /opt/conda/lib/python3.7/site-packages (from boto3->transformers>=2.7.0->ktrain) (1.15.41)\r\n",
+ "Requirement already satisfied: click in /opt/conda/lib/python3.7/site-packages (from sacremoses->transformers>=2.7.0->ktrain) (7.0)\r\n",
+ "Requirement already satisfied: ptyprocess>=0.5 in /opt/conda/lib/python3.7/site-packages (from pexpect; sys_platform != \"win32\"->ipython->ktrain) (0.6.0)\r\n",
+ "Requirement already satisfied: parso>=0.5.0 in /opt/conda/lib/python3.7/site-packages (from jedi>=0.10->ipython->ktrain) (0.6.1)\r\n",
+ "Requirement already satisfied: ipython-genutils in /opt/conda/lib/python3.7/site-packages (from traitlets>=4.2->ipython->ktrain) (0.2.0)\r\n",
+ "Requirement already satisfied: wcwidth in /opt/conda/lib/python3.7/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython->ktrain) (0.1.8)\r\n",
+ "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (0.2.7)\r\n",
+ "Requirement already satisfied: rsa<4.1,>=3.1.4 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (4.0)\r\n",
+ "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (3.1.1)\r\n",
+ "Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/conda/lib/python3.7/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (1.2.0)\r\n",
+ "Collecting keras-self-attention==0.41.0\r\n",
+ " Downloading keras-self-attention-0.41.0.tar.gz (9.3 kB)\r\n",
+ "Requirement already satisfied: docutils<0.16,>=0.10 in /opt/conda/lib/python3.7/site-packages (from botocore<1.16.0,>=1.15.41->boto3->transformers>=2.7.0->ktrain) (0.15.2)\r\n",
+ "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (0.4.8)\r\n",
+ "Requirement already satisfied: oauthlib>=3.0.0 in /opt/conda/lib/python3.7/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (3.0.1)\r\n",
+ "Building wheels for collected packages: ktrain, keras-bert, langdetect, seqeval, syntok, keras-transformer, keras-pos-embd, keras-multi-head, keras-layer-normalization, keras-position-wise-feed-forward, keras-embed-sim, keras-self-attention\r\n",
+ " Building wheel for ktrain (setup.py) ... \u001b[?25l-\b \b\\\b \b|\b \b/\b \b-\b \b\\\b \b|\b \bdone\r\n",
+ "\u001b[?25h Created wheel for ktrain: filename=ktrain-0.14.7-py3-none-any.whl size=25240972 sha256=f8dbe678fed9583ab80ef777e773475b2c426341e83afe41680732cf971c9c6e\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/d5/f8/64/c482e2e11303d04d85af01b9b94ecfbeff8620be8f6e543e5f\r\n",
+ " Building wheel for keras-bert (setup.py) ... \u001b[?25l-\b \b\\\b \bdone\r\n",
+ "\u001b[?25h Created wheel for keras-bert: filename=keras_bert-0.81.0-py3-none-any.whl size=37912 sha256=8b1b84aa583874f1598c4b7fe5910a29bd253b4b8218c8604fc94315b968f4c0\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/fc/f6/94/9c54242cde921a3cdc7d049bae3f137d21fa28d3b8ccefd8a0\r\n",
+ " Building wheel for langdetect (setup.py) ... \u001b[?25l-\b \b\\\b \b|\b \b/\b \bdone\r\n",
+ "\u001b[?25h Created wheel for langdetect: filename=langdetect-1.0.8-py3-none-any.whl size=993191 sha256=3d7276487f7df018a7499449eb2522f804bd6a7754fdf6f5dcda6de6bce4a425\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/59/f6/9d/85068904dba861c0b9af74e286265a08da438748ee5ae56067\r\n",
+ " Building wheel for seqeval (setup.py) ... \u001b[?25l-\b \bdone\r\n",
+ "\u001b[?25h Created wheel for seqeval: filename=seqeval-0.0.12-py3-none-any.whl size=7423 sha256=f9eb2c7ab49c781064f5d5c1abb38a34ef29a709243ff561cbb569cc25e4465c\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/dc/cc/62/a3b81f92d35a80e39eb9b2a9d8b31abac54c02b21b2d466edc\r\n",
+ " Building wheel for syntok (setup.py) ... \u001b[?25l-\b \b\\\b \bdone\r\n",
+ "\u001b[?25h Created wheel for syntok: filename=syntok-1.3.1-py3-none-any.whl size=20916 sha256=b4155e00098fc110e4344f64c71bc3c361f37d9759c20e946689a19f2d85f738\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/5e/c2/33/e5d7d8f2f8b0c391d76bf82b844c3151bf23a84d75d02b185f\r\n",
+ " Building wheel for keras-transformer (setup.py) ... \u001b[?25l-\b \b\\\b \bdone\r\n",
+ "\u001b[?25h Created wheel for keras-transformer: filename=keras_transformer-0.33.0-py3-none-any.whl size=13259 sha256=5601c994b2647009355dc03eb8afdb2d2688e36da07e547e40b58a38dd4cd7af\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/6a/d8/48/ad5dd5d184d38695ceb230091a11c954cb41f8be79169f5f25\r\n",
+ " Building wheel for keras-pos-embd (setup.py) ... \u001b[?25l-\b \b\\\b \bdone\r\n",
+ "\u001b[?25h Created wheel for keras-pos-embd: filename=keras_pos_embd-0.11.0-py3-none-any.whl size=7553 sha256=f0733de51d6a8ff5d9d611240dd52bc2d7ad7710654be190d2bbe161f08bb58d\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/65/66/e9/c7eafddc29b81a98786f12b48a2aee7e3c633f6bf4a62cbc20\r\n",
+ " Building wheel for keras-multi-head (setup.py) ... \u001b[?25l-\b \b\\\b \bdone\r\n",
+ "\u001b[?25h Created wheel for keras-multi-head: filename=keras_multi_head-0.22.0-py3-none-any.whl size=15373 sha256=9ba0631691390d03c89bbdef85fe468dcba556ce63df45de2032df60a028ea81\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/84/9a/24/906be267948ccf66cd40d415d710a263d5debd94e47b12d301\r\n",
+ " Building wheel for keras-layer-normalization (setup.py) ... \u001b[?25l-\b \b\\\b \bdone\r\n",
+ "\u001b[?25h Created wheel for keras-layer-normalization: filename=keras_layer_normalization-0.14.0-py3-none-any.whl size=5267 sha256=a2a5a2e40e981ea3fc65ca4a80d0f476aa48b09862ead5210b6cb584b9fed773\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/58/14/24/76b0d99b7d9cc17e110956e0fae825a5da3e70a60273220502\r\n",
+ " Building wheel for keras-position-wise-feed-forward (setup.py) ... \u001b[?25l-\b \b\\\b \bdone\r\n",
+ "\u001b[?25h Created wheel for keras-position-wise-feed-forward: filename=keras_position_wise_feed_forward-0.6.0-py3-none-any.whl size=5623 sha256=1a097fb15a1b5a765f9c636bfa6e00897ea2dd4a651a50197fb5b1a57b7b5421\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/9e/53/a2/651c985b605e6a6c48688c779808eb1956fabb910b0557d871\r\n",
+ " Building wheel for keras-embed-sim (setup.py) ... \u001b[?25l-\b \b\\\b \bdone\r\n",
+ "\u001b[?25h Created wheel for keras-embed-sim: filename=keras_embed_sim-0.7.0-py3-none-any.whl size=4674 sha256=7fdff4dc252d0c715ddfc4bbbdc46eb460a80621cab2934b6ae1dee24bcecd89\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/15/b0/a6/485a2a1484a5bb9d4593bd96e4e78ead78fa5ee51e6bd4ef3f\r\n",
+ " Building wheel for keras-self-attention (setup.py) ... \u001b[?25l-\b \b\\\b \bdone\r\n",
+ "\u001b[?25h Created wheel for keras-self-attention: filename=keras_self_attention-0.41.0-py3-none-any.whl size=17288 sha256=a8974b16ef4ab5edb90ce768c74dba1023c600eddae0ff5a768a09511e7919ac\r\n",
+ " Stored in directory: /root/.cache/pip/wheels/ff/90/b9/1f0da40d3e5796aeccb453dccd05035c1f34db3f0732d9b1a8\r\n",
+ "Successfully built ktrain keras-bert langdetect seqeval syntok keras-transformer keras-pos-embd keras-multi-head keras-layer-normalization keras-position-wise-feed-forward keras-embed-sim keras-self-attention\r\n",
+ "Installing collected packages: keras-pos-embd, keras-self-attention, keras-multi-head, keras-layer-normalization, keras-position-wise-feed-forward, keras-embed-sim, keras-transformer, keras-bert, langdetect, cchardet, seqeval, tensorflow-metadata, tensorflow-datasets, syntok, whoosh, ktrain\r\n",
+ "Successfully installed cchardet-2.1.5 keras-bert-0.81.0 keras-embed-sim-0.7.0 keras-layer-normalization-0.14.0 keras-multi-head-0.22.0 keras-pos-embd-0.11.0 keras-position-wise-feed-forward-0.6.0 keras-self-attention-0.41.0 keras-transformer-0.33.0 ktrain-0.14.7 langdetect-1.0.8 seqeval-0.0.12 syntok-1.3.1 tensorflow-datasets-3.1.0 tensorflow-metadata-0.21.2 whoosh-2.7.4\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install ktrain\n",
+ "import ktrain\n",
+ "from ktrain import text"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Train data path set.\n",
+ "Test data path set.\n"
+ ]
+ }
+ ],
+ "source": [
+ "#check if the paths for the input data is valid.\n",
+ "train_path=\"../Train.csv\"\n",
+ "test_path=\"../Test.csv\"\n",
+ "tr_path= pathlib.Path(train_path)\n",
+ "te_path=pathlib.Path(test_path)\n",
+ "if tr_path.exists ():\n",
+ " print(\"Train data path set.\")\n",
+ "else: \n",
+ " raise SystemExit(\"Train data path does not exist.\")\n",
+ " \n",
+ "if te_path.exists ():\n",
+ " print(\"Test data path set.\")\n",
+ "else: \n",
+ " raise SystemExit(\"Test data path does not exist.\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
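+ "<div>\n",
+ "<style scoped>\n",
+ "    .dataframe tbody tr th:only-of-type {\n",
+ "        vertical-align: middle;\n",
+ "    }\n",
+ "\n",
+ "    .dataframe tbody tr th {\n",
+ "        vertical-align: top;\n",
+ "    }\n",
+ "\n",
+ "    .dataframe thead th {\n",
+ "        text-align: right;\n",
+ "    }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>0</th>\n",
+ "      <th>1</th>\n",
+ "      <th>2</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>t1</td>\n",
+ "      <td>negative</td>\n",
+ "      <td>Vineet, what you are trying to do is a terribl...</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>t2</td>\n",
+ "      <td>positive</td>\n",
+ "      <td>'Course I do, corrected.</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>2</th>\n",
+ "      <td>t3</td>\n",
+ "      <td>positive</td>\n",
+ "      <td>Excellent, happy to help! If you don't mind, c...</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>3</th>\n",
+ "      <td>t6</td>\n",
+ "      <td>negative</td>\n",
+ "      <td>@talnicolas I'm using it a few dozen times in ...</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>4</th>\n",
+ "      <td>t7</td>\n",
+ "      <td>neutral</td>\n",
+ "      <td>I didn't select an answer because even though ...</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"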
+ ],
+ "text/plain": [
+ " 0 1 2\n",
+ "0 t1 negative Vineet, what you are trying to do is a terribl...\n",
+ "1 t2 positive 'Course I do, corrected.\n",
+ "2 t3 positive Excellent, happy to help! If you don't mind, c...\n",
+ "3 t6 negative @talnicolas I'm using it a few dozen times in ...\n",
+ "4 t7 neutral I didn't select an answer because even though ..."
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "#showing the first 5 lines of the train data\n",
+ "train_df=pd.read_csv(train_path, encoding='utf-16', sep=';', header=None).values\n",
+ "#train_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "