Telegram Chat Clustering

Directory Structure

The project is organized into the following directories:

data: Contains the datasets used for the analysis.
results: Stores the outputs and results of the analysis.
notebooks: Includes Jupyter notebooks for different parts of the analysis.
util: Holds additional scripts used in the project.

Experiment Structure

The experiment itself is structured into two parts:

1. Data Exploration and Cleaning

This part begins with an initial exploration of the data. After the errors and irregularities it uncovers are fixed, a more in-depth exploratory analysis follows.
The code for this part can be found in the 01_data_cleaning_and_exploration notebook.

2. Feature Engineering and Clustering

TBD
The code for this part will be provided in the 02_feature_engineering_and_clustering notebook.

Quickstart

Clone the repository:
```
git clone <repository-url>
```
Navigate to the project directory:
```
cd <project-directory>
```
Install the necessary dependencies for each notebook:

For 01_data_cleaning_and_exploration.ipynb:
```
pip install -r requirements.txt
```
For 02_feature_engineering_and_clustering.ipynb:
```
pip install -r requirements_2.txt
```
Using Python 3.12.4 and a separate environment for each notebook is recommended.
Add the data: Place the datasets (either as CSV files or SQLite databases) in their respective directories (see Adding the Data below).
Run the notebooks: Open the Jupyter notebooks in the notebooks directory to explore the data and run the analysis.

Adding the Data

Adding the Datasets

Due to privacy concerns, the datasets are not included in this repository. You can add the datasets in the following ways:
1. SQLite Databases: If the data is provided as SQLite databases created by the Telegram data-collection and analysis suite TeleVision, place these databases in the data/dbs directory.
2. CSV Files: If the data is already in CSV format, place these files in the data/csv directory.
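Before running the notebooks, it can help to confirm that the files landed in the expected places. The following is a minimal sketch, not part of the repository: `find_datasets` is a hypothetical helper that only assumes the data/dbs and data/csv layout described above and lists whatever regular, non-hidden files it finds there.

```python
from pathlib import Path


def find_datasets(data_dir="data"):
    """Return the file names found in the dbs and csv subdirectories of data_dir.

    Hypothetical helper for illustration; it makes no assumption about
    file extensions and simply skips hidden files such as .gitkeep.
    """
    data_dir = Path(data_dir)
    found = {}
    for sub in ("dbs", "csv"):
        folder = data_dir / sub
        if folder.is_dir():
            found[sub] = sorted(
                p.name
                for p in folder.iterdir()
                if p.is_file() and not p.name.startswith(".")
            )
        else:
            # A missing directory is reported as empty rather than as an error.
            found[sub] = []
    return found
```

Calling `find_datasets()` from the project root returns a dictionary with one list of file names per directory. An empty list for both keys means no datasets have been added yet.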

Converting the Dataset to CSV

To convert all available SQLite databases in the data/dbs directory into CSV files, follow these steps:
1. Open a terminal or command prompt.
2. Navigate to the root directory of the project.
```
cd <project-directory>
```
3. Run the following command:
```
python3 -m util.export_msg_from_db
```
Note that this process might take some time, depending on the size of the databases. It only works with databases created using TeleVision.
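For readers who want a rough idea of what such a conversion involves, here is a generic sketch using only the Python standard library. It is not the code in util.export_msg_from_db and knows nothing about the TeleVision schema: it simply dumps every table of every database file to its own CSV file. The function name `export_tables_to_csv`, the `*.db` file pattern, and the output naming scheme are all assumptions made for illustration.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_dir, csv_dir, pattern="*.db"):
    """Dump every table of each SQLite file in db_dir to a CSV file in csv_dir.

    Illustrative only: the "*.db" pattern and the "<database>_<table>.csv"
    naming are assumptions, not the project's actual conventions.
    """
    db_dir, csv_dir = Path(db_dir), Path(csv_dir)
    csv_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for db_path in sorted(db_dir.glob(pattern)):
        conn = sqlite3.connect(db_path)
        try:
            # List user tables, skipping SQLite's internal bookkeeping tables.
            tables = [
                row[0]
                for row in conn.execute(
                    "SELECT name FROM sqlite_master "
                    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
                )
            ]
            for table in tables:
                out_path = csv_dir / f"{db_path.stem}_{table}.csv"
                # Quote the identifier so unusual table names stay valid SQL.
                quoted = '"' + table.replace('"', '""') + '"'
                cursor = conn.execute(f"SELECT * FROM {quoted}")
                with out_path.open("w", newline="", encoding="utf-8") as fh:
                    writer = csv.writer(fh)
                    # Header row from the column names, then all data rows.
                    writer.writerow([col[0] for col in cursor.description])
                    writer.writerows(cursor)
                written.append(out_path)
        finally:
            conn.close()
    return written
```

Calling `export_tables_to_csv("data/dbs", "data/csv")` from the project root would write one CSV file per table and return the list of paths written. Rows are streamed from the cursor rather than loaded at once, which keeps memory use modest for large databases.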
