## AI Agent for CSV File Monitoring and Processing
This project implements an AI agent that monitors a specified directory for changes to CSV files. When a file changes, the agent checks it for data discrepancies such as duplicates and missing values, describes the columns using an AI model, and logs the results.
## Features
- **File Monitoring**: Continuously monitors a specified directory for modifications to CSV files.
- **Data Discrepancy Detection**: Checks for duplicates and missing values in CSV files.
- **AI Descriptions**: Uses an AI model to describe columns and explain discrepancies in the data.
- **Automatic Cleanup**: Removes duplicate rows from CSV files.
- **Logging**: Logs all actions and AI-generated content to a log file and console.
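The discrepancy checks and the cleanup step can be illustrated with a few lines of `pandas`. This is a minimal sketch rather than the script's actual code: the function names `find_discrepancies` and `drop_duplicate_rows` and the sample data are hypothetical, chosen only for illustration.

```python
import pandas as pd


def find_discrepancies(df: pd.DataFrame) -> dict:
    """Summarise duplicate rows and missing values (hypothetical helper)."""
    missing = df.isna().sum()
    return {
        # Rows that exactly repeat an earlier row; the first occurrence is not counted.
        "duplicate_rows": int(df.duplicated().sum()),
        # Only columns that actually contain missing values are reported.
        "missing_by_column": {col: int(n) for col, n in missing.items() if n > 0},
    }


def drop_duplicate_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without repeated rows, keeping the first occurrence."""
    return df.drop_duplicates().reset_index(drop=True)


# Made-up sample data: one repeated row and one missing name.
sample = pd.DataFrame({"id": [1, 2, 2, 3], "name": ["a", "b", "b", None]})
report = find_discrepancies(sample)
cleaned = drop_duplicate_rows(sample)
```

Returning a plain dictionary keeps the report easy to log and easy to pass to the AI model as text.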
## Setup and Installation
### Prerequisites
- Python 3.6+
- `pandas` library
- `watchdog` library
- `logging` module (part of the Python standard library; no installation needed)
- `google-generativeai` package (imported as `google.generativeai`)
### Installation
1. Clone this repository or download the script file.
2. Install the required Python libraries:
```sh
pip install pandas watchdog google-generativeai
```
3. Replace the placeholder API key in the script with your actual Google Generative AI API key (the script uses `google.generativeai`, not OpenAI, despite the placeholder's name):
```python
genai.configure(api_key='YOUR_OPENAI_API_KEY')
```
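A key that is hard-coded in the script is easy to commit by accident. One possible alternative, sketched below, is to read it from an environment variable. The helper `load_api_key` and the variable name `GOOGLE_API_KEY` are assumptions made for this example, not part of the original script.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Read the API key from the environment (hypothetical helper and variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key to the API.
        raise RuntimeError(f"Set the {env_var} environment variable before running the script.")
    return key


# In the script, the result would replace the hard-coded placeholder:
# genai.configure(api_key=load_api_key())
```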
## Usage
1. **Configuration**: Ensure the `path` variable in the script points to the directory you want to monitor:
```python
path = r"C:\Users\YourUsername\YourDirectory"
```
2. **Run the Script**:
```sh
python your_script_name.py
```
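For readers unfamiliar with `watchdog`, the monitoring loop might look roughly like the sketch below. It assumes a handler that reacts only to modified `.csv` files. The names `CsvChangeHandler` and `watch` are hypothetical, and the real script may be structured differently.

```python
import os
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class CsvChangeHandler(FileSystemEventHandler):
    """Calls `on_csv_change(path)` whenever a .csv file is modified (hypothetical class)."""

    def __init__(self, on_csv_change):
        super().__init__()
        self.on_csv_change = on_csv_change

    def on_modified(self, event):
        # src_path may be str or bytes depending on the platform; normalise it to str.
        path = os.fsdecode(event.src_path)
        # Ignore directory events and files that are not CSVs.
        if event.is_directory or not path.lower().endswith(".csv"):
            return
        self.on_csv_change(path)


def watch(path, on_csv_change):
    """Watch `path` (non-recursively) until interrupted with Ctrl+C."""
    observer = Observer()
    observer.schedule(CsvChangeHandler(on_csv_change), path, recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        pass
    finally:
        observer.stop()
        observer.join()
```

With this shape, the script would start monitoring with a call such as `watch(path, check_file_for_discrepancies)`. A single save can trigger more than one modification event, so the callback should tolerate being called repeatedly for the same file.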
### Script Details
- **use_llama2_to_explain(discrepancy)**: Uses an AI model to generate explanations for data discrepancies.
- **describe_csv_columns(file_path)**: Uses an AI model to describe the columns of a CSV file, including sample values.
- **remove_duplicates(df, file_path)**: Removes duplicate rows from the DataFrame and updates the CSV file.
- **check_file_for_discrepancies(file_path)**: Checks for duplicates and missing values in the CSV file, describes the columns, and logs the results.
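To show what describing columns "including sample values" could involve, here is a minimal sketch of building the text prompt that would be sent to the AI model. The function name `build_column_prompt`, the prompt wording, and the `n_samples` parameter are all assumptions. The script's own `describe_csv_columns` may build its prompt differently.

```python
import pandas as pd


def build_column_prompt(df: pd.DataFrame, n_samples: int = 3) -> str:
    """Build a prompt listing each column with its dtype and a few sample values."""
    lines = ["Describe each column of this CSV file in one sentence."]
    for col in df.columns:
        # Take up to n_samples distinct, non-missing values as examples.
        samples = df[col].dropna().drop_duplicates().head(n_samples).tolist()
        lines.append(f"- {col} ({df[col].dtype}): sample values {samples}")
    return "\n".join(lines)
```

The resulting string would then go to whichever model call the script already makes. Sending a few sample values instead of the whole file keeps the request small.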
## Demo
A demo video is available here:
https://drive.google.com/file/d/1fTQa7-edy2KjAJchnRHH1_20se0d2cbB/view?usp=drive_link