
# 🚀 PipeLogger Library 🚀

Simplify the generation and management of logs in your data pipelines.



## 📖 What is PipeLogger?

PipeLogger is a library designed to standardize the creation of logs in data pipelines, providing a consistent format that facilitates problem identification and troubleshooting. With PipeLogger, you can manage detailed and structured logs, enabling more effective tracking of operations and deeper analysis of data ingestion processes.

## 🚀 Main Features

- **Log standardization**: PipeLogger creates detailed logs that follow a consistent format, making them easy to read and analyze.
- **Integration with Google Cloud Platform (GCP)**: Designed for pipelines deployed on GCP, supporting Cloud Functions and Cloud Run.
- **BigQuery table monitoring**: Records the size of your BigQuery tables over time (a sketch of such a size check follows this list).
- **Storage in Google Cloud Storage**: Automatically stores logs in a GCS bucket for centralized access and management.
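
PipeLogger handles this bookkeeping for you. As a point of reference, here is a minimal sketch (not PipeLogger internals) of how a per-table size snapshot can be taken with the public `google-cloud-bigquery` client; the project ID is a placeholder:

```python
# Minimal sketch (not PipeLogger internals): taking a size snapshot of a
# BigQuery table with the public google-cloud-bigquery client.
from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")  # placeholder project ID
table = client.get_table("project.pipeline-example.table_1")
print(f"{table.full_table_id}: {table.num_bytes} bytes, {table.num_rows} rows")
```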

## 🌟 Example of a Generated Log

PipeLogger creates logs in a clear and structured JSON format as follows:

```json
{
  "PipelineLogs": {
    "PipelineID": "Pipeline-Example",
    "Timestamp": "MM-DD-YY-THH:MM:SS",
    "Status": "Success",
    "Message": "Data uploaded successfully",
    "ExecutionTime": 20.5075738430023
  },
  "BigQueryLogs": [
    {
      "BigQueryID": "project.pipeline-example.table_1",
      "Size": 1555
    },
    {
      "BigQueryID": "project.pipeline-example.table_2",
      "Size": 3596
    }
  ],
  "Details": [
    {
      "additional_info": [
        "Data downloaded successfully",
        "Data processed successfully",
        "Data uploaded successfully"
      ]
    }
  ]
}
```
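
Because the logs are plain JSON, downstream tooling can consume them directly. Here is a minimal sketch, assuming the logs are stored as JSON objects in the GCS bucket and folder you configure (the bucket and object names below are hypothetical, not fixed PipeLogger paths):

```python
# Minimal sketch: reading a PipeLogger log back from GCS and inspecting it.
# The bucket and object names are placeholders.
import json
from google.cloud import storage

blob = storage.Client().bucket("your-gcs-bucket").blob("logs_folder/Pipeline-Example.json")
log = json.loads(blob.download_as_text())

print(log["PipelineLogs"]["Status"])  # e.g. "Success"
for entry in log["BigQueryLogs"]:
    print(entry["BigQueryID"], entry["Size"])
```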

## 💻 Implementation

### 📋 Prerequisites

Before implementing PipeLogger, make sure you meet the following requirements:

- The pipeline must be deployed on Google Cloud Platform (GCP), using Cloud Functions or Cloud Run.
- The pipeline must interact with BigQuery tables.
- A Google Cloud Storage bucket is required to store the generated logs (a quick access check is sketched after this list).
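
Before wiring PipeLogger in, it can save debugging time to confirm the bucket is reachable with the credentials your pipeline runs under. A minimal sketch using the public `google-cloud-storage` client (the bucket name is a placeholder):

```python
# Minimal sketch: verify the log bucket exists and is accessible.
from google.cloud import storage

bucket = storage.Client().lookup_bucket("your-gcs-bucket")  # returns None if missing
if bucket is None:
    raise RuntimeError("Log bucket not found; create it or check permissions.")
```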

### 🛠️ How to Implement PipeLogger in Your Pipeline

Follow the steps detailed in our Official Documentation to integrate PipeLogger into your pipeline projects.

### 🧑‍💻 Basic Usage Example

```python
from pipelogger import logsformatter
import time

# Initialize the log formatter
logger = logsformatter(
    pipeline_id="Pipeline-Example",
    table_ids=["project.pipeline-example.table_1", "project.pipeline-example.table_2"],
    project_id="your-gcp-project-id",
    bucket_name="your-gcs-bucket",
    folder_bucket="logs_folder"
)

# Simulate pipeline execution
start_time = time.time()

# ... your pipeline operations run here ...

# Generate and upload the logs
logger.generate_the_logs(
    execution_status="Success",
    msg="Data uploaded successfully",
    start_timer=start_time,
    logs_details=["Process completed without errors."]
)
```
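
The same call can record failures. Below is a sketch built on the example above; the `"Failed"` status string and the `run_pipeline` function are assumptions for illustration, so check the Official Documentation for the status values PipeLogger actually expects:

```python
# Error-path sketch reusing the logger from the example above.
# "Failed" is an assumed status value; run_pipeline is a hypothetical
# function standing in for your pipeline logic.
try:
    run_pipeline()
except Exception as exc:
    logger.generate_the_logs(
        execution_status="Failed",
        msg=str(exc),
        start_timer=start_time,
        logs_details=[f"Pipeline aborted: {exc}"],
    )
    raise
```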

## 📦 Installation

You can easily install PipeLogger from PyPI using pip:

```bash
pip install pipelogger
```

## 📚 Complete Documentation

For complete details on implementation, advanced configuration, and more usage examples, visit the Official Documentation.


## 🤝 Contribute

Contributions are welcome! If you have ideas or improvements, or if you've found a bug, please open an issue or submit a pull request in our GitHub repository.


## 📄 License

This project is licensed under the terms of the MIT License.


## 📧 Contact

If you have any questions, feel free to contact us through our GitHub page or send us an email.