- Sentiment Analysis: Determine the sentiment of text (positive, negative, neutral).
- Emotion Detection: Identify emotions such as happiness, sadness, anger, etc.
- Zero-Shot Classification: Classify text into custom categories without additional training.
- Named Entity Recognition (NER): Extract entities like names, locations, and organizations from text.
- Sequence Classification: Fine-tune models for custom classification tasks.
- Token Classification: Classify tokens within text for tasks like NER.
- Sequence-to-Sequence (Seq2Seq): Perform tasks like translation and summarization.
- Model Comparison: Evaluate and compare multiple models on the same dataset.
- Explainability: Understand model predictions through feature importance analysis.
- Text Cleaning: Utilize utility functions for preprocessing text data.
- Sentiment Analysis
- Emotion Detection
- Zero-Shot Classification (see the sketch after this list)
- Named Entity Recognition (NER)
- Sequence Classification
- Token Classification
- Sequence-to-Sequence (Seq2Seq)
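Zero-shot classification, for example, lets you classify text into your own label set without any fine-tuning. The sketch below is illustrative only: the `candidate_labels` argument is an assumption about the TextPredict API, so check the documentation for the exact parameter name.

```python
from textpredict import initialize

# Illustrative zero-shot sketch; candidate_labels is an assumed parameter name,
# not confirmed by the examples in this README.
model = initialize("zeroshot")
result = model.analyze(
    "The battery drains within two hours.",
    candidate_labels=["battery life", "screen quality", "price"],
)
print(result)
```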
You can install the package via pip:
pip install textpredict
Initialize the TextPredict model and perform simple predictions:
from textpredict import initialize
# Initialize for sentiment analysis
# Supported tasks: "sentiment", "ner", "zeroshot", "emotion", "sequence_classification", "token_classification", "seq2seq", etc.
text = "I hate this product!"  # example inputs: ["I love this product!", "I hate this product!"]
model = initialize("sentiment")
result = model.analyze(text)
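If `analyze` also accepts a list of strings (an assumption suggested by the comment above, but not confirmed here), batch prediction would look like this:

```python
# Assumed batch usage: passing a list of texts instead of a single string
texts = ["I love this product!", "I hate this product!"]
results = model.analyze(texts)
```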
Utilize a specific pre-trained model from Hugging Face:
model = initialize("emotion", model_name="AnkitAI/reviews-roberta-base-sentiment-analysis", source="huggingface")
result = model.analyze(text)
Load and use a model from a local directory:
model = initialize("ner", model_name="./results", source="local")
result = model.analyze(text, return_probs=True)  # with return_probs=True, the result also includes labels, scores, and probabilities
Train a model for sequence classification:
from textpredict import SequenceClassificationTrainer
from datasets import load_dataset
# Load dataset
train_data = load_dataset("imdb", split="train")
val_data = load_dataset("imdb", split="test")
# Initialize and train the model
trainer = SequenceClassificationTrainer(
    model_name="bert-base-uncased",
    output_dir="./results",
    train_dataset=train_data,
    val_dataset=val_data,
)
trainer.train()
trainer.save()
metrics = trainer.evaluate(test_dataset=val_data)
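Once training finishes, the saved model can be loaded back for inference with the same `initialize` API shown above, pointing `model_name` at the output directory:

```python
from textpredict import initialize

# Load the fine-tuned model from the training output directory
model = initialize("sequence_classification", model_name="./results", source="local")
result = model.analyze("I love this product!", return_probs=True)
```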
For detailed examples, refer to the examples directory.
Understand model predictions with feature importance:
from textpredict import Explainability
explainer = Explainability(model_name="bert-base-uncased", task="sentiment", device="cpu")
importance = explainer.feature_importance(text)
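The returned importance scores can be inspected directly; their exact structure depends on the task and model, so printing them is the simplest starting point:

```python
# Inspect the computed feature-importance scores (structure varies by task)
print(importance)
```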
And many more: see the documentation for the complete feature set.
For detailed documentation, please refer to the TextPredict Documentation.
Contributions are welcome! Please read our Contributing Guidelines before making a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
This project leverages the Transformers library by Hugging Face. We extend our gratitude to the Hugging Face team and its contributors for their work in creating and maintaining such a valuable resource for the NLP community.
- GitHub Repository: Github
- PyPI Project: PYPI
- Documentation: TextPredict Documentation
- Source Code: Source Code
- Issue Tracker: Issue Tracker