BanglaBERT Fine-tuned for Bangla Sentiment Analysis

Model Description

This model is a fine-tuned version of csebuetnlp/banglabert on the SentiGOLD dataset for 5-class sentiment analysis in Bengali. It classifies text into:

😠 Very Negative (SN)
😞 Negative (WN)
😐 Neutral (N)
😊 Positive (WP)
😍 Very Positive (SP)

Key Features:

Builds on the pretrained BanglaBERT language model for Bangla language understanding
Handles both formal and informal Bengali text
Optimized for social media, reviews, and customer feedback
Requires input text to be normalized with Bangla Normalizer before inference

Intended Uses & Limitations

Primary Use

Sentiment analysis of Bengali text
Social media monitoring
Customer feedback analysis
Product review classification

Limitations

Performance may degrade on code-mixed text (Bengali-English)
May struggle with sarcasm and highly contextual expressions
Best for short to medium-length texts; inputs longer than 512 tokens need to be truncated or split
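
Because code-mixed input is a known weak spot, one option is to flag such text before classification. The sketch below is not part of the model. It compares the count of characters in the Unicode Bengali block (U+0980 to U+09FF) with the count of ASCII Latin letters. The helper names `bengali_ratio` and `is_code_mixed` and the 0.8 threshold are illustrative assumptions.

```python
def bengali_ratio(text: str) -> float:
    """Share of Bengali-block characters among Bengali-block plus ASCII Latin letters."""
    bengali = sum(1 for ch in text if "\u0980" <= ch <= "\u09FF")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = bengali + latin
    return bengali / total if total else 0.0


def is_code_mixed(text: str, threshold: float = 0.8) -> bool:
    """Flag text that contains some Bengali script but falls below the threshold.

    The threshold is a hypothetical starting point, not a tuned value.
    Text with no Bengali characters at all has ratio 0.0 and is not flagged here.
    """
    ratio = bengali_ratio(text)
    return 0.0 < ratio < threshold


print(is_code_mixed("পণ্যটি really good ছিল"))
print(is_code_mixed("আমি খুবই সন্তুষ্ট।"))
```

Flagged inputs could be logged or reviewed separately rather than silently classified.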

Training Data

The model was fine-tuned on SentiGOLD, the largest gold-standard Bangla sentiment analysis dataset:

Feature	Value
Total Samples	70,000
Domains Covered	30+
Source Diversity	Social media, news, blogs, reviews
Class Distribution	Balanced across 5 classes
Annotation Quality	Fleiss' kappa = 0.88
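
For readers unfamiliar with the agreement statistic in the table, the following sketch shows how Fleiss' kappa is computed from a matrix of rating counts (one row per item, one column per category, each row summing to the number of raters). The toy matrix is invented for illustration and is not SentiGOLD annotation data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a list of rows, where counts[i][j] is the number of
    raters who assigned item i to category j. Every row must sum to the same
    number of raters. Undefined (division by zero) if all ratings fall in a
    single category."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Observed agreement: mean over items of the pairwise agreement rate
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement from the overall category proportions
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_expected = sum(p * p for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Toy example: 5 items, 3 raters, 5 sentiment classes (made-up counts)
toy_counts = [
    [3, 0, 0, 0, 0],
    [0, 3, 0, 0, 0],
    [0, 0, 2, 1, 0],
    [0, 0, 0, 3, 0],
    [0, 1, 0, 0, 2],
]
print(round(fleiss_kappa(toy_counts), 3))
```

Values close to 1 indicate near-perfect agreement among annotators, and 0 indicates agreement no better than chance.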

Training Procedure

Hyperparameters

Parameter	Value
Learning Rate	2e-5 → 1.05e-6
Batch Size	48
Epochs	5
Optimizer	AdamW
Scheduler	ReduceLROnPlateau
Weight Decay	0.01
Gradient Accumulation	4 steps
Warmup Ratio	5%
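
As a rough aid for reading the table, the sketch below derives the effective batch size and approximate step counts. It assumes that 48 is the per-step batch size on a single device and that a partial accumulation window at the end of an epoch still counts as one update. The training split size is not stated in this card, so the 56,000 used in the usage line is purely hypothetical.

```python
import math


def schedule_summary(train_samples, batch_size=48, grad_accum=4,
                     epochs=5, warmup_ratio=0.05):
    """Derive step counts from the hyperparameters listed in the table."""
    effective_batch = batch_size * grad_accum
    micro_batches = math.ceil(train_samples / batch_size)
    updates_per_epoch = math.ceil(micro_batches / grad_accum)
    total_updates = updates_per_epoch * epochs
    warmup_updates = round(total_updates * warmup_ratio)
    return {
        "effective_batch_size": effective_batch,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "warmup_updates": warmup_updates,
    }


# Hypothetical training split size; replace with the real number of training samples
print(schedule_summary(train_samples=56_000))
```

With the listed values, each optimizer update sees 48 × 4 = 192 samples under the single-device assumption.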

Techniques

Class-weighted loss to handle class imbalance
Early stopping (patience=3)
Mixed precision (FP16) training
Gradient checkpointing
Text normalization using Bangla Normalizer
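
Two of the techniques above can be sketched in plain Python. The training script is not included in this card, so this is an illustration and not the author's implementation. The inverse-frequency weighting formula and the choice of validation macro F1 as the monitored metric are assumptions; only `patience=3` comes from the list above.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """weight_c = total / (num_classes * count_c), so rarer classes weigh more.
    Classes absent from `labels` get weight 0.0."""
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


class EarlyStopping:
    """Stop when the monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's validation score; return True when training should stop."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up label ids and validation scores, for illustration only
print(inverse_frequency_weights([0, 0, 0, 1], num_classes=2))
stopper = EarlyStopping(patience=3)
print([stopper.step(s) for s in [0.63, 0.65, 0.64, 0.64, 0.64]])
```

The resulting weights would typically be passed to the loss function as per-class weights.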

Evaluation Results

Validation Performance

Epoch	F1 (Macro)	Accuracy	Very Neg F1	Neg F1	Neu F1	Pos F1	Very Pos F1
1	0.6334	0.6331	0.6789	0.5834	0.6407	0.5635	0.7004
5	0.6537	0.6551	0.7081	0.6157	0.6421	0.5789	0.7236
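
Macro F1 is the unweighted mean of the per-class F1 scores, so the first column of the table can be checked against the five class columns. The sketch below does this with the values from the two rows above.

```python
def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class_f1) / len(per_class_f1)


# Per-class F1 from the table: Very Neg, Neg, Neu, Pos, Very Pos
epoch_1 = [0.6789, 0.5834, 0.6407, 0.5635, 0.7004]
epoch_5 = [0.7081, 0.6157, 0.6421, 0.5789, 0.7236]

print(f"{macro_f1(epoch_1):.4f}")  # 0.6334
print(f"{macro_f1(epoch_5):.4f}")  # 0.6537
```

Both results match the reported F1 (Macro) column. The per-class values also show that the Positive and Negative classes are harder for the model than the two extreme classes.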

Final Test Performance

Metric	Score
Macro F1	0.6660
Accuracy	0.6671

How to Use

Direct Inference

from transformers import pipeline
# Bangla Normalizer; install with: pip install git+https://github.com/csebuetnlp/normalizer
from normalizer import normalize

# Load model
classifier = pipeline(
    "text-classification", 
    model="ahs95/banglabert-sentiment-analysis",
    tokenizer="ahs95/banglabert-sentiment-analysis"
)

# Prepare text
text = "আপনার পণ্যটি অসাধারণ! আমি খুবই সন্তুষ্ট।"
normalized_text = normalize(text)  # Important for BanglaBERT

# Classify
result = classifier(normalized_text)
print(f"Sentiment: {result[0]['label']} (Confidence: {result[0]['score']:.2f})")

Advanced Usage


from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from normalizer import normalize

# Load model and tokenizer
model_name = "ahs95/banglabert-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare inputs
texts = [
    "সেবা খুব খারাপ ছিল। আমি কখনো ফিরে আসব না।",
    "পণ্যটির গুণগত মান মোটামুটি ভাল"
]
normalized_texts = [normalize(t) for t in texts]

# Tokenize and predict
inputs = tokenizer(normalized_texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get predictions (label order is assumed to match the model's class indices 0-4)
sentiment_labels = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
predictions = [sentiment_labels[i] for i in probabilities.argmax(dim=-1).tolist()]

for text, pred in zip(texts, predictions):
    print(f"Text: {text}\nPredicted Sentiment: {pred}\n")
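
Some downstream uses only need coarse polarity. As a sketch, the five class probabilities can be summed into three groups. The helper name `to_polarity` is hypothetical, and the input order is assumed to match the `sentiment_labels` list (Very Negative through Very Positive).

```python
def to_polarity(probs):
    """Collapse five class probabilities into Negative / Neutral / Positive.

    `probs` must be ordered: Very Negative, Negative, Neutral, Positive, Very Positive.
    Returns the winning group and its summed probability.
    """
    if len(probs) != 5:
        raise ValueError("expected exactly five class probabilities")
    grouped = {
        "Negative": probs[0] + probs[1],
        "Neutral": probs[2],
        "Positive": probs[3] + probs[4],
    }
    label = max(grouped, key=grouped.get)
    return label, grouped[label]


# Made-up probabilities: the 5-class argmax is Neutral, but the grouped result is Negative
print(to_polarity([0.30, 0.30, 0.35, 0.03, 0.02]))
```

With the tensors from the example above, a single row could be passed as `to_polarity(probabilities[0].tolist())`.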

Ethical Considerations

Bias: While SentiGOLD reduces bias through synthetic data, real-world validation is recommended
Use Cases: Suitable for:
- Product feedback analysis
- Social media monitoring
- Market research
Avoid: Critical decision systems without human oversight
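
One way to keep a human in the loop is to route low-confidence predictions to manual review. The sketch below assumes each result is a dict with `label` and `score` keys, as in the Direct Inference example. The function name `route_prediction` and the 0.6 threshold are hypothetical and would need tuning on real data.

```python
def route_prediction(result, threshold=0.6):
    """Accept confident predictions automatically; send the rest to a reviewer.

    `result` is a dict with 'label' and 'score' keys.
    `threshold` is an illustrative value, not a recommendation.
    """
    decision = "auto" if result["score"] >= threshold else "human_review"
    return {"decision": decision, "label": result["label"], "score": result["score"]}


# Made-up results for illustration
print(route_prediction({"label": "Very Positive", "score": 0.91}))
print(route_prediction({"label": "Negative", "score": 0.42}))
```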

Citation

If you use this model, please cite:

@misc{banglabert-sentiment,
  author = {Arshadul Hoque},
  title = {Fine-tuned BanglaBERT for Bengali Sentiment Analysis},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ahs95/banglabert-sentiment-analysis}}
}

Contact

For questions and support: ahsbd95@gmail.com

Model size: 0.1B params (Safetensors, tensor type F32)

Paper: SentiGOLD: A Large Bangla Gold Standard Multi-Domain Sentiment Analysis Dataset and its Evaluation (arXiv:2306.06147, published Jun 9, 2023)