# Myanmar NER Tagging Model

Fine-tuned from chuuhtetnaing/myanmar-pos-model for Myanmar named-entity recognition (NER) tagging.

## Training Results

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 1 | 1.5385 | 0.3730 | 0.5397 | 0.5068 | 0.5227 | 0.9175 |
| 2 | 0.2673 | 0.1809 | 0.7271 | 0.7958 | 0.7599 | 0.9481 |
| 3 | 0.1623 | 0.1295 | 0.7815 | 0.8408 | 0.8101 | 0.9637 |
| 4 | 0.1291 | 0.1015 | 0.7836 | 0.8602 | 0.8201 | 0.9710 |
| 5 | 0.0992 | 0.0965 | 0.8200 | 0.8943 | 0.8555 | 0.9719 |
| 6 | 0.0801 | 0.0879 | 0.8299 | 0.9019 | 0.8644 | 0.9738 |
| 7 | 0.0706 | 0.0819 | 0.8580 | 0.9137 | 0.8849 | 0.9765 |
| 8 | 0.0636 | 0.0768 | 0.8660 | 0.9148 | 0.8897 | 0.9780 |
| 9 | 0.0577 | 0.0757 | 0.8784 | 0.9202 | 0.8988 | 0.9784 |
| 10 | 0.0527 | 0.0760 | 0.8737 | 0.9125 | 0.8927 | 0.9791 |
| 11 | 0.0506 | 0.0785 | 0.8710 | 0.9236 | 0.8965 | 0.9775 |
| 12 | 0.0470 | 0.0754 | 0.8830 | 0.9225 | 0.9023 | 0.9794 |
| 13 | 0.0459 | 0.0754 | 0.8896 | 0.9231 | 0.9061 | 0.9802 |
| 14 | 0.0441 | 0.0813 | 0.8742 | 0.9274 | 0.9000 | 0.9779 |
| 15 | 0.0398 | 0.0763 | 0.8952 | 0.9247 | 0.9097 | 0.9812 |
| 16 | 0.0387 | 0.0841 | 0.8713 | 0.9252 | 0.8974 | 0.9779 |
| 17 | 0.0344 | 0.0805 | 0.8924 | 0.9258 | 0.9088 | 0.9805 |
| 18 | 0.0356 | 0.0790 | 0.8854 | 0.9279 | 0.9061 | 0.9802 |
| 19 | 0.0333 | 0.0801 | 0.8864 | 0.9249 | 0.9052 | 0.9806 |
| 20 | 0.0326 | 0.0788 | 0.8939 | 0.9254 | 0.9094 | 0.9817 |
| 21 | 0.0314 | 0.0801 | 0.8863 | 0.9263 | 0.9059 | 0.9808 |
| 22 | 0.0309 | 0.0815 | 0.8866 | 0.9267 | 0.9062 | 0.9806 |
| 23 | 0.0310 | 0.0825 | 0.8854 | 0.9281 | 0.9062 | 0.9804 |
| 24 | 0.0280 | 0.0828 | 0.8874 | 0.9272 | 0.9068 | 0.9807 |
| 25 | 0.0271 | 0.0826 | 0.8884 | 0.9276 | 0.9076 | 0.9809 |
| 26 | 0.0290 | 0.0828 | 0.8887 | 0.9272 | 0.9075 | 0.9807 |
| 27 | 0.0318 | 0.0835 | 0.8855 | 0.9256 | 0.9051 | 0.9803 |
| 28 | 0.0287 | 0.0837 | 0.8871 | 0.9267 | 0.9065 | 0.9805 |
| 29 | 0.0274 | 0.0837 | 0.8855 | 0.9272 | 0.9058 | 0.9804 |
| 30 | 0.0271 | 0.0832 | 0.8875 | 0.9267 | 0.9067 | 0.9806 |

## Test Set Evaluation

Evaluated on the test split of chuuhtetnaing/myanmar-ner-dataset using seqeval metrics:

| Entity | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| DATE | 0.80 | 0.86 | 0.83 | 251 |
| LOC | 0.93 | 0.96 | 0.95 | 2712 |
| NUM | 0.89 | 0.92 | 0.90 | 789 |
| ORG | 0.44 | 0.62 | 0.52 | 94 |
| PER | 0.84 | 0.88 | 0.86 | 533 |
| TIME | 0.62 | 0.70 | 0.66 | 57 |
| micro avg | 0.89 | 0.93 | 0.91 | 4436 |
| macro avg | 0.75 | 0.82 | 0.78 | 4436 |
| weighted avg | 0.89 | 0.93 | 0.91 | 4436 |
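The average rows can be reproduced (approximately) from the per-entity rows. The recomputation below uses the rounded F1 values from the table, so the macro figure lands at 0.79 rather than the reported 0.78, which seqeval computes from unrounded scores:

```python
# Recompute the macro (unweighted) and weighted (support-weighted) average F1
# from the rounded per-entity values in the table above.
f1 = {"DATE": 0.83, "LOC": 0.95, "NUM": 0.90, "ORG": 0.52, "PER": 0.86, "TIME": 0.66}
support = {"DATE": 251, "LOC": 2712, "NUM": 789, "ORG": 94, "PER": 533, "TIME": 57}

macro_f1 = sum(f1.values()) / len(f1)                                    # plain mean
weighted_f1 = sum(f1[k] * support[k] for k in f1) / sum(support.values())  # weighted by support
print(round(macro_f1, 2), round(weighted_f1, 2))  # 0.79 0.91
```

The gap between macro (0.78) and weighted (0.91) F1 reflects the class imbalance: the weak ORG and TIME classes have only 94 and 57 test instances against 2712 for LOC.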

## Training Details

| Parameter | Value |
|---|---|
| Base Model | chuuhtetnaing/myanmar-pos-model |
| Total Epochs | 30 |
| Total Steps | 510 |
| Best Checkpoint | checkpoint-255 |
| Best F1 | 0.9097 |
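A quick consistency check on these numbers (assuming one checkpoint was saved per epoch): 510 steps over 30 epochs is 17 steps per epoch, so checkpoint-255 falls at the end of epoch 15, the epoch with the best validation F1 (0.9097) in the results table above.

```python
# Consistency check on the training details (assumes one checkpoint per epoch).
total_steps = 510
total_epochs = 30
steps_per_epoch = total_steps // total_epochs  # 510 / 30 = 17
best_epoch = 255 // steps_per_epoch            # checkpoint-255 -> epoch 15
print(steps_per_epoch, best_epoch)  # 17 15
```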

## Usage

```python
from transformers import pipeline

# aggregation_strategy="simple" merges subword predictions into entity spans
# (the current replacement for the deprecated grouped_entities=True).
ner = pipeline("token-classification", model="chuuhtetnaing/myanmar-ner-model", aggregation_strategy="simple")
result = ner("α€€α€­α€―α€™α€±α€¬α€„α€Ία€žα€Šα€Ία€›α€”α€Ία€€α€―α€”α€Ία€™α€Όα€­α€―α€·α€žα€­α€―α€·α€žα€½α€¬α€Έα€žα€Šα€Ία‹")  # "Ko Maung went to Yangon city"
print(result)
```
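With aggregation enabled, the pipeline returns one dict per entity span, with `entity_group`, `score`, `word`, `start`, and `end` keys. A minimal post-processing sketch; the `result` value below is a hypothetical example of that output shape, not actual model predictions:

```python
# Hypothetical aggregated pipeline output; field names follow the transformers
# token-classification pipeline, but the scores and offsets here are made up.
result = [
    {"entity_group": "PER", "score": 0.98, "word": "α€€α€­α€―α€™α€±α€¬α€„α€Ί", "start": 0, "end": 7},
    {"entity_group": "LOC", "score": 0.95, "word": "α€›α€”α€Ία€€α€―α€”α€Ία€™α€Όα€­α€―α€·", "start": 12, "end": 22},
]
# Keep confident spans and pair each label with its surface form.
entities = [(e["entity_group"], e["word"]) for e in result if e["score"] >= 0.5]
print(entities)
```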

## Evaluation Code

```shell
pip install seqeval
```

```python
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
from datasets import load_dataset
from tqdm import tqdm
from seqeval.metrics import classification_report

# Load model and tokenizer
model = AutoModelForTokenClassification.from_pretrained("chuuhtetnaing/myanmar-ner-model")
tokenizer = AutoTokenizer.from_pretrained("chuuhtetnaing/myanmar-ner-model")

def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            if word_idx is None:
                label_ids.append(-100)
            elif word_idx != previous_word_idx:
                label_ids.append(label[word_idx])
            else:
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)
    tokenized_inputs["labels"] = labels
    return tokenized_inputs

# Non-aggregating pipeline for token-level predictions
ner = pipeline("token-classification", model="chuuhtetnaing/myanmar-ner-model", aggregation_strategy=None)

# Load and tokenize dataset
ds = load_dataset("chuuhtetnaing/myanmar-ner-dataset")
tokenized_ds = ds.map(tokenize_and_align_labels, batched=True)
test_ds = tokenized_ds["test"]

# Get label mapping
label_list = model.config.id2label

y_true = []
y_pred = []

for example in tqdm(test_ds):
    true_labels = [label_list[l] if l != -100 else "O" for l in example["labels"]]

    text = tokenizer.decode(example["input_ids"], skip_special_tokens=True)
    preds = ner(text)

    pred_labels = ["O"] * len(true_labels)
    for pred in preds:
        idx = pred["index"]
        if idx < len(pred_labels):
            pred_labels[idx] = pred["entity"]

    # Keep only word-initial positions (labels != -100) for seqeval
    y_true.append([label_list[l] for l in example["labels"] if l != -100])
    y_pred.append([p for p, l in zip(pred_labels, example["labels"]) if l != -100])

print(classification_report(y_true, y_pred))
```
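The -100 masking is what keeps the seqeval comparison at the word level: only the first subword of each word keeps the word's label, while special tokens and continuation subwords get -100 and are filtered out of `y_true`/`y_pred`. A stand-alone illustration of that rule (the `word_ids` list below is a hypothetical `word_ids()` output, not from the real tokenizer):

```python
# Hypothetical word_ids() output: None marks special tokens; repeated indices
# mark continuation subwords of the same word.
word_ids = [None, 0, 0, 1, 2, 2, 2, None]
word_labels = [3, 0, 5]  # one label id per word

aligned, prev = [], None
for wid in word_ids:
    if wid is None:
        aligned.append(-100)        # special token: ignored
    elif wid != prev:
        aligned.append(word_labels[wid])  # first subword: keep word label
    else:
        aligned.append(-100)        # continuation subword: ignored
    prev = wid
print(aligned)  # [-100, 3, -100, 0, 5, -100, -100, -100]
```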

## NER Labels

| Tag | Description |
|---|---|
| B-DATE | Beginning of Date |
| I-DATE | Inside Date |
| B-LOC | Beginning of Location |
| I-LOC | Inside Location |
| B-NUM | Beginning of Number |
| I-NUM | Inside Number |
| B-ORG | Beginning of Organization |
| I-ORG | Inside Organization |
| B-PER | Beginning of Person |
| I-PER | Inside Person |
| B-TIME | Beginning of Time |
| I-TIME | Inside Time |
| O | Outside (Not an entity) |
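These BIO tags decode into entity spans: a B- tag opens a span, matching I- tags extend it, and anything else closes it. A minimal decoding sketch, using hypothetical English tokens for readability (the real model operates on Myanmar tokens):

```python
# Sketch: decode a BIO tag sequence into (label, text) entity spans.
def bio_to_spans(tokens, tags):
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # open a new span
            if current:
                spans.append(current)
            current = [tag[2:], [tok]]
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1].append(tok)        # extend the open span
        else:                             # "O" or a stray I- tag closes it
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(label, " ".join(toks)) for label, toks in spans]

tokens = ["Ko", "Maung", "went", "to", "Yangon"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))  # [('PER', 'Ko Maung'), ('LOC', 'Yangon')]
```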