# Myanmar NER Tagging Model

Fine-tuned from `chuuhtetnaing/myanmar-pos-model` for Myanmar named-entity recognition (NER) tagging.
## Training Results
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 1 | 1.5385 | 0.3730 | 0.5397 | 0.5068 | 0.5227 | 0.9175 |
| 2 | 0.2673 | 0.1809 | 0.7271 | 0.7958 | 0.7599 | 0.9481 |
| 3 | 0.1623 | 0.1295 | 0.7815 | 0.8408 | 0.8101 | 0.9637 |
| 4 | 0.1291 | 0.1015 | 0.7836 | 0.8602 | 0.8201 | 0.9710 |
| 5 | 0.0992 | 0.0965 | 0.8200 | 0.8943 | 0.8555 | 0.9719 |
| 6 | 0.0801 | 0.0879 | 0.8299 | 0.9019 | 0.8644 | 0.9738 |
| 7 | 0.0706 | 0.0819 | 0.8580 | 0.9137 | 0.8849 | 0.9765 |
| 8 | 0.0636 | 0.0768 | 0.8660 | 0.9148 | 0.8897 | 0.9780 |
| 9 | 0.0577 | 0.0757 | 0.8784 | 0.9202 | 0.8988 | 0.9784 |
| 10 | 0.0527 | 0.0760 | 0.8737 | 0.9125 | 0.8927 | 0.9791 |
| 11 | 0.0506 | 0.0785 | 0.8710 | 0.9236 | 0.8965 | 0.9775 |
| 12 | 0.0470 | 0.0754 | 0.8830 | 0.9225 | 0.9023 | 0.9794 |
| 13 | 0.0459 | 0.0754 | 0.8896 | 0.9231 | 0.9061 | 0.9802 |
| 14 | 0.0441 | 0.0813 | 0.8742 | 0.9274 | 0.9000 | 0.9779 |
| 15 | 0.0398 | 0.0763 | 0.8952 | 0.9247 | 0.9097 | 0.9812 |
| 16 | 0.0387 | 0.0841 | 0.8713 | 0.9252 | 0.8974 | 0.9779 |
| 17 | 0.0344 | 0.0805 | 0.8924 | 0.9258 | 0.9088 | 0.9805 |
| 18 | 0.0356 | 0.0790 | 0.8854 | 0.9279 | 0.9061 | 0.9802 |
| 19 | 0.0333 | 0.0801 | 0.8864 | 0.9249 | 0.9052 | 0.9806 |
| 20 | 0.0326 | 0.0788 | 0.8939 | 0.9254 | 0.9094 | 0.9817 |
| 21 | 0.0314 | 0.0801 | 0.8863 | 0.9263 | 0.9059 | 0.9808 |
| 22 | 0.0309 | 0.0815 | 0.8866 | 0.9267 | 0.9062 | 0.9806 |
| 23 | 0.0310 | 0.0825 | 0.8854 | 0.9281 | 0.9062 | 0.9804 |
| 24 | 0.0280 | 0.0828 | 0.8874 | 0.9272 | 0.9068 | 0.9807 |
| 25 | 0.0271 | 0.0826 | 0.8884 | 0.9276 | 0.9076 | 0.9809 |
| 26 | 0.0290 | 0.0828 | 0.8887 | 0.9272 | 0.9075 | 0.9807 |
| 27 | 0.0318 | 0.0835 | 0.8855 | 0.9256 | 0.9051 | 0.9803 |
| 28 | 0.0287 | 0.0837 | 0.8871 | 0.9267 | 0.9065 | 0.9805 |
| 29 | 0.0274 | 0.0837 | 0.8855 | 0.9272 | 0.9058 | 0.9804 |
| 30 | 0.0271 | 0.0832 | 0.8875 | 0.9267 | 0.9067 | 0.9806 |
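The "Best Checkpoint" listed under Training Details can be cross-checked against this table; a minimal sketch (the F1 values are copied from the table above, and the steps-per-epoch figure follows from 510 total steps over 30 epochs):

```python
# Validation F1 per epoch, copied from the training table
f1_per_epoch = [
    0.5227, 0.7599, 0.8101, 0.8201, 0.8555, 0.8644, 0.8849, 0.8897,
    0.8988, 0.8927, 0.8965, 0.9023, 0.9061, 0.9000, 0.9097, 0.8974,
    0.9088, 0.9061, 0.9052, 0.9094, 0.9059, 0.9062, 0.9062, 0.9068,
    0.9076, 0.9075, 0.9051, 0.9065, 0.9058, 0.9067,
]

# Epoch with the highest validation F1 (epochs are 1-indexed)
best_epoch = max(range(len(f1_per_epoch)), key=lambda i: f1_per_epoch[i]) + 1
steps_per_epoch = 510 // 30  # 510 total steps over 30 epochs = 17 steps/epoch

print(best_epoch, f1_per_epoch[best_epoch - 1], best_epoch * steps_per_epoch)
# -> 15 0.9097 255
```

Epoch 15's F1 of 0.9097 is the maximum, and 15 × 17 = 255 matches `checkpoint-255`.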
## Test Set Evaluation

Evaluated on the `chuuhtetnaing/myanmar-ner-dataset` test split using `seqeval` metrics:
| Entity | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| DATE | 0.80 | 0.86 | 0.83 | 251 |
| LOC | 0.93 | 0.96 | 0.95 | 2712 |
| NUM | 0.89 | 0.92 | 0.90 | 789 |
| ORG | 0.44 | 0.62 | 0.52 | 94 |
| PER | 0.84 | 0.88 | 0.86 | 533 |
| TIME | 0.62 | 0.70 | 0.66 | 57 |
| micro avg | 0.89 | 0.93 | 0.91 | 4436 |
| macro avg | 0.75 | 0.82 | 0.78 | 4436 |
| weighted avg | 0.89 | 0.93 | 0.91 | 4436 |
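The averaged rows follow directly from the per-entity scores: the macro average is the unweighted mean over the six entity types, while the weighted average weights each type by its support. A quick sanity check using the (rounded) precision column from the table — note that recomputing from rounded per-class values can differ from `seqeval`'s output in the last digit:

```python
# Per-entity precision and support, copied from the table above
precision = {"DATE": 0.80, "LOC": 0.93, "NUM": 0.89, "ORG": 0.44, "PER": 0.84, "TIME": 0.62}
support = {"DATE": 251, "LOC": 2712, "NUM": 789, "ORG": 94, "PER": 533, "TIME": 57}

# Macro average: unweighted mean over entity types
macro_p = sum(precision.values()) / len(precision)

# Weighted average: each type weighted by its support count
total = sum(support.values())
weighted_p = sum(precision[e] * support[e] for e in precision) / total

print(round(macro_p, 2), round(weighted_p, 2))  # -> 0.75 0.89
```

The low macro average relative to the weighted average reflects the weak ORG and TIME scores, which have little effect on the support-weighted figure because those classes are rare.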
## Training Details
| Parameter | Value |
|---|---|
| Base Model | chuuhtetnaing/myanmar-pos-model |
| Total Epochs | 30 |
| Total Steps | 510 |
| Best Checkpoint | checkpoint-255 |
| Best F1 | 0.9097 |
## Usage

```python
from transformers import pipeline

# aggregation_strategy="simple" merges B-/I- sub-tokens into whole entities
# (grouped_entities=True is the deprecated equivalent)
ner = pipeline(
    "token-classification",
    model="chuuhtetnaing/myanmar-ner-model",
    aggregation_strategy="simple",
)

result = ner("ααα―αα±α¬ααΊαααΊαααΊαα―ααΊααΌαα―α·ααα―α·αα½α¬αΈαααΊα")  # "Ko Maung went to Yangon city"
print(result)
```
## Evaluation Code

```shell
pip install seqeval
```

```python
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
from datasets import load_dataset
from tqdm import tqdm
from seqeval.metrics import classification_report

# Load model and tokenizer
model = AutoModelForTokenClassification.from_pretrained("chuuhtetnaing/myanmar-ner-model")
tokenizer = AutoTokenizer.from_pretrained("chuuhtetnaing/myanmar-ner-model")

def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            if word_idx is None:
                # Special tokens get -100 so they are excluded from metrics
                label_ids.append(-100)
            elif word_idx != previous_word_idx:
                # Label only the first sub-token of each word
                label_ids.append(label[word_idx])
            else:
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)
    tokenized_inputs["labels"] = labels
    return tokenized_inputs

# Load and tokenize the dataset
ner = pipeline("token-classification", model="chuuhtetnaing/myanmar-ner-model", aggregation_strategy="none")
ds = load_dataset("chuuhtetnaing/myanmar-ner-dataset")
tokenized_ds = ds.map(tokenize_and_align_labels, batched=True)
test_ds = tokenized_ds["test"]

# Label mapping (id -> tag string)
label_list = model.config.id2label

y_true = []
y_pred = []
for example in tqdm(test_ds):
    true_labels = [label_list[l] if l != -100 else "O" for l in example["labels"]]
    text = tokenizer.decode(example["input_ids"], skip_special_tokens=True)
    preds = ner(text)
    # Map each token-level prediction back to its position in the sequence
    pred_labels = ["O"] * len(true_labels)
    for pred in preds:
        idx = pred["index"]
        if idx < len(pred_labels):
            pred_labels[idx] = pred["entity"]
    # Keep only positions carrying a real label (drop -100 sub-token slots)
    y_true.append([label_list[l] for l in example["labels"] if l != -100])
    y_pred.append([p for p, l in zip(pred_labels, example["labels"]) if l != -100])

print(classification_report(y_true, y_pred))
```
## NER Labels
| Tag | Description |
|---|---|
| B-DATE | Beginning of Date |
| I-DATE | Inside Date |
| B-LOC | Beginning of Location |
| I-LOC | Inside Location |
| B-NUM | Beginning of Number |
| I-NUM | Inside Number |
| B-ORG | Beginning of Organization |
| I-ORG | Inside Organization |
| B-PER | Beginning of Person |
| I-PER | Inside Person |
| B-TIME | Beginning of Time |
| I-TIME | Inside Time |
| O | Outside (Not an entity) |
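To illustrate how the BIO scheme above groups token tags into entities (this merging is what the pipeline performs when an aggregation strategy is set), here is a minimal, hypothetical decoder sketch:

```python
def bio_to_spans(tags):
    """Collapse a BIO tag sequence into (entity_type, start, end) spans, end exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity, closing any open one
            if start is not None:
                spans.append((etype, start, i))
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is not None and etype == tag[2:]:
            continue  # continuation of the current entity
        else:
            # "O" (or a stray I- tag) closes any open entity
            if start is not None:
                spans.append((etype, start, i))
            start, etype = None, None
    if start is not None:
        spans.append((etype, start, len(tags)))
    return spans

tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-NUM"]
print(bio_to_spans(tags))  # -> [('PER', 0, 2), ('LOC', 3, 4), ('NUM', 5, 6)]
```

Here `B-PER` followed by `I-PER` forms one two-token PER entity, while each bare `B-` tag starts a single-token entity.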
## Model Lineage

`chuuhtetnaing/myanmar-ner-model` is fine-tuned from `chuuhtetnaing/myanmar-pos-model`, which is itself based on `FacebookAI/xlm-roberta-base`.