<details><summary>See axolotl config</summary>

axolotl version: `0.13.0.dev0`

```yaml
# 1. Base Model & Tokenizer
base_model: google/gemma-2-2b-it
model_type: AutoModelForCausalLM  # Corrected from 'type_of_model' for axolotl
tokenizer_type: AutoTokenizer
hub_model_id: AiAF/gemma-2-2b-it-co-sft-qlora  # New model ID for this finetune
hub_strategy: checkpoint

# 2. LoRA / QLoRA Configuration
load_in_4bit: true
adapter: qlora
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true

# 3. Dataset Configuration
datasets:
  - path: .
    type: chat_template
    # Use the data_files key for local files to avoid ambiguity
    data_files: ./co-sft-dataset.jsonl
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value

# Custom Jinja template for Gemma models
chat_template: jinja
chat_template_jinja: |
  {{ bos_token }}
  {% set last = None %}
  {% for m in messages %}
  {% set raw_role = 'model' if m['role']=='assistant' else m['role'] %}
  {% set role = 'user' if raw_role=='system' else raw_role %}
  {% if role == last and role == 'user' %}
  {{ m['content'] | trim }}
  {% else %}
  {{ '<start_of_turn>' + role + '\n' + m['content'] | trim + '<end_of_turn>\n' }}
  {% endif %}
  {% set last = role %}
  {% endfor %}
  {% if add_generation_prompt %}
  {{ '<start_of_turn>model\n' }}
  {% endif %}

roles_to_train: ["assistant", "user"]

# 4. Training Parameters
sequence_len: 2048
sample_packing: true
eval_sample_packing: true
val_set_size: 0.05
num_epochs: 10
dataset_prepared_path: last_run_prepared

# 5. Saving and Evaluation Strategy
evals_per_epoch: 5
saves_per_epoch: 5
save_total_limit: 100
resume_from_checkpoint: outputs/sft/gemma-2-2b-it-co/checkpoint-15792/

# 6. Output & Logging
output_dir: ./outputs/sft/gemma-2-2b-it-co
wandb_project: "co-sft"
wandb_name: "gemma-2-2b-it_SFT-co_QLoRA"
wandb_log_model: "false"
wandb_run_id: "1"

# 7. Batching & Optimizer
gradient_accumulation_steps: 4
micro_batch_size: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
weight_decay: 0.0

# 8. Hardware & Performance
bf16: true
#fp16: true
tf32: true
flash_attention: true
gradient_checkpointing: true
logging_steps: 1

# 9. Special Tokens
eot_tokens: ["<end_of_turn>"]
special_tokens:
  bos_token: "<bos>"
  eos_token: "<eos>"
  pad_token: "<pad>"
```

</details>
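For reference, each line of `./co-sft-dataset.jsonl` is expected to carry a `conversations` list whose items use `from`/`value` keys, per the `field_messages` and `message_property_mappings` settings above. The sketch below is not from the original card; the message text is a placeholder, and the role names (`system`/`user`/`assistant`) are an assumption chosen to match the custom Jinja template and `roles_to_train`.

```python
# Illustrative sketch only: writes one hypothetical record in the layout the
# config above expects (field_messages: conversations, role <- from, content <- value).
import json

record = {
    "conversations": [
        # Role names assumed to be system/user/assistant so the custom Gemma
        # template (system -> user, assistant -> model) applies cleanly.
        {"from": "system", "value": "You are a concise assistant."},
        {"from": "user", "value": "Placeholder question."},
        {"from": "assistant", "value": "Placeholder answer."},
    ]
}

with open("co-sft-dataset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```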
# gemma-2-2b-it-co-sft-qlora

This model is a QLoRA fine-tune of [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) on the local `co-sft-dataset.jsonl` dataset referenced in the config above. It achieves the following results on the evaluation set:
- Loss: 0.6665
- Memory/max active (GiB): 10.22
- Memory/max allocated (GiB): 10.22
- Memory/device reserved (GiB): 12.03
## Model description
More information needed
## Intended uses & limitations
More information needed
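As a starting point, the adapter can be loaded on top of the base model with PEFT. The snippet below is a minimal sketch, not an official usage example from the author; it assumes the adapter lives under the `hub_model_id` from the config (`AiAF/gemma-2-2b-it-co-sft-qlora`), that the tokenizer (with its chat template) was pushed alongside the adapter, and that a bf16-capable GPU is available. If the tokenizer was not pushed, load it from the base model instead.

```python
# Minimal inference sketch (assumptions: adapter repo id taken from the config,
# bf16-capable GPU). Loads the base Gemma-2 model, attaches the QLoRA adapter,
# and generates a short reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-2-2b-it"
adapter_id = "AiAF/gemma-2-2b-it-co-sft-qlora"  # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Placeholder prompt."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```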
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: `adamw_bnb_8bit` (8-bit AdamW from bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 28170
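The effective batch size and per-epoch step count follow directly from these values; the quick check below is a sketch added for clarity, not part of the original card.

```python
# Sanity-check arithmetic from the hyperparameters above.
micro_batch_size = 1
gradient_accumulation_steps = 4
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 4

training_steps = 28170
num_epochs = 10
steps_per_epoch = training_steps // num_epochs  # 2817
# With evals_per_epoch: 5, evaluations land roughly every 2817 / 5 ≈ 563 steps,
# consistent with the ~564-step spacing between rows of the results table below.
print(total_train_batch_size, steps_per_epoch)
```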
### Training results
| Training Loss | Epoch | Step | Validation Loss | Reserved (GiB) | Active (GiB) | Allocated (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 3.9014 | 8.66 | 7.61 | 7.61 |
| 2.2779 | 0.2002 | 564 | 2.2638 | 11.42 | 10.18 | 10.18 |
| 2.0814 | 0.4004 | 1128 | 2.0819 | 11.4 | 10.18 | 10.18 |
| 1.9261 | 0.6006 | 1692 | 1.9529 | 11.4 | 10.18 | 10.18 |
| 1.7837 | 0.8008 | 2256 | 1.8362 | 11.4 | 10.18 | 10.18 |
| 1.7039 | 1.0007 | 2820 | 1.7115 | 11.4 | 10.18 | 10.18 |
| 1.3581 | 1.2009 | 3384 | 1.6288 | 11.4 | 10.18 | 10.18 |
| 1.2775 | 1.4011 | 3948 | 1.5406 | 11.4 | 10.18 | 10.18 |
| 1.2031 | 1.6013 | 4512 | 1.4729 | 11.4 | 10.18 | 10.18 |
| 1.179 | 1.8015 | 5076 | 1.4379 | 11.4 | 10.18 | 10.18 |
| 1.1687 | 1.9996 | 5634 | 1.4310 | 8.82 | 7.77 | 7.77 |
| 1.1687 | 2.0018 | 5640 | 1.4628 | 11.42 | 10.18 | 10.18 |
| 1.1356 | 2.2020 | 6204 | 1.5042 | 11.4 | 10.18 | 10.18 |
| 1.1069 | 2.4022 | 6768 | 1.4440 | 11.4 | 10.18 | 10.18 |
| 1.1033 | 2.6024 | 7332 | 1.3911 | 11.4 | 10.18 | 10.18 |
| 1.0577 | 2.8026 | 7896 | 1.3202 | 11.4 | 10.18 | 10.18 |
| 1.0084 | 3.0025 | 8460 | 1.2964 | 11.4 | 10.18 | 10.18 |
| 0.7152 | 3.2027 | 9024 | 1.2804 | 11.4 | 10.18 | 10.18 |
| 0.7768 | 3.4029 | 9588 | 1.2555 | 11.4 | 10.18 | 10.18 |
| 0.7385 | 3.6031 | 10152 | 1.2414 | 11.4 | 10.18 | 10.18 |
| 0.7268 | 3.8033 | 10716 | 1.2337 | 11.4 | 10.18 | 10.18 |
| 0.742 | 3.9992 | 11268 | 1.2331 | 8.82 | 7.77 | 7.77 |
| 0.742 | 4.0032 | 11280 | 1.2636 | 11.42 | 10.18 | 10.18 |
| 0.9975 | 4.2034 | 11844 | 1.4157 | 11.4 | 10.18 | 10.18 |
| 1.0904 | 4.4036 | 12408 | 1.4253 | 11.4 | 10.18 | 10.18 |
| 1.088 | 4.6038 | 12972 | 1.3913 | 11.4 | 10.18 | 10.18 |
| 1.0622 | 4.8040 | 13536 | 1.3515 | 11.4 | 10.18 | 10.18 |
| 0.993 | 5.0043 | 14100 | 1.3557 | 11.4 | 7.78 | 7.78 |
| 0.8539 | 5.2045 | 14664 | 1.3281 | 11.4 | 10.18 | 10.18 |
| 0.8346 | 5.4046 | 15228 | 1.2908 | 11.4 | 10.18 | 10.18 |
| 0.8793 | 5.6048 | 15792 | 1.2460 | 11.4 | 10.18 | 10.18 |
| 0.8793 | 5.6048 | 15792 | 0.7040 | 7.79 | 7.79 | 8.84 |
| 0.7532 | 5.8062 | 16356 | 0.7194 | 10.22 | 10.22 | 12.26 |
| 0.7779 | 6.0064 | 16920 | 0.7192 | 10.22 | 10.22 | 12.03 |
| 0.6873 | 6.2066 | 17484 | 0.7190 | 10.22 | 10.22 | 12.03 |
| 0.6935 | 6.4068 | 18048 | 0.7096 | 10.22 | 10.22 | 12.03 |
| 0.6858 | 6.6070 | 18612 | 0.6968 | 10.22 | 10.22 | 12.03 |
| 0.6936 | 6.8072 | 19176 | 0.6823 | 10.22 | 10.22 | 12.03 |
| 0.6456 | 7.0075 | 19740 | 0.6739 | 10.22 | 10.22 | 12.03 |
| 0.5075 | 7.2077 | 20304 | 0.6760 | 10.22 | 10.22 | 12.03 |
| 0.5174 | 7.4079 | 20868 | 0.6690 | 10.22 | 10.22 | 12.03 |
| 0.5155 | 7.6081 | 21432 | 0.6554 | 10.22 | 10.22 | 12.03 |
| 0.4821 | 7.8083 | 21996 | 0.6472 | 10.22 | 10.22 | 12.03 |
| 0.477 | 8.0085 | 22560 | 0.6630 | 10.22 | 10.22 | 12.03 |
| 0.3981 | 8.2087 | 23124 | 0.6629 | 10.22 | 10.22 | 12.03 |
| 0.3917 | 8.4089 | 23688 | 0.6602 | 10.22 | 10.22 | 12.03 |
| 0.4008 | 8.6092 | 24252 | 0.6552 | 10.22 | 10.22 | 12.03 |
| 0.4003 | 8.8094 | 24816 | 0.6498 | 10.22 | 10.22 | 12.03 |
| 0.4102 | 9.0096 | 25380 | 0.6631 | 10.22 | 10.22 | 12.03 |
| 0.3526 | 9.2098 | 25944 | 0.6681 | 10.22 | 10.22 | 12.03 |
| 0.349 | 9.4100 | 26508 | 0.6664 | 10.22 | 10.22 | 12.03 |
| 0.3521 | 9.6102 | 27072 | 0.6669 | 10.22 | 10.22 | 12.03 |
| 0.3424 | 9.8104 | 27636 | 0.6665 | 10.22 | 10.22 | 12.03 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.56.1
- Pytorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1