---
title: CoDA Fine-tuning
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
hf_oauth: true
hf_oauth_scopes:
  - read-repos
  - write-repos
---

# CoDA Model Fine-tuning Space

This Space lets you fine-tune the **Salesforce/CoDA-v0-Instruct** diffusion-based text generation model on the **baseten-admin/gpt-oss120b-generated-perfectblend** dataset.
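
The snippet below is a minimal sketch of the starting point: loading the base checkpoint and the dataset named above. Whether `AutoModel` or a task-specific class is the right entry point depends on the checkpoint's custom code, so treat the loading call as an assumption.

```python
# Minimal sketch: load the base model and the training dataset.
# CoDA ships a custom architecture (CoDALanguageModel), so trust_remote_code=True is assumed.
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer

model_id = "Salesforce/CoDA-v0-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Only a `train` split exists; 10% is held out for evaluation later.
dataset = load_dataset("baseten-admin/gpt-oss120b-generated-perfectblend", split="train")
print(dataset)
```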

## Features

- **Full Fine-tuning**: Complete parameter fine-tuning (not LoRA)
- **ChatML Format**: Processes conversation data with question-answer pairs
- **Auto Upload**: Automatically uploads the trained model to your Hugging Face account
- **Progress Tracking**: Real-time training progress updates
- **OAuth Integration**: Secure authentication via Hugging Face login

## How to Use

1. **Login**: Click the "Sign in with Hugging Face" button
2. **Configure**: Adjust training parameters (epochs, batch size, learning rate)
3. **Train**: Click "Start Training" (requires a GPU; upgrade the Space to a GPU tier)
4. **Resume**: If training is interrupted, check "Resume from last checkpoint" and restart
5. **Upload**: After training completes, click "Upload to Hugging Face Hub" (see the sketch after this list)
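
A rough sketch of the upload step, assuming the Space uses `huggingface_hub` with the token granted at login. The repository name and output directory are illustrative, not the Space's actual values.

```python
# Push the fine-tuned model to the Hub using the OAuth token from the Space login.
from huggingface_hub import HfApi

def upload_model(oauth_token: str, username: str, output_dir: str = "coda-finetuned") -> str:
    api = HfApi(token=oauth_token)
    repo_id = f"{username}/coda-finetuned"  # hypothetical target repo name
    api.create_repo(repo_id, repo_type="model", exist_ok=True)
    api.upload_folder(folder_path=output_dir, repo_id=repo_id, repo_type="model")
    return f"https://huggingface.co/{repo_id}"
```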

### Persistence

This Space supports checkpoint persistence:

- Training checkpoints are saved every 500 steps
- If interrupted, you can resume from the last checkpoint (see the sketch after this list)
- For Docker deployments, mount a `/data` volume for full persistence
- On Spaces, checkpoints persist within the same session, and across rebuilds if the Space uses the persistent storage tier
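
A minimal sketch of how the checkpoint/resume behaviour maps onto `transformers`' `Trainer`, assuming that is what the Space uses internally; the paths are illustrative.

```python
# Checkpoint every 500 steps and resume from the most recent checkpoint if one exists.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="/data/checkpoints",  # mount /data in Docker deployments for persistence
    save_steps=500,                  # checkpoint frequency described above
    save_total_limit=2,              # keep only the latest checkpoints to bound disk usage
)

# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train(resume_from_checkpoint=True)  # picks up the latest checkpoint in output_dir
```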

## Requirements

- **Hardware**: GPU (T4, A10G, or better) strongly recommended
- **Account**: Hugging Face account with write permissions
- **Time**: Training takes several hours depending on configuration

## About the Model

**CoDA (Code Diffusion with Autoregressive)** is a 1.7B-parameter bidirectional diffusion model developed by Salesforce AI Research. Unlike traditional autoregressive models, CoDA uses discrete denoising for text generation. The Instruct version is pre-tuned for instruction following, making it ideal for fine-tuning on conversational data.

### Model Configuration

```json
{
  "architectures": ["CoDALanguageModel"],
  "hidden_size": 2048,
  "num_hidden_layers": 28,
  "num_attention_heads": 16,
  "vocab_size": 151936,
  "max_position_embeddings": 40960
}
```
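
To check these values against the checkpoint itself, the configuration can be pulled straight from the Hub; `trust_remote_code=True` is assumed here because `CoDALanguageModel` is a custom architecture.

```python
# Inspect the model configuration directly from the Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Salesforce/CoDA-v0-Instruct", trust_remote_code=True)
print(config.hidden_size, config.num_hidden_layers, config.vocab_size)
```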

## Dataset

Training uses the **baseten-admin/gpt-oss120b-generated-perfectblend** dataset:

- **Format**: Conversational data in ChatML format (see the sketch after this list)
- **Column**: `conversations` (a list of role-content pairs per example)
- **Split**: Uses the `train` split with a 90/10 train/eval split
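
The sketch below shows one way a `conversations` entry could be rendered as a ChatML string. The field names (`role`/`content` vs. `from`/`value`) vary between datasets, so this is an assumption to verify against a real example, not the Space's exact preprocessing code.

```python
# Turn a list of role-content turns into a ChatML-formatted training string.
def to_chatml(conversations: list[dict]) -> str:
    parts = []
    for turn in conversations:
        role = turn.get("role", turn.get("from", "user"))      # field names are assumed
        content = turn.get("content", turn.get("value", ""))
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    return "\n".join(parts)

example = [
    {"role": "user", "content": "What is a diffusion language model?"},
    {"role": "assistant", "content": "A model that generates text by iterative denoising."},
]
print(to_chatml(example))
```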

## Training Details

- **Optimizer**: AdamW
- **Precision**: FP16 (on GPU)
- **Gradient Accumulation**: 4 steps
- **Gradient Checkpointing**: Enabled for memory efficiency
- **Max Sequence Length**: 2048 tokens
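
These settings map onto `transformers`' `TrainingArguments` roughly as follows; the epoch count, batch size, and learning rate are placeholders, since the Space takes those from the UI.

```python
# Hedged sketch of TrainingArguments mirroring the list above (not the Space's exact code).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="coda-finetuned",
    num_train_epochs=1,               # placeholder; set in the UI
    per_device_train_batch_size=1,    # placeholder; set in the UI
    learning_rate=2e-5,               # placeholder; set in the UI
    optim="adamw_torch",              # AdamW optimizer
    fp16=True,                        # FP16 precision on GPU
    gradient_accumulation_steps=4,    # gradient accumulation: 4 steps
    gradient_checkpointing=True,      # gradient checkpointing for memory efficiency
)
# The 2048-token max sequence length is applied at tokenization time, e.g.
# tokenizer(text, truncation=True, max_length=2048).
```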

## Citation

If you use this Space or the CoDA model, please cite:

```bibtex
@article{coda2023,
  title={CoDA: Bidirectional Code Diffusion},
  author={Salesforce AI Research},
  journal={arXiv preprint},
  year={2023}
}
```

## License

Apache 2.0