---
title: CoDA Fine-tuning
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
hf_oauth: true
hf_oauth_scopes:
  - read-repos
  - write-repos
---

# CoDA Model Fine-tuning Space

This Space allows you to fine-tune the **Salesforce/CoDA-v0-Instruct** text generation diffusion model on the **baseten-admin/gpt-oss120b-generated-perfectblend** dataset.

## Features

- 🎯 **Full Fine-tuning**: Updates all model parameters (not LoRA)
- πŸ’¬ **ChatML Format**: Processes conversation data with question-answer pairs
- πŸ”„ **Auto Upload**: Automatically uploads trained model to your Hugging Face account
- πŸ“Š **Progress Tracking**: Real-time training progress updates
- πŸ” **OAuth Integration**: Secure authentication via Hugging Face login

## How to Use

1. **Login**: Click the "Sign in with Hugging Face" button (see the OAuth sketch after this list)
2. **Configure**: Adjust training parameters (epochs, batch size, learning rate)
3. **Train**: Click "Start Training" (requires a GPU; upgrade the Space to a GPU tier if needed)
4. **Resume**: If training is interrupted, check "Resume from last checkpoint" and restart
5. **Upload**: After training completes, click "Upload to Hugging Face Hub"
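
Step 1 relies on the Space's OAuth settings (`hf_oauth: true` in the metadata above). Below is a minimal sketch of how the login and token hand-off can be wired up in Gradio; the handler and component names are illustrative, not the actual `app.py`:

```python
import gradio as gr

def start_training(epochs: int, oauth_token: gr.OAuthToken | None) -> str:
    # On Spaces with `hf_oauth: true`, Gradio injects the signed-in user's token
    # into any handler parameter annotated with gr.OAuthToken (None if the user
    # is not logged in). The parameter is not listed in `inputs`.
    if oauth_token is None:
        return "Please sign in with Hugging Face first."
    return f"Starting training for {int(epochs)} epoch(s)..."

with gr.Blocks() as demo:
    gr.LoginButton()  # renders the "Sign in with Hugging Face" button
    epochs = gr.Slider(1, 5, value=1, step=1, label="Epochs")
    status = gr.Textbox(label="Status")
    gr.Button("Start Training").click(start_training, inputs=[epochs], outputs=[status])

demo.launch()
```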

### Persistence

This Space supports checkpoint persistence:
- Training checkpoints are saved every 500 steps
- If interrupted, you can resume from the last checkpoint (see the sketch below)
- For Docker deployment: Mount `/data` volume for full persistence
- On Spaces: Checkpoints persist within the same session and across rebuilds if using persistent storage tier
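
A minimal sketch of the resume logic, assuming checkpoints live under `/data/checkpoints` (the path is an assumption; the Space's actual output directory may differ):

```python
import os
from transformers.trainer_utils import get_last_checkpoint

output_dir = "/data/checkpoints"  # assumed location; mount /data for Docker persistence

# get_last_checkpoint() returns the newest "checkpoint-<step>" subdirectory, or None.
last_checkpoint = get_last_checkpoint(output_dir) if os.path.isdir(output_dir) else None

if last_checkpoint is None:
    print("No checkpoint found; training will start from scratch.")
else:
    print(f"Resuming from {last_checkpoint}")
    # trainer.train(resume_from_checkpoint=last_checkpoint)  # see the Trainer sketch below
```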

## Requirements

- **Hardware**: GPU (T4, A10G, or better) strongly recommended
- **Account**: Hugging Face account with write permissions
- **Time**: Training takes several hours depending on configuration

## About the Model

**CoDA** is a 1.7B-parameter bidirectional diffusion language model developed by Salesforce AI Research. Unlike traditional autoregressive models, CoDA generates text through discrete denoising. The Instruct version is already instruction-tuned, making it well suited for further fine-tuning on conversational data.

### Model Configuration

```json
{
  "architectures": ["CoDALanguageModel"],
  "hidden_size": 2048,
  "num_hidden_layers": 28,
  "num_attention_heads": 16,
  "vocab_size": 151936,
  "max_position_embeddings": 40960
}
```
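
For reference, here is a sketch of loading the model and tokenizer for full fine-tuning. Whether `AutoModel` resolves the custom `CoDALanguageModel` class depends on the repository's `auto_map` (hence `trust_remote_code=True`); check the model card for the recommended loading path.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Salesforce/CoDA-v0-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # FP16 on GPU, matching the training setup below
    trust_remote_code=True,
)
model.gradient_checkpointing_enable()  # reduces activation memory, if the architecture supports it
```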

## Dataset

The training uses the **baseten-admin/gpt-oss120b-generated-perfectblend** dataset:
- **Format**: Conversational data in ChatML format
- **Column**: `conversations` (list of role-content pairs)
- **Split**: the `train` split, further divided 90/10 into train/eval (see the sketch below)
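
A sketch of how the `conversations` column can be rendered into ChatML text and split 90/10. The per-turn keys are assumed to be `role`/`content`; adjust if the dataset uses a different schema.

```python
from datasets import load_dataset

ds = load_dataset("baseten-admin/gpt-oss120b-generated-perfectblend", split="train")

def to_chatml(example):
    # Assumes each turn is a dict like {"role": "user", "content": "..."}.
    rendered = "".join(
        f"<|im_start|>{turn['role']}\n{turn['content']}<|im_end|>\n"
        for turn in example["conversations"]
    )
    return {"text": rendered}

ds = ds.map(to_chatml)
split = ds.train_test_split(test_size=0.1, seed=42)  # 90/10 train/eval
train_ds, eval_ds = split["train"], split["test"]
```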

## Training Details

- **Optimizer**: AdamW
- **Precision**: FP16 (on GPU)
- **Gradient Accumulation**: 4 steps
- **Gradient Checkpointing**: Enabled for memory efficiency
- **Max Sequence Length**: 2048 tokens
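
Putting these settings together, a rough `Trainer` configuration might look like the sketch below. It assumes `model`, `tokenizer`, `train_ds`, `eval_ds`, and `last_checkpoint` from the earlier snippets, and uses the standard causal-LM collator purely for illustration; a diffusion model like CoDA typically needs its own denoising loss and collator, which the Space's training code would supply.

```python
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

def tokenize(batch):
    # Max sequence length of 2048 tokens, as listed above.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

train_tok = train_ds.map(tokenize, batched=True, remove_columns=train_ds.column_names)
eval_tok = eval_ds.map(tokenize, batched=True, remove_columns=eval_ds.column_names)

args = TrainingArguments(
    output_dir="/data/checkpoints",   # assumed path, matching the persistence notes
    num_train_epochs=1,
    per_device_train_batch_size=1,    # illustrative; adjustable in the UI
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    fp16=True,                        # FP16 on GPU
    optim="adamw_torch",              # AdamW
    learning_rate=2e-5,               # illustrative default
    save_steps=500,                   # checkpoint every 500 steps
    save_total_limit=2,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_tok,
    eval_dataset=eval_tok,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train(resume_from_checkpoint=last_checkpoint)  # None starts from scratch
```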

## Citation

If you use this Space or the CoDA model, please cite:

```bibtex
@article{coda2023,
  title={CoDA: Bidirectional Code Diffusion},
  author={Salesforce AI Research},
  journal={arXiv preprint},
  year={2023}
}
```

## License

Apache 2.0