---
title: CoDA Fine-tuning
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
hf_oauth: true
hf_oauth_scopes:
  - read-repos
  - write-repos
---

# CoDA Model Fine-tuning Space

This Space lets you fine-tune the **Salesforce/CoDA-v0-Instruct** diffusion-based text generation model on the **baseten-admin/gpt-oss120b-generated-perfectblend** dataset.

## Features

- 🎯 **Full Fine-tuning**: Full-parameter fine-tuning (not LoRA)
- 💬 **ChatML Format**: Processes conversational data with question-answer pairs
- 🔄 **Auto Upload**: Automatically uploads the trained model to your Hugging Face account
- 📊 **Progress Tracking**: Real-time training progress updates
- 🔐 **OAuth Integration**: Secure authentication via Hugging Face login

## How to Use

1. **Login**: Click the "Sign in with Hugging Face" button
2. **Configure**: Adjust training parameters (epochs, batch size, learning rate)
3. **Train**: Click "Start Training" (requires a GPU; upgrade the Space to a GPU tier)
4. **Resume**: If training is interrupted, check "Resume from last checkpoint" and restart
5. **Upload**: After training completes, click "Upload to Hugging Face Hub"

### Persistence

This Space supports checkpoint persistence:

- Training checkpoints are saved every 500 steps
- If interrupted, you can resume from the last checkpoint
- For Docker deployment: mount a `/data` volume for full persistence
- On Spaces: checkpoints persist within the same session, and across rebuilds if you use a persistent storage tier

## Requirements

- **Hardware**: GPU (T4, A10G, or better) strongly recommended
- **Account**: Hugging Face account with write permissions
- **Time**: Training takes several hours depending on configuration

## About the Model

**CoDA** is a 1.7B-parameter bidirectional diffusion language model developed by Salesforce AI Research. Unlike traditional autoregressive models, CoDA generates text through discrete denoising. The Instruct version is pre-tuned for instruction following, making it well suited to fine-tuning on conversational data.

### Model Configuration

```json
{
  "architectures": ["CoDALanguageModel"],
  "hidden_size": 2048,
  "num_hidden_layers": 28,
  "num_attention_heads": 16,
  "vocab_size": 151936,
  "max_position_embeddings": 40960
}
```

## Dataset

Training uses the **baseten-admin/gpt-oss120b-generated-perfectblend** dataset:

- **Format**: Conversational data in ChatML format
- **Column**: `conversations` (a list of role-content pairs)
- **Split**: The `train` split, divided 90/10 into train and eval sets

## Training Details

- **Optimizer**: AdamW
- **Precision**: FP16 (on GPU)
- **Gradient Accumulation**: 4 steps
- **Gradient Checkpointing**: Enabled for memory efficiency
- **Max Sequence Length**: 2048 tokens

## Citation

If you use this Space or the CoDA model, please cite:

```bibtex
@article{coda2023,
  title={CoDA: Bidirectional Code Diffusion},
  author={Salesforce AI Research},
  journal={arXiv preprint},
  year={2023}
}
```

## License

Apache 2.0
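
## Appendix: Training Sketch

For readers who want to see how the pieces described above fit together, the sketch below formats the `conversations` column into ChatML text, tokenizes it to the 2048-token limit, applies the 90/10 train/eval split, and builds a `TrainingArguments` object matching the values listed under Training Details. This is a minimal, hedged sketch, not the actual contents of `app.py`: the `role`/`content` field names, the helper names, and the `trust_remote_code` flag are assumptions about the dataset schema and model repo.

```python
# Illustrative sketch only: mirrors the settings described in this README,
# but names and dataset field layout are assumptions, not the real app.py.
from datasets import load_dataset
from transformers import AutoTokenizer, TrainingArguments

MODEL_ID = "Salesforce/CoDA-v0-Instruct"
DATASET_ID = "baseten-admin/gpt-oss120b-generated-perfectblend"

# Custom architectures typically need trust_remote_code (assumption for CoDA).
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

def to_chatml(example):
    # Flatten one `conversations` entry (assumed role/content pairs)
    # into a single ChatML-formatted string.
    parts = [
        f"<|im_start|>{turn['role']}\n{turn['content']}<|im_end|>\n"
        for turn in example["conversations"]
    ]
    return {"text": "".join(parts)}

def tokenize(example):
    # Truncate to the 2048-token maximum sequence length used for training.
    return tokenizer(example["text"], truncation=True, max_length=2048)

dataset = load_dataset(DATASET_ID, split="train")
dataset = dataset.map(to_chatml, remove_columns=dataset.column_names)
dataset = dataset.map(tokenize, remove_columns=["text"])
splits = dataset.train_test_split(test_size=0.1, seed=42)  # 90/10 train/eval

# Configuration matching the values under "Training Details".
training_args = TrainingArguments(
    output_dir="checkpoints",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,   # 4-step gradient accumulation
    gradient_checkpointing=True,     # memory efficiency
    fp16=True,                       # FP16 on GPU
    optim="adamw_torch",             # AdamW optimizer
    save_steps=500,                  # checkpoint every 500 steps (enables resume)
    logging_steps=50,
)
```

Saving a checkpoint every 500 steps is what makes the "Resume from last checkpoint" option possible; for the Docker persistence described earlier, `output_dir` could point at the mounted `/data` volume.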