---
license: apache-2.0
language:
- en
library_name: transformers.js
tags:
- code
- python
- maincoder
- code-generation
- reinforcement-learning
- mcpo
- onnx
pipeline_tag: text-generation
base_model: Maincode/Maincoder-1B
---
<img src="https://huggingface.co/datasets/Maincode/assets/resolve/e51154e034201be1a5dad0e9c8de31d8b9f17643/maincoder_logo.png" alt="" width="1250">

[**Maincoder-1B-ONNX**](https://maincode.com/maincoder/) is the ONNX-optimized version of [Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B), a language model built for code generation and completion tasks. It enables fast inference with ONNX Runtime in Python and runs directly in the browser via Transformers.js.

# Key Features

- **ONNX Optimized**: Efficient inference with ONNX Runtime and KV-cache support.
- **Cross-Platform**: Runs in Python, Node.js, or directly in the browser.
- **Code Generation**: Tuned for Python code completion and generation tasks.
- **Compact Size**: 1 billion parameters, lightweight enough to run on consumer hardware.
- **SOTA Performance**: State-of-the-art results on the Python coding benchmarks HumanEval, HumanEval+, and MBPP+.

# Benchmark Results

<img src="https://huggingface.co/datasets/Maincode/assets/resolve/main/performance_h.png" alt="Benchmark Performance Across Baseline LLMs" width="1050">

| Model | HumanEval | HumanEval+ | MBPP+ | MMLU | GSM8K |
|---|---:|---:|---:|---:|---:|
| [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B) | **0.7622** | **0.7256** | **0.7090** | 0.3054 | 0.2976 |
| [deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct) | 0.5610 | 0.5305 | 0.6217 | 0.2705 | 0.0413 |
| [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) | 0.5366 | 0.5000 | 0.6799 | **0.5928** | 0.5505 |
| [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) | 0.4634 | 0.4451 | 0.6561 | 0.4984 | 0.4944 |
| [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | 0.4024 | 0.3780 | 0.5582 | 0.5571 | **0.6865** |

# Model Overview

Maincoder uses a modern transformer decoder architecture with:

- **Rotary Position Embeddings**: RoPE with a base frequency (theta) of 1,000,000.
- **RMSNorm**: Pre-normalization for stable training.
- **Grouped Query Attention**: 4:1 ratio of query heads to key-value heads (see the shape check below).
- **QK Normalization**: RMSNorm applied to attention queries and keys.
- **SwiGLU MLP**: Gated linear units with SiLU activation.

| Attribute | Value |
|-----------|-------|
| Parameters | 1B |
| Hidden Size | 1536 |
| Layers | 32 |
| Attention Heads | 16 (4 KV heads) |
| Head Dimension | 96 |
| Vocabulary Size | 151,936 |
| Context Length | 2,048 |
| Format | ONNX |
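
The attention configuration above is easy to sanity-check: 16 query heads of dimension 96 exactly span the 1,536-wide hidden state, while the 4 shared KV heads cut the key/value projections (and the KV-cache) to a quarter of that width. A quick arithmetic check, illustrative only and not taken from the model implementation:

```python
hidden_size = 1536
num_heads = 16      # query heads
num_kv_heads = 4    # shared key/value heads (GQA, 4:1 ratio)
head_dim = 96

# Query projection spans the full hidden state: 16 * 96 == 1536.
assert num_heads * head_dim == hidden_size

# Key/value projections are 4x narrower: 4 * 96 == 384 columns each,
# which shrinks the KV-cache by the same 4:1 factor.
kv_width = num_kv_heads * head_dim
print(kv_width, num_heads // num_kv_heads)  # 384 4
```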

# Usage

## Python (ONNX Runtime)

### Installation

```bash
pip install optimum[onnxruntime] transformers
```

For GPU acceleration:

```bash
pip install optimum[onnxruntime-gpu]
```

### Quick Start

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Load the ONNX model with KV-cache support
model = ORTModelForCausalLM.from_pretrained(
    "Maincode/Maincoder-1B-ONNX",
    file_name="decoder_with_past_model.onnx",
    use_cache=True
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Maincode/Maincoder-1B-ONNX")

# Code completion example
prompt = '''def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number."""
'''

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.2,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### GPU Acceleration

```python
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "Maincode/Maincoder-1B-ONNX",
    use_cache=True,
    file_name="decoder_with_past_model.onnx",
    provider="CUDAExecutionProvider"
)
```
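
Before loading with the CUDA provider, it can help to confirm it is actually available in your environment; ONNX Runtime exposes this directly (a quick check, assuming the `onnxruntime-gpu` package from the installation step):

```python
import onnxruntime as ort

# Lists the execution providers built into the installed ONNX Runtime.
# "CUDAExecutionProvider" should appear when onnxruntime-gpu is installed
# and a compatible CUDA toolkit is present.
print(ort.get_available_providers())
```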

---

## JavaScript (Transformers.js)

### Installation

```bash
npm install @huggingface/transformers
```

### Node.js

```javascript
import { AutoModelForCausalLM, AutoTokenizer } from '@huggingface/transformers';

// Load the tokenizer and model
const tokenizer = await AutoTokenizer.from_pretrained('Maincode/Maincoder-1B-ONNX');
const model = await AutoModelForCausalLM.from_pretrained('Maincode/Maincoder-1B-ONNX', {
    subfolder: '.',
    model_file_name: 'decoder_with_past_model',
    use_external_data_format: true,
});

// Code completion example
const prompt = `def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number."""
`;

const inputs = await tokenizer(prompt, { return_tensors: 'pt' });

const outputs = await model.generate({
    input_ids: inputs.input_ids,
    attention_mask: inputs.attention_mask,
    max_new_tokens: 128,
    temperature: 0.2,
    do_sample: true,
});

const decoded = tokenizer.decode(outputs[0], { skip_special_tokens: true });
console.log(decoded);
```

---

## Code Completion Examples

```python
# Function completion
prompt = '''def quicksort(arr: list) -> list:
    """Sort a list using the quicksort algorithm."""
'''

# Class completion
prompt = '''class BinarySearchTree:
    """A binary search tree implementation."""
    
    def __init__(self):
'''

# Algorithm implementation
prompt = '''def dijkstra(graph: dict, start: str, end: str) -> tuple:
    """Find the shortest path using Dijkstra's algorithm.
    
    Args:
        graph: Adjacency list representation of the graph
        start: Starting node
        end: Target node
    
    Returns:
        Tuple of (distance, path)
    """
'''
```
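
Each of these prompts runs through the same generation call, so a small helper keeps things concise (reusing the `model` and `tokenizer` objects loaded in Quick Start):

```python
def complete(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion for a code prompt with the ONNX model."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(complete(prompt))
```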

# Additional Notes

## Limitations

- Context length is limited to 2,048 tokens (see the truncation sketch below)
- Primarily optimized for Python; performance may vary on other languages
- May generate code with bugs or security issues; always review generated output
- Browser performance depends on device capabilities
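
For prompts that may exceed the context window, truncating at tokenization time while reserving room for the generated tokens is a simple safeguard (a minimal sketch, reusing the Quick Start objects):

```python
max_new = 128  # budget for generated tokens

# Truncate the prompt so prompt + generation fits the 2,048-token window.
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=2048 - max_new,
)
outputs = model.generate(**inputs, max_new_tokens=max_new)
```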

<div style="margin-left:14px; border-left:4px solid #3b82f6; background:rgba(59,130,246,0.08); padding:8px 10px; border-radius:8px; font-size:0.92em; margin:10px 0;">
  <strong>Disclaimer</strong>: This model has <strong>not</strong> undergone any alignment or safety tuning (e.g., RLHF/RLAIF, DPO, or safety fine-tuning). Outputs may be unsafe or biased. Please use appropriate safeguards and evaluate carefully for your use case.
</div>

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Citation

```bibtex
@misc{maincoder2025,
  title        = {Maincoder-1B: A High-Performance 1B Parameter Coding Model},
  author       = {Maincode Team},
  year         = {2025},
  organization = {Maincode},
  howpublished = {\url{https://huggingface.co/Maincode/Maincoder-1B}}
}
```

## Related Models

- [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B) - Original PyTorch model

## Contact

For questions, issues, or collaboration inquiries, please visit [Maincode](https://maincode.com).