---
license: apache-2.0
language:
- en
library_name: transformers.js
tags:
- code
- python
- maincoder
- code-generation
- reinforcement-learning
- mcpo
- onnx
pipeline_tag: text-generation
base_model: Maincode/Maincoder-1B
---
<img src="https://huggingface.co/datasets/Maincode/assets/resolve/e51154e034201be1a5dad0e9c8de31d8b9f17643/maincoder_logo.png" alt="" width="1250">

[**Maincoder-1B-ONNX**](https://maincode.com/maincoder/) is the ONNX-optimized version of [Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B), a language model built for code generation and completion tasks. It enables fast inference with ONNX Runtime in Python and runs directly in the browser via Transformers.js.

# Key Features

- **ONNX Optimized**: Efficient inference with ONNX Runtime and KV-cache support.
- **Cross-Platform**: Runs in Python, Node.js, or directly in the browser.
- **Code Generation**: Tuned for Python code completion and generation tasks.
- **Compact Size**: 1 billion parameters, lightweight enough to run on consumer hardware.
- **SOTA Performance**: State-of-the-art results on the Python coding benchmarks HumanEval, HumanEval+, and MBPP+.

# Benchmark Results

<img src="https://huggingface.co/datasets/Maincode/assets/resolve/main/performance_h.png" alt="Benchmark Performance Across Baseline LLMs" width="1050">

| Model | HumanEval | HumanEval+ | MBPP+ | MMLU | GSM8K |
|---|---:|---:|---:|---:|---:|
| [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B) | **0.7622** | **0.7256** | **0.7090** | 0.3054 | 0.2976 |
| [deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct) | 0.5610 | 0.5305 | 0.6217 | 0.2705 | 0.0413 |
| [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) | 0.5366 | 0.5000 | 0.6799 | **0.5928** | 0.5505 |
| [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) | 0.4634 | 0.4451 | 0.6561 | 0.4984 | 0.4944 |
| [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | 0.4024 | 0.3780 | 0.5582 | 0.5571 | **0.6865** |

# Model Overview

Maincoder uses a modern transformer decoder architecture with:

- **Rotary Position Embeddings**: RoPE with a base frequency (theta) of 1,000,000.
- **RMSNorm**: Pre-normalization for stable training.
- **Grouped Query Attention**: 4:1 ratio of query heads to key-value heads (see the shape check below).
- **QK Normalization**: RMSNorm applied to attention queries and keys.
- **SwiGLU MLP**: Gated linear units with SiLU activation.

| Attribute | Value |
|-----------|-------|
| Parameters | 1B |
| Hidden Size | 1536 |
| Layers | 32 |
| Attention Heads | 16 (4 KV heads) |
| Head Dimension | 96 |
| Vocabulary Size | 151,936 |
| Context Length | 2,048 |
| Format | ONNX |
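
The attention configuration above is easy to sanity-check: 16 query heads of dimension 96 exactly span the 1,536-wide hidden state, while the 4 shared KV heads cut the key/value projections (and the KV-cache) to a quarter of that width. A quick arithmetic check, illustrative only and not taken from the model implementation:

```python
hidden_size = 1536
num_heads = 16      # query heads
num_kv_heads = 4    # shared key/value heads (GQA, 4:1 ratio)
head_dim = 96

# Query projection spans the full hidden state: 16 * 96 == 1536.
assert num_heads * head_dim == hidden_size

# Key/value projections are 4x narrower: 4 * 96 == 384 columns each,
# which shrinks the KV-cache by the same 4:1 factor.
kv_width = num_kv_heads * head_dim
print(kv_width, num_heads // num_kv_heads)  # 384 4
```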

# Usage

## Python (ONNX Runtime)

### Installation

```bash
pip install optimum[onnxruntime] transformers
```

For GPU acceleration:

```bash
pip install optimum[onnxruntime-gpu]
```

### Quick Start

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Load the ONNX model with KV-cache support
model = ORTModelForCausalLM.from_pretrained(
    "Maincode/Maincoder-1B-ONNX",
    file_name="decoder_with_past_model.onnx",
    use_cache=True
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Maincode/Maincoder-1B-ONNX")

# Code completion example
prompt = '''def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number."""
'''

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.2,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### GPU Acceleration

```python
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "Maincode/Maincoder-1B-ONNX",
    use_cache=True,
    file_name="decoder_with_past_model.onnx",
    provider="CUDAExecutionProvider"
)
```
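
Before loading with the CUDA provider, it can help to confirm it is actually available in your environment; ONNX Runtime exposes this directly (a quick check, assuming the `onnxruntime-gpu` package from the installation step):

```python
import onnxruntime as ort

# Lists the execution providers built into the installed ONNX Runtime.
# "CUDAExecutionProvider" should appear when onnxruntime-gpu is installed
# and a compatible CUDA toolkit is present.
print(ort.get_available_providers())
```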

---

## JavaScript (Transformers.js)

### Installation

```bash
npm install @huggingface/transformers
```

### Node.js

```javascript
import { AutoModelForCausalLM, AutoTokenizer } from '@huggingface/transformers';

// Load the tokenizer and model
const tokenizer = await AutoTokenizer.from_pretrained('Maincode/Maincoder-1B-ONNX');
const model = await AutoModelForCausalLM.from_pretrained('Maincode/Maincoder-1B-ONNX', {
    subfolder: '.',
    model_file_name: 'decoder_with_past_model',
    use_external_data_format: true,
});

// Code completion example
const prompt = `def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number."""
`;

const inputs = await tokenizer(prompt, { return_tensors: 'pt' });

const outputs = await model.generate({
    input_ids: inputs.input_ids,
    attention_mask: inputs.attention_mask,
    max_new_tokens: 128,
    temperature: 0.2,
    do_sample: true,
});

const decoded = tokenizer.decode(outputs[0], { skip_special_tokens: true });
console.log(decoded);
```

---

## Code Completion Examples

```python
# Function completion
prompt = '''def quicksort(arr: list) -> list:
    """Sort a list using the quicksort algorithm."""
'''

# Class completion
prompt = '''class BinarySearchTree:
    """A binary search tree implementation."""
    
    def __init__(self):
'''

# Algorithm implementation
prompt = '''def dijkstra(graph: dict, start: str, end: str) -> tuple:
    """Find the shortest path using Dijkstra's algorithm.
    
    Args:
        graph: Adjacency list representation of the graph
        start: Starting node
        end: Target node
    
    Returns:
        Tuple of (distance, path)
    """
'''
```
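
Each of these prompts runs through the same generation call, so a small helper keeps things concise (reusing the `model` and `tokenizer` objects loaded in Quick Start):

```python
def complete(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion for a code prompt with the ONNX model."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(complete(prompt))
```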

# Additional Notes

## Limitations

- Context length is limited to 2,048 tokens (see the truncation sketch below)
- Primarily optimized for Python; performance may vary on other languages
- May generate code with bugs or security issues; always review generated output
- Browser performance depends on device capabilities
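
For prompts that may exceed the context window, truncating at tokenization time while reserving room for the generated tokens is a simple safeguard (a minimal sketch, reusing the Quick Start objects):

```python
max_new = 128  # budget for generated tokens

# Truncate the prompt so prompt + generation fits the 2,048-token window.
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=2048 - max_new,
)
outputs = model.generate(**inputs, max_new_tokens=max_new)
```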

<div style="margin-left:14px; border-left:4px solid #3b82f6; background:rgba(59,130,246,0.08); padding:8px 10px; border-radius:8px; font-size:0.92em; margin:10px 0;">
  <strong>Disclaimer</strong>: This model has <strong>not</strong> undergone any alignment or safety tuning (e.g., RLHF/RLAIF, DPO, or safety fine-tuning). Outputs may be unsafe or biased. Please use appropriate safeguards and evaluate carefully for your use case.
</div>

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Citation

```bibtex
@misc{maincoder2025,
  title        = {Maincoder-1B: A High-Performance 1B Parameter Coding Model},
  author       = {Maincode Team},
  year         = {2025},
  organization = {Maincode},
  howpublished = {\url{https://huggingface.co/Maincode/Maincoder-1B}}
}
```

## Related Models

- [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B) - Original PyTorch model

## Contact

For questions, issues, or collaboration inquiries, please visit [Maincode](https://maincode.com).