Commit 3d8799e
Parent(s): ba190fa
Upload Fara-7B ONNX models
- README.md +44 -3
- npu/qnn-int4/LICENSE +22 -0
- npu/qnn-int4/chat_template.jinja +7 -0
- npu/qnn-int4/genai_config.json +3 -0
- npu/qnn-int4/processor_config.json +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_1_qnn.bin +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_2_qnn.bin +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_3_qnn.bin +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_4_qnn.bin +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_ctx.onnx +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_embeddings_w4a32.quant.onnx +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_itr_1_ctx.onnx +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_lm_head_w4a32.quant.onnx +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_1_qnn.bin +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_2_qnn.bin +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_ctx.onnx +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_visual_patch_embed.onnx +3 -0
- npu/qnn-int4/qwen2_5_vl_webcua_visual_patch_merger.onnx +3 -0
- npu/qnn-int4/tokenizer.json +3 -0
- npu/qnn-int4/tokenizer_config.json +3 -0
README.md
CHANGED
@@ -1,5 +1,46 @@
 ---
+tags:
+- ONNX
+- ONNX Runtime
+- code
+- nlp
+- multimodal
 license: mit
-
-
----
+language: en
+pipeline_tag: image-text-to-text
+---
+# Fara-7B ONNX models
+
+## Introduction
+This repository hosts optimized versions of the Fara-7B model that accelerate inference with ONNX Runtime.
+
+Optimized models are published here in ONNX format to run with ONNX Runtime on NPU.
+
+Here are some of the optimized configurations we have added:
+
+1. ONNX model for int4 NPU: ONNX model for Qualcomm NPU using int4 quantization.
+
+## Model Run
+You can see how to run this model with ORT GenAI [here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-vision.py).
+
+For NPU:
+
+```bash
+# Download the model directly using the Hugging Face CLI
+huggingface-cli download microsoft/Fara-7B-onnx --include "npu/qnn-int4/*" --local-dir .
+
+# Install ONNX Runtime GenAI
+pip install --pre onnxruntime-genai
+
+# Fetch the example script, then run it; adjust the model directory (-m) to match your download location
+curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/model-vision.py -o model-vision.py
+python model-vision.py -m npu/qnn-int4 --use-winml
+```
+
+## Model Description
+- Developed by: Microsoft
+- Model type: ONNX
+- License: MIT
+- Model Description: This is a conversion of the Fara-7B model for ONNX Runtime inference.
+
+**Disclaimer:** This model is only an optimization of the base model. Any risk associated with the model is the responsibility of the user of the model. Please verify and test it for your scenarios. With the optimizations applied, the output may differ slightly from that of the base model.
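The download step in the README's NPU instructions can also be scripted. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that its `snapshot_download` function is an acceptable substitute for the CLI call; the helper names `allow_patterns` and `download_variant` are hypothetical and exist only for this illustration.

```python
from fnmatch import fnmatch

# Values taken from the README's CLI command.
REPO_ID = "microsoft/Fara-7B-onnx"
VARIANT_DIR = "npu/qnn-int4"


def allow_patterns(variant_dir: str) -> list[str]:
    """Build the glob list that restricts the download to one variant folder."""
    return [f"{variant_dir.rstrip('/')}/*"]


def download_variant(local_dir: str = ".") -> str:
    """Fetch only the files under VARIANT_DIR (hypothetical helper; needs network access)."""
    # Imported lazily so the pattern helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns(VARIANT_DIR),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Check locally which repository paths the pattern would select.
    pattern = allow_patterns(VARIANT_DIR)[0]
    for path in ("README.md", "npu/qnn-int4/tokenizer.json"):
        print(path, fnmatch(path, pattern))
```

Calling `download_variant(".")` should place the files under `./npu/qnn-int4`, which is the directory the README passes to `model-vision.py` with `-m`.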
npu/qnn-int4/LICENSE
ADDED
@@ -0,0 +1,22 @@
+Microsoft.
+Copyright (c) Microsoft Corporation.
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
npu/qnn-int4/chat_template.jinja
ADDED
@@ -0,0 +1,7 @@
+{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+You are a helpful assistant.<|im_end|>
+{% endif %}<|im_start|>{{ message['role'] }}
+{% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+{% endif %}
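To make the prompt layout produced by this chat template easier to follow, here is a plain-Python approximation of its logic. It is a sketch only: `render_chat` is a hypothetical helper written for this illustration, not part of ONNX Runtime GenAI or any tokenizer library, and a real Jinja renderer may treat whitespace differently depending on its settings.

```python
def render_chat(messages, add_generation_prompt=False, add_vision_id=False):
    """Approximate the chat template: wrap each message in <|im_start|>/<|im_end|> markers."""
    parts = []
    image_count = 0
    video_count = 0
    for index, message in enumerate(messages):
        # A default system turn is injected when the conversation does not start with one.
        if index == 0 and message["role"] != "system":
            parts.append("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n")
        parts.append(f"<|im_start|>{message['role']}\n")
        content = message["content"]
        if isinstance(content, str):
            parts.append(content)
        else:
            for item in content:
                if item.get("type") == "image" or "image" in item or "image_url" in item:
                    image_count += 1
                    if add_vision_id:
                        parts.append(f"Picture {image_count}: ")
                    parts.append("<|vision_start|><|image_pad|><|vision_end|>")
                elif item.get("type") == "video" or "video" in item:
                    video_count += 1
                    if add_vision_id:
                        parts.append(f"Video {video_count}: ")
                    parts.append("<|vision_start|><|video_pad|><|vision_end|>")
                elif "text" in item:
                    parts.append(item["text"])
        parts.append("<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe."}]}
]
print(render_chat(example, add_generation_prompt=True))
```

Each image or video entry becomes a single placeholder token between `<|vision_start|>` and `<|vision_end|>`; how that placeholder is later expanded into visual embeddings is outside what the template itself specifies.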
npu/qnn-int4/genai_config.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:056d55f77d74ec1bfae789df98a3d0e101b62f87068c86e6412bef6883c2b6ca
+size 8769
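The added files in this commit, starting with `genai_config.json`, are stored as Git LFS pointers: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. As a rough sketch, a downloaded file could be checked against its pointer as follows; `LfsPointer`, `parse_lfs_pointer`, and `matches_pointer` are hypothetical names used only for this illustration, and only the standard library is assumed.

```python
import hashlib
from pathlib import Path
from typing import NamedTuple


class LfsPointer(NamedTuple):
    version: str
    sha256: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'version', 'oid', and 'size' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


def matches_pointer(path: Path, pointer: LfsPointer, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file has the size and SHA-256 digest recorded in the pointer."""
    if path.stat().st_size != pointer.size:
        return False  # Cheap size check first; skips hashing multi-gigabyte files.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer.sha256


# Pointer text copied from the genai_config.json entry in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:056d55f77d74ec1bfae789df98a3d0e101b62f87068c86e6412bef6883c2b6ca
size 8769
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer.size, pointer.sha256[:12])
```

Reading in chunks keeps memory use flat, which matters for the `*_qnn.bin` files that are close to 1 GB each.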
npu/qnn-int4/processor_config.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bb046cd8384fc5daba0012b1494a3345356a4b3c35dcd3a246fb9663e365336e
+size 1459
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_1_qnn.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dcbb2bbed51c047491650a19b810dbefb73995054bd4547af419732792c4c158
+size 949673984
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_2_qnn.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fdf7e4ae4382b2f8882c837364fa78850f65480905a59d75998e9d5e2b4fe113
+size 949673984
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_3_qnn.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:60eab7a87c4ac9b80c0d77c22873103d0da06d5e9646db83986956e2a7e96c95
+size 949665792
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_4_qnn.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:75b2d6bf8614eb39e92babc2b21cb9de86f3f9ab43a499a7d998e8ace820c678
+size 474857472
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_ctx.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0c4b3f8063812ea28da95e30e2e65f51a589c8eab5261f71fa93ad421dadfec6
+size 65583519
npu/qnn-int4/qwen2_5_vl_webcua_embeddings_w4a32.quant.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:17e9f756271089b5af067f934543a3dcf6b6f510426265743fd5ed27ae5fda5c
+size 349139384
npu/qnn-int4/qwen2_5_vl_webcua_itr_1_ctx.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cd6439288d6632881502fbd4a909153e3f9e7b2f34f1ff3f9d245999deb80b4f
+size 65583364
npu/qnn-int4/qwen2_5_vl_webcua_lm_head_w4a32.quant.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:55d5283bd599ebc6fb742fa14e264e73c8106a2e5183140a84ecb562fbd9b0d4
+size 349139441
npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_1_qnn.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:08b140ce4c3665e02070a363df8eaebfd1b0ab444e237519762df6713627d177
+size 440930304
npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_2_qnn.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:714bc39748b21baa216ef0468b7230109df01ea6b46325c3e384c5b9e9887b57
+size 367792128
npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_ctx.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8f3c4d6fdcf58ced69c6630be7606e3d350461a45321ee09cada4706b06105c6
+size 1469
npu/qnn-int4/qwen2_5_vl_webcua_visual_patch_embed.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8ebc2de24de3e87be8c4aa148f1ca59e7bafcc6afeed2c0bcf35ecc40909eb20
+size 6021909
npu/qnn-int4/qwen2_5_vl_webcua_visual_patch_merger.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:024c5792c6eab12c5ec3ddd0566b1e1ca0229c484f26c50e6ca87807232df45f
+size 178299575
npu/qnn-int4/tokenizer.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896
npu/qnn-int4/tokenizer_config.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0a04a9d7d4a62b28482bdfe726c122756de85714fb64166ace92ae75b8f57614
+size 4686
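Because every LFS pointer in this commit records a `size` in bytes, the disk space needed for `npu/qnn-int4` can be estimated before downloading. The sketch below simply sums the sizes listed in the pointers; the `LICENSE` and `chat_template.jinja` files are stored inline rather than in LFS and are left out, and `human_size` is a hypothetical formatting helper.

```python
# Sizes in bytes, copied from the LFS pointers in this commit.
LFS_SIZES = {
    "genai_config.json": 8769,
    "processor_config.json": 1459,
    "qwen2_5_vl_webcua_ctx_512_1_qnn.bin": 949673984,
    "qwen2_5_vl_webcua_ctx_512_2_qnn.bin": 949673984,
    "qwen2_5_vl_webcua_ctx_512_3_qnn.bin": 949665792,
    "qwen2_5_vl_webcua_ctx_512_4_qnn.bin": 474857472,
    "qwen2_5_vl_webcua_ctx_512_ctx.onnx": 65583519,
    "qwen2_5_vl_webcua_embeddings_w4a32.quant.onnx": 349139384,
    "qwen2_5_vl_webcua_itr_1_ctx.onnx": 65583364,
    "qwen2_5_vl_webcua_lm_head_w4a32.quant.onnx": 349139441,
    "qwen2_5_vl_webcua_visual_attn_block_1_qnn.bin": 440930304,
    "qwen2_5_vl_webcua_visual_attn_block_2_qnn.bin": 367792128,
    "qwen2_5_vl_webcua_visual_attn_block_ctx.onnx": 1469,
    "qwen2_5_vl_webcua_visual_patch_embed.onnx": 6021909,
    "qwen2_5_vl_webcua_visual_patch_merger.onnx": 178299575,
    "tokenizer.json": 11421896,
    "tokenizer_config.json": 4686,
}


def human_size(num_bytes: int) -> str:
    """Format a byte count in GiB with two decimals."""
    return f"{num_bytes / 1024**3:.2f} GiB"


total = sum(LFS_SIZES.values())
largest = max(LFS_SIZES, key=LFS_SIZES.get)
print(f"{len(LFS_SIZES)} LFS files, {human_size(total)} in total")
print(f"largest: {largest} ({human_size(LFS_SIZES[largest])})")
```

The total comes to roughly 5 GB, most of it in the four `ctx_512_*_qnn.bin` files and the two visual attention block binaries.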