kvaishnavi committed on
Commit 3d8799e · 1 Parent(s): ba190fa

Upload Fara-7B ONNX models

README.md CHANGED
@@ -1,5 +1,46 @@
  ---
+ tags:
+ - ONNX
+ - ONNX Runtime
+ - code
+ - nlp
+ - multimodal
  license: mit
- base_model:
- - microsoft/Fara-7B
- ---
+ language: en
+ pipeline_tag: image-text-to-text
+ ---
+ # Fara-7B ONNX models
+
+ ## Introduction
+ This repository hosts the optimized versions of the Fara-7B models to accelerate inference with ONNX Runtime.
+
+ Optimized models are published here in ONNX format to run with ONNX Runtime on NPU.
+
+ Here are some of the optimized configurations we have added:
+
+ 1. ONNX model for int4 NPU: ONNX model for Qualcomm NPU using int4 quantization.
+
+ ## Model Run
+ You can see how to run this model with ORT GenAI [here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-vision.py)
+
+ For NPU:
+
+ ```bash
+ # Download the model directly using the Hugging Face CLI
+ huggingface-cli download microsoft/Fara-7B-onnx --include npu/qnn-int4/* --local-dir .
+
+ # Install ONNX Runtime GenAI
+ pip install --pre onnxruntime-genai
+
+ # Please adjust the model directory (-m) accordingly
+ curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/model-vision.py -o model-vision.py
+ python model-vision.py -m npu/qnn-int4 --use-winml
+ ```
+
+ ## Model Description
+ - Developed by: Microsoft
+ - Model type: ONNX
+ - License: MIT
+ - Model Description: This is a conversion of the Fara-7B model for ONNX Runtime inference.
+
+ **Disclaimer:** Model is only an optimization of the base model. Any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied.
npu/qnn-int4/LICENSE ADDED
@@ -0,0 +1,22 @@
+ Microsoft.
+ Copyright (c) Microsoft Corporation.
+
+ MIT License
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
npu/qnn-int4/chat_template.jinja ADDED
@@ -0,0 +1,7 @@
+ {% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ {% endif %}<|im_start|>{{ message['role'] }}
+ {% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+ {% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+ {% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+ {% endif %}
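Reviewer note: to make the prompt layout produced by `chat_template.jinja` easier to check, here is a minimal plain-Python sketch that mirrors the template's branches. It is an illustration only, not how ONNX Runtime GenAI or any tokenizer applies the template. The function name `render_chat` and the constant `DEFAULT_SYSTEM` are hypothetical, and the newline placement assumes the template is rendered as written in the diff above.

```python
from typing import Any

# Default system turn the template injects when the first message is not a system message.
DEFAULT_SYSTEM = "You are a helpful assistant."


def render_chat(messages: list[dict[str, Any]],
                add_generation_prompt: bool = True,
                add_vision_id: bool = False) -> str:
    """Plain-Python mirror of the branches in chat_template.jinja (illustrative only)."""
    out: list[str] = []
    image_count = 0
    video_count = 0
    for index, message in enumerate(messages):
        if index == 0 and message["role"] != "system":
            out.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        out.append(f"<|im_start|>{message['role']}\n")
        content = message["content"]
        if isinstance(content, str):
            # String content is emitted verbatim.
            out.append(content)
        else:
            # List content: each part becomes a vision placeholder or raw text.
            for part in content:
                if part.get("type") == "image" or "image" in part or "image_url" in part:
                    image_count += 1
                    if add_vision_id:
                        out.append(f"Picture {image_count}: ")
                    out.append("<|vision_start|><|image_pad|><|vision_end|>")
                elif part.get("type") == "video" or "video" in part:
                    video_count += 1
                    if add_vision_id:
                        out.append(f"Video {video_count}: ")
                    out.append("<|vision_start|><|video_pad|><|vision_end|>")
                elif "text" in part:
                    out.append(part["text"])
        out.append("<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so generation continues from here.
        out.append("<|im_start|>assistant\n")
    return "".join(out)


prompt = render_chat([
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Click the search box."},
    ]},
])
print(prompt)
```

With a single user turn containing one image and one text part, the result starts with the injected system turn, places the image placeholder before the text, and ends with an open assistant turn.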
npu/qnn-int4/genai_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:056d55f77d74ec1bfae789df98a3d0e101b62f87068c86e6412bef6883c2b6ca
+ size 8769
npu/qnn-int4/processor_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb046cd8384fc5daba0012b1494a3345356a4b3c35dcd3a246fb9663e365336e
+ size 1459
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_1_qnn.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dcbb2bbed51c047491650a19b810dbefb73995054bd4547af419732792c4c158
+ size 949673984
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_2_qnn.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fdf7e4ae4382b2f8882c837364fa78850f65480905a59d75998e9d5e2b4fe113
+ size 949673984
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_3_qnn.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60eab7a87c4ac9b80c0d77c22873103d0da06d5e9646db83986956e2a7e96c95
+ size 949665792
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_4_qnn.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:75b2d6bf8614eb39e92babc2b21cb9de86f3f9ab43a499a7d998e8ace820c678
+ size 474857472
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_ctx.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c4b3f8063812ea28da95e30e2e65f51a589c8eab5261f71fa93ad421dadfec6
+ size 65583519
npu/qnn-int4/qwen2_5_vl_webcua_embeddings_w4a32.quant.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17e9f756271089b5af067f934543a3dcf6b6f510426265743fd5ed27ae5fda5c
+ size 349139384
npu/qnn-int4/qwen2_5_vl_webcua_itr_1_ctx.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd6439288d6632881502fbd4a909153e3f9e7b2f34f1ff3f9d245999deb80b4f
+ size 65583364
npu/qnn-int4/qwen2_5_vl_webcua_lm_head_w4a32.quant.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:55d5283bd599ebc6fb742fa14e264e73c8106a2e5183140a84ecb562fbd9b0d4
+ size 349139441
npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_1_qnn.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:08b140ce4c3665e02070a363df8eaebfd1b0ab444e237519762df6713627d177
+ size 440930304
npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_2_qnn.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:714bc39748b21baa216ef0468b7230109df01ea6b46325c3e384c5b9e9887b57
+ size 367792128
npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_ctx.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8f3c4d6fdcf58ced69c6630be7606e3d350461a45321ee09cada4706b06105c6
+ size 1469
npu/qnn-int4/qwen2_5_vl_webcua_visual_patch_embed.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ebc2de24de3e87be8c4aa148f1ca59e7bafcc6afeed2c0bcf35ecc40909eb20
+ size 6021909
npu/qnn-int4/qwen2_5_vl_webcua_visual_patch_merger.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:024c5792c6eab12c5ec3ddd0566b1e1ca0229c484f26c50e6ca87807232df45f
+ size 178299575
npu/qnn-int4/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
npu/qnn-int4/tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a04a9d7d4a62b28482bdfe726c122756de85714fb64166ace92ae75b8f57614
+ size 4686
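Reviewer note: the added model files appear in this diff only as Git LFS pointers, each with a `version`, an `oid sha256:...`, and a `size` line. The sketch below shows one possible way to check a downloaded file against its pointer using only the Python standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the sample pointer is the `tokenizer_config.json` entry copied from this commit.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: dict[str, str]) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    # Cheap checks first: hash algorithm and byte size.
    if algo != "sha256" or path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so multi-hundred-megabyte .bin files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# Pointer for npu/qnn-int4/tokenizer_config.json, as shown in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:0a04a9d7d4a62b28482bdfe726c122756de85714fb64166ace92ae75b8f57614
size 4686
"""
pointer = parse_lfs_pointer(POINTER)
print(pointer["oid"], pointer["size"])
```

After downloading, a call such as `matches_pointer(Path("npu/qnn-int4/tokenizer_config.json"), pointer)` would report whether the local copy matches; the path assumes the `--local-dir .` layout from the README.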