kvaishnavi committed 8 days ago

Commit

3d8799e

1 Parent(s): ba190fa

Upload Fara-7B ONNX models

Files changed (20)

README.md +44 -3
npu/qnn-int4/LICENSE +22 -0
npu/qnn-int4/chat_template.jinja +7 -0
npu/qnn-int4/genai_config.json +3 -0
npu/qnn-int4/processor_config.json +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_1_qnn.bin +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_2_qnn.bin +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_3_qnn.bin +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_4_qnn.bin +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_ctx.onnx +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_embeddings_w4a32.quant.onnx +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_itr_1_ctx.onnx +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_lm_head_w4a32.quant.onnx +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_1_qnn.bin +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_2_qnn.bin +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_ctx.onnx +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_visual_patch_embed.onnx +3 -0
npu/qnn-int4/qwen2_5_vl_webcua_visual_patch_merger.onnx +3 -0
npu/qnn-int4/tokenizer.json +3 -0
npu/qnn-int4/tokenizer_config.json +3 -0

README.md CHANGED

@@ -1,5 +1,46 @@
 ---
+tags:
+- ONNX
+- ONNX Runtime
+- code
+- nlp
+- multimodal
 license: mit
-base_model:
-- microsoft/Fara-7B
----
+language: en
+pipeline_tag: image-text-to-text
+---
+# Fara-7B ONNX models
+## Introduction
+This repository hosts optimized versions of the Fara-7B model that accelerate inference with ONNX Runtime.
+The optimized models are published in ONNX format and run with ONNX Runtime on NPU.
+The following optimized configuration is available:
+1. ONNX model for int4 NPU: an ONNX model for Qualcomm NPU that uses int4 quantization.
+## Model Run
+For an example of running this model with ONNX Runtime GenAI (ORT GenAI), see the [model-vision.py example](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-vision.py).
+For NPU:
+```bash
+# Download the model directly using the Hugging Face CLI
+# (the pattern is quoted so the shell does not expand it)
+huggingface-cli download microsoft/Fara-7B-onnx --include "npu/qnn-int4/*" --local-dir .
+# Install ONNX Runtime GenAI
+pip install --pre onnxruntime-genai
+# Fetch the example script
+curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/model-vision.py -o model-vision.py
+# Run the example; adjust the model directory (-m) to match your download location
+python model-vision.py -m npu/qnn-int4 --use-winml
+```
+## Model Description
+- Developed by: Microsoft
+- Model type: ONNX
+- License: MIT
+- Model Description: This is a conversion of the Fara-7B model for ONNX Runtime inference.
+**Disclaimer:** This model is only an optimization of the base model. Any risk associated with the model is the responsibility of its user. Please verify and test it for your scenarios. With the optimizations applied, its output may differ slightly from that of the base model.

npu/qnn-int4/LICENSE ADDED

	@@ -0,0 +1,22 @@

+Microsoft.
+Copyright (c) Microsoft Corporation.
+MIT License
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

npu/qnn-int4/chat_template.jinja ADDED

	@@ -0,0 +1,7 @@

+{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+You are a helpful assistant.<|im_end|>
+{% endif %}<|im_start|>{{ message['role'] }}
+{% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+{% endif %}
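
The prompt layout this template produces can be easier to follow in plain Python. The sketch below is a hand-written mirror of the template logic, not code from the repository: the function name `render_chat` is hypothetical, and it assumes the newlines in the template are emitted literally (none of them directly follows a block tag, so Jinja's `trim_blocks` setting should not affect them).

```python
def render_chat(messages, add_generation_prompt=False, add_vision_id=False):
    """Hand-written mirror of chat_template.jinja (illustrative only)."""
    out = []
    image_count = 0
    video_count = 0
    for index, message in enumerate(messages):
        # A default system turn is injected when the first message is not a system message.
        if index == 0 and message["role"] != "system":
            out.append("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n")
        out.append(f"<|im_start|>{message['role']}\n")
        content = message["content"]
        if isinstance(content, str):
            out.append(content)
        else:
            for part in content:
                if part.get("type") == "image" or "image" in part or "image_url" in part:
                    image_count += 1
                    if add_vision_id:
                        out.append(f"Picture {image_count}: ")
                    out.append("<|vision_start|><|image_pad|><|vision_end|>")
                elif part.get("type") == "video" or "video" in part:
                    video_count += 1
                    if add_vision_id:
                        out.append(f"Video {video_count}: ")
                    out.append("<|vision_start|><|video_pad|><|vision_end|>")
                elif "text" in part:
                    out.append(part["text"])
        out.append("<|im_end|>\n")
    if add_generation_prompt:
        # Leaves the prompt open for the assistant's reply.
        out.append("<|im_start|>assistant\n")
    return "".join(out)


# Example conversation: one user turn with an image followed by text.
example_messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this page."},
        ],
    }
]
example_prompt = render_chat(example_messages, add_generation_prompt=True)
```

Each image or video entry becomes a single `<|image_pad|>` or `<|video_pad|>` placeholder wrapped in vision markers; expanding that placeholder to the real number of visual tokens is presumably left to the processor and is not shown here.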

npu/qnn-int4/genai_config.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:056d55f77d74ec1bfae789df98a3d0e101b62f87068c86e6412bef6883c2b6ca
+size 8769
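
The three lines above are a Git LFS pointer rather than the file itself: a spec version, a SHA-256 object ID, and the size in bytes. As a minimal sketch, a downloaded file could be checked against such a pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and hash the pointer records."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Read in 1 MiB chunks so multi-gigabyte weight files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the genai_config.json hunk above.
GENAI_CONFIG_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:056d55f77d74ec1bfae789df98a3d0e101b62f87068c86e6412bef6883c2b6ca
size 8769
"""
genai_config_pointer = parse_lfs_pointer(GENAI_CONFIG_POINTER)
```

Calling `matches_pointer("npu/qnn-int4/genai_config.json", genai_config_pointer)` after a download would then confirm that the real file, and not a pointer stub, is on disk. The size check runs first because it is cheap and catches stubs immediately.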

npu/qnn-int4/processor_config.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bb046cd8384fc5daba0012b1494a3345356a4b3c35dcd3a246fb9663e365336e
+size 1459

npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_1_qnn.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dcbb2bbed51c047491650a19b810dbefb73995054bd4547af419732792c4c158
+size 949673984

npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_2_qnn.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fdf7e4ae4382b2f8882c837364fa78850f65480905a59d75998e9d5e2b4fe113
+size 949673984

npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_3_qnn.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:60eab7a87c4ac9b80c0d77c22873103d0da06d5e9646db83986956e2a7e96c95
+size 949665792

npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_4_qnn.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:75b2d6bf8614eb39e92babc2b21cb9de86f3f9ab43a499a7d998e8ace820c678
+size 474857472

npu/qnn-int4/qwen2_5_vl_webcua_ctx_512_ctx.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0c4b3f8063812ea28da95e30e2e65f51a589c8eab5261f71fa93ad421dadfec6
+size 65583519

npu/qnn-int4/qwen2_5_vl_webcua_embeddings_w4a32.quant.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:17e9f756271089b5af067f934543a3dcf6b6f510426265743fd5ed27ae5fda5c
+size 349139384

npu/qnn-int4/qwen2_5_vl_webcua_itr_1_ctx.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cd6439288d6632881502fbd4a909153e3f9e7b2f34f1ff3f9d245999deb80b4f
+size 65583364

npu/qnn-int4/qwen2_5_vl_webcua_lm_head_w4a32.quant.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:55d5283bd599ebc6fb742fa14e264e73c8106a2e5183140a84ecb562fbd9b0d4
+size 349139441

npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_1_qnn.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:08b140ce4c3665e02070a363df8eaebfd1b0ab444e237519762df6713627d177
+size 440930304

npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_2_qnn.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:714bc39748b21baa216ef0468b7230109df01ea6b46325c3e384c5b9e9887b57
+size 367792128

npu/qnn-int4/qwen2_5_vl_webcua_visual_attn_block_ctx.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8f3c4d6fdcf58ced69c6630be7606e3d350461a45321ee09cada4706b06105c6
+size 1469

npu/qnn-int4/qwen2_5_vl_webcua_visual_patch_embed.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8ebc2de24de3e87be8c4aa148f1ca59e7bafcc6afeed2c0bcf35ecc40909eb20
+size 6021909

npu/qnn-int4/qwen2_5_vl_webcua_visual_patch_merger.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:024c5792c6eab12c5ec3ddd0566b1e1ca0229c484f26c50e6ca87807232df45f
+size 178299575

npu/qnn-int4/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896

npu/qnn-int4/tokenizer_config.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0a04a9d7d4a62b28482bdfe726c122756de85714fb64166ace92ae75b8f57614
+size 4686
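
The `size` fields in the LFS pointers above also give a rough idea of how much data the `npu/qnn-int4` download involves. The sketch below simply adds them up. The sizes are copied from the hunks in this commit, the dictionary and variable names are illustrative, and the two small text files stored directly in Git (`LICENSE`, `chat_template.jinja`) are not counted.

```python
# Byte sizes copied from the Git LFS pointers in this commit (npu/qnn-int4/).
LFS_SIZES = {
    "genai_config.json": 8769,
    "processor_config.json": 1459,
    "qwen2_5_vl_webcua_ctx_512_1_qnn.bin": 949673984,
    "qwen2_5_vl_webcua_ctx_512_2_qnn.bin": 949673984,
    "qwen2_5_vl_webcua_ctx_512_3_qnn.bin": 949665792,
    "qwen2_5_vl_webcua_ctx_512_4_qnn.bin": 474857472,
    "qwen2_5_vl_webcua_ctx_512_ctx.onnx": 65583519,
    "qwen2_5_vl_webcua_embeddings_w4a32.quant.onnx": 349139384,
    "qwen2_5_vl_webcua_itr_1_ctx.onnx": 65583364,
    "qwen2_5_vl_webcua_lm_head_w4a32.quant.onnx": 349139441,
    "qwen2_5_vl_webcua_visual_attn_block_1_qnn.bin": 440930304,
    "qwen2_5_vl_webcua_visual_attn_block_2_qnn.bin": 367792128,
    "qwen2_5_vl_webcua_visual_attn_block_ctx.onnx": 1469,
    "qwen2_5_vl_webcua_visual_patch_embed.onnx": 6021909,
    "qwen2_5_vl_webcua_visual_patch_merger.onnx": 178299575,
    "tokenizer.json": 11421896,
    "tokenizer_config.json": 4686,
}

total_bytes = sum(LFS_SIZES.values())
total_gib = total_bytes / 2**30  # 1 GiB = 2**30 bytes

# The QNN context binaries (*_qnn.bin) hold most of the payload.
qnn_bin_bytes = sum(size for name, size in LFS_SIZES.items() if name.endswith("_qnn.bin"))

print(f"{len(LFS_SIZES)} LFS files, {total_bytes} bytes ({total_gib:.2f} GiB)")
print(f"QNN context binaries: {qnn_bin_bytes / total_bytes:.0%} of the total")
```

A check like this can be useful before downloading to a device with limited storage.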