melvindave posted an update 1 day ago

Currently having a blast learning the transformers library.

I noticed that model cards usually have Transformers code as usage examples.

So I tried to figure out how to load a model using just the transformers library, without Ollama, LM Studio, or llama.cpp.

I learned how to install the dependencies required to make it work, like PyTorch and CUDA. I also used Conda to manage the Python environment.

Once I got the model loaded and sample inference working, I made an API to serve it.
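
Roughly, the core of it looks like this (a minimal sketch, assuming a recent transformers release with Qwen3-VL support, PyTorch with CUDA, and a placeholder image URL instead of my actual test image):

```python
# Minimal sketch: load Qwen3-VL with transformers and run one sample inference.
# Assumes a recent transformers build with Qwen3-VL support, torch with CUDA,
# and accelerate installed for device_map="auto".
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # should fit on a 24 GB RTX 3090 in bf16
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/test.jpg"},  # placeholder image
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```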

I know it's very basic stuff for the machine learning experts here on HF, but I'm completely new to this, so I'm happy I got it working!

Model used: Qwen/Qwen3-VL-8B-Instruct
GPU: NVIDIA GeForce RTX 3090
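
And the serving side, as a sketch (assuming FastAPI and uvicorn; the endpoint path and request fields here are illustrative, not necessarily my exact script):

```python
# Rough sketch of the API layer. The route name and request shape are
# placeholders; the model loading mirrors the inference snippet above.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

app = FastAPI()

class GenerateRequest(BaseModel):
    image_url: str
    prompt: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(req: GenerateRequest):
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": req.image_url},
                {"type": "text", "text": req.prompt},
            ],
        }
    ]
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    text = processor.batch_decode(
        output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
    return {"response": text}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```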

Here's the result of my experimentation

Great step forward! Try doing some training afterwards to fine-tune it.


Thank you. What's the best way to start fine-tuning?

Epic effort Melvin! Getting set up locally is often quite the challenge! Looks like your model worked pretty well too!


Thank you, I agree. Since my GPU is on Windows, that took time to set up too. Yeah, it's a small model that works locally. Trying to do more tests.

Congratulations! Publish the script showing how you run it, for others to see.

Here is exactly how I run it:

/usr/local/bin/llama-server --jinja -fa on -c 32768 -ngl 64 -v --log-timestamps --host 192.168.1.68 -m /mnt/nvme0n1/LLM/quantized/Qwen3VL-8B-Instruct-Q8_0.gguf --mmproj /mnt/nvme0n1/LLM/quantized/mmproj-Qwen3VL-8B-Instruct-Q8_0.gguf

That's with llama.cpp, and its API is of course available as well.


Yeah, I tried llama.cpp. I was curious how to run the model from transformers code. I also tried llama-cpp-python, which can run inference on the model from your own code.
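
For reference, a basic llama-cpp-python call looks roughly like this (a sketch, text-only for simplicity; the GGUF path is borrowed from the command above, and image input would need an extra multimodal chat handler that I'm leaving out):

```python
# Rough sketch of llama-cpp-python inference against a local GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="/mnt/nvme0n1/LLM/quantized/Qwen3VL-8B-Instruct-Q8_0.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=32768,
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Describe what a vision-language model does."}],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```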

What is this flag for? --mmproj