melvindave posted an update 1 day ago

Currently having a blast learning the transformers library.

I noticed that model cards usually have Transformers code as usage examples.

So I tried to figure out how to load a model using just the transformers library, without Ollama, LM Studio, or llama.cpp.

I learned how to install the dependencies required to make it work, like PyTorch and CUDA. I also used Conda to manage the Python environment.

Once I got the model loaded and sample inference working, I made an API to serve it.
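
Roughly, the core of it looks like this (a minimal sketch, assuming a recent transformers release with Qwen3-VL support, PyTorch with CUDA, and a placeholder image URL instead of my actual test image):

```python
# Minimal sketch: load Qwen3-VL with transformers and run one sample inference.
# Assumes a recent transformers build with Qwen3-VL support, torch with CUDA,
# and accelerate installed for device_map="auto".
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # should fit on a 24 GB RTX 3090 in bf16
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/test.jpg"},  # placeholder image
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```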

I know it's very basic stuff for the machine learning experts here on HF, but I'm completely new to this, so I'm happy I got it working!

Model used: Qwen/Qwen3-VL-8B-Instruct
GPU: NVIDIA GeForce RTX 3090
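
And the serving side, as a sketch (assuming FastAPI and uvicorn; the endpoint path and request fields here are illustrative, not necessarily my exact script):

```python
# Rough sketch of the API layer. The route name and request shape are
# placeholders; the model loading mirrors the inference snippet above.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

app = FastAPI()

class GenerateRequest(BaseModel):
    image_url: str
    prompt: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(req: GenerateRequest):
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": req.image_url},
                {"type": "text", "text": req.prompt},
            ],
        }
    ]
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    text = processor.batch_decode(
        output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
    return {"response": text}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```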

Here's the result of my experimentation

Great step forward! Try doing some training afterwards to fine-tune it.


Thank you. What's the best way to start fine-tuning?

Epic effort Melvin! Getting set up locally is often quite the challenge! Looks like your model worked pretty well too!


Thank you, I agree. Since my GPU is on Windows, that took time to set up too. Yeah, it's a small model that works locally. Trying to do more tests.

Congratulations! Publish the script showing how you run it, for others to see.

Here is exactly how I run it:

/usr/local/bin/llama-server --jinja -fa on -c 32768 -ngl 64 -v --log-timestamps --host 192.168.1.68 -m /mnt/nvme0n1/LLM/quantized/Qwen3VL-8B-Instruct-Q8_0.gguf --mmproj /mnt/nvme0n1/LLM/quantized/mmproj-Qwen3VL-8B-Instruct-Q8_0.gguf

That's with llama.cpp, and its API is of course available as well.


Yeah, I tried llama.cpp. I was curious how to run the model from transformers code. I also tried llama-cpp-python, which can run inference on the model from your own code.
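
For reference, a basic llama-cpp-python call looks roughly like this (a sketch, text-only for simplicity; the GGUF path is borrowed from the command above, and image input would need an extra multimodal chat handler that I'm leaving out):

```python
# Rough sketch of llama-cpp-python inference against a local GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="/mnt/nvme0n1/LLM/quantized/Qwen3VL-8B-Instruct-Q8_0.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=32768,
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Describe what a vision-language model does."}],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```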

What is this flag for? --mmproj