John Smith (John6666) PRO
AI & ML interests: None yet
Recent Activity
reacted to sergiopaniego's post with 🤗 about 1 hour ago
We just released TRL v0.26.0!
It comes packed with updates:
> Agent training with tools in GRPO
> New CISPO & SAPO losses + reasoning rewards
> vLLM quantization in colocate mode
> Dataset shuffling in SFT
> Lots of NEW examples
> Tons of fixes and documentation improvements
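For anyone who hasn't tried GRPO in TRL yet, here is a minimal sketch of the trainer setup. The model, dataset, and toy reward below are only illustrative, and the new features announced above (tool use, CISPO/SAPO losses, colocated vLLM) have their own options that are not shown here:

```python
# Minimal GRPO sketch with a custom reward function (illustrative only; the new
# tool-use, CISPO/SAPO, and colocated-vLLM options from v0.26.0 are not shown).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def brevity_reward(completions, **kwargs):
    # Toy reward for illustration: prefer shorter completions.
    return [-len(c) / 100.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # example model id
    reward_funcs=brevity_reward,
    args=GRPOConfig(output_dir="grpo-demo", per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()
```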
updated a dataset about 16 hours ago: John6666/forum3
updated a dataset about 16 hours ago: John6666/knowledge_base_md_for_rag_1
reacted to NicoBBQ1's post about 16 hours ago
What do you think of my LLM Chat app so far?
Here are some of the features already included (and more are coming):
- Chat with AI models – Local inference via Ollama
- Reasoning support – View model thinking process (DeepSeek-R1, Qwen-QwQ, etc.)
- Vision models – Analyze images with llava, bakllava, moondream
- Image generation – Local GGUF models with GPU acceleration (CUDA)
- Fullscreen images – Click generated images to view in fullscreen
- Image attachments – File picker or clipboard paste (Ctrl+V)
- DeepSearch – Web search with tool use
- Inference Stats – Token counts, speed, duration (like Ollama verbose)
- Regenerate – Re-run any AI response
- Copy – One-click copy AI responses
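Nice feature set. For readers curious about what the Ollama-backed chat feature does under the hood, here is a minimal sketch against Ollama's local REST API; the model name is just an example of something you might have pulled locally:

```python
# Minimal local chat call via Ollama's REST API (assumes Ollama is running
# on the default port and "llama3.2" has been pulled; swap in any local model).
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])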
upvoted a collection about 16 hours ago
reacted to leonardlin's post with 🔥 about 16 hours ago
We just released our latest Shisa V2.1 Japanese multi-lingual models: https://huggingface.co/collections/shisa-ai/shisa-v21
Besides updates to our 14B and 70B, we have a new LFM2-based 1.2B, Llama 3.2-based 3B, and Qwen 3-based 8B, all with class-leading Japanese language capabilities.
Per usual, lots of details in the Model Cards for those interested.
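For anyone who wants to try one quickly, a minimal sketch with transformers; the repo id below is only a guess at the naming, so check the linked collection for the exact model names:

```python
# Hedged sketch: the repo id is illustrative, verify the exact name on the Hub.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="shisa-ai/shisa-v2.1-llama3.2-3b",  # illustrative name
    device_map="auto",
)
messages = [{"role": "user", "content": "日本で一番高い山は何ですか？"}]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])
```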
upvoted an article about 16 hours ago: "I Built a RAG System That Listens to Live BBC News and Answers Questions About 'What Happened 10 Minutes Ago'"
reacted to RakshitAralimatti's post with 🔥 about 16 hours ago
I built something crazy you've never seen before.
Please check - https://huggingface.co/blog/RakshitAralimatti/streaming-data-rag
A real-time Streaming Data to RAG system that listens to live radio, transcribes it on-the-fly, and lets you query across TIME.
Not just "what was discussed", but "what happened in the last 10 minutes on channel 0?" or "at 9 AM, what was the breaking news?" This is RAG that understands temporal context.
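A rough sketch of what time-aware retrieval can look like (not the author's implementation, just an illustration): each transcript chunk carries a timestamp and channel, so a query like "the last 10 minutes on channel 0" becomes a metadata filter applied before relevance ranking.

```python
# Illustrative time-aware retrieval: filter timestamped transcript chunks first,
# then rank the survivors. The scoring here is a placeholder for embedding similarity.
from datetime import datetime, timedelta

chunks = [
    {"text": "Breaking: markets rally after rate decision.", "channel": 0, "ts": datetime(2025, 1, 1, 9, 0)},
    {"text": "Weather update for the south coast.",          "channel": 0, "ts": datetime(2025, 1, 1, 9, 12)},
]

def retrieve(query, channel, since, until, top_k=3):
    window = [c for c in chunks if c["channel"] == channel and since <= c["ts"] <= until]
    scored = sorted(window, key=lambda c: query.lower() in c["text"].lower(), reverse=True)
    return scored[:top_k]

now = datetime(2025, 1, 1, 9, 15)
print(retrieve("breaking news", channel=0, since=now - timedelta(minutes=10), until=now))
```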
reacted to StJohnDeakins's post about 16 hours ago
Hey all 👋
A quick one for any founders building with Small Language Models in mobile apps: We're opening 10 Innovation Partner spots this month for our Device Native AI (DNA) platform.
What you get:
- Device Native AI SDK – (AI processes data on-device, not in the cloud 📲)
- 99% off for 3 months, then 90% off for the rest of the year (no lock-in)
- Direct engineering access + feature releases
- It's an Innovation community, so at least some participation is required
Perfect if you're building consumer apps and want:
✅ Hyper-personalization without privacy risks
✅ Zero cloud AI token costs
✅ Early access to next-gen mobile AI
Limited spots, on a first-come basis, so DM me "DNA" for more info and an access code. Cheers, Singe 🐵
reacted to Teen-Different's post about 16 hours ago
Interesting... looked into Apple's DiffuCoder and the masked diffusion approach is actually hitting SOTA parity... basically proving global MDLM can work for code https://arxiv.org/pdf/2506.20639
but then you look at the Tiny-A2D results and it's the complete opposite... BD3LM (block diffusion) totally outperforms MDLM... and then both the MDLM and BD3LM models struggle hard compared to the AR baselines... https://github.com/ZHZisZZ/dllm/tree/main/examples/a2d
digging into the why, and I think it comes down to the adaptation method... Tiny-A2D just SFT'd an AR model to force it into diffusion... asking a model wired for left-to-right causal attention to suddenly think bidirectionally is a massive shock... it struggles to unlearn that strong AR inductive bias
...that explains why BD3LM worked better in their case... since it generates in chunks it preserves some sequential order... acts like a bridge or crutch that feels more natural to the original Qwen weights
contrast that with Apple... they didn't just SFT... they pre-trained/adapted on 130B tokens... fundamentally rewiring the model to understand global dependencies from the ground up
my theory is that if we want MDLM to actually work we can't just SFT... we need that heavy adaptation or full pre-training phase to break the causal priors... otherwise the model just gets confused
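A toy way to picture the mismatch being described: an AR model is pre-trained with a causal mask, MDLM-style denoising attends bidirectionally, and block diffusion sits in between (full attention inside a block, causal across blocks). The masks below are only an illustration of that gap, not any of the cited training code.

```python
# Illustrative attention masks only (1 = position may attend), not training code.
import torch

seq_len, block_size = 6, 2
causal = torch.tril(torch.ones(seq_len, seq_len))       # AR: attend to current and earlier tokens
bidirectional = torch.ones(seq_len, seq_len)            # MDLM-style: attend everywhere
within_block = torch.block_diag(*[torch.ones(block_size, block_size)] * (seq_len // block_size))
block_diffusion = torch.maximum(within_block, causal)   # BD3LM-style: bidirectional in-block, causal across blocks
print(causal, bidirectional, block_diffusion, sep="\n\n")
```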
reacted to dhruv3006's post about 16 hours ago
Switching between API Client, browser, and API documentation tools to test and document APIs can harm your flow and leave your docs outdated.
This is what usually happens: While debugging an API in the middle of a sprint, the API Client says that everything's fine, but the docs still show an old version.
So you jump back to the code, find the updated response schema, then go back to the API Client, which gets stuck, forcing you to rerun the tests.
Hours can go by just trying to sync all this up (and that's if you catch the inconsistencies at all).
The reason? Using disconnected tools for specs, tests, and docs. Doing manual updates, stale docs, and a lot of context switching.
Voiden takes a different approach: it puts specs, tests & docs all in one Markdown file, stored right in the repo.
Everything stays in sync, versioned with Git, and updated in one place, inside your editor.
Download Voiden here: https://voiden.md/download
reacted to melvindave's post about 16 hours ago
Currently having a blast learning the transformers library.
I noticed that model cards usually have Transformers code as usage examples.
So I tried to figure out how to load a model just using the transformers library without using ollama, lmstudio, or llamacpp.
Learned how to install the dependencies required to make it work, like PyTorch and CUDA. I also used Conda for the Python environment.
Once I got the model loaded and sample inference working, I made an API to serve it.
I know it's very basic stuff for the machine learning experts here on HF, but I'm completely new to this, so I'm happy to get it working!
Model used: Qwen/Qwen3-VL-8B-Instruct
GPU: NVIDIA GeForce RTX 3090
Here's the result of my experimentation
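Congrats on getting it running. For anyone following along, here is a rough sketch of the same pattern (load a VLM with transformers and wrap it in a small API). The pipeline task, message format, and endpoint shown here are assumptions that depend on your transformers version, so treat it as a starting point rather than a recipe.

```python
# Hedged sketch: serve a vision-language model loaded with transformers via FastAPI.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3-VL-8B-Instruct", device_map="auto")
app = FastAPI()

class Request(BaseModel):
    prompt: str
    image_url: str

@app.post("/generate")
def generate(req: Request):
    # Build a chat-style message with one image and one text part.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "url": req.image_url},
            {"type": "text", "text": req.prompt},
        ],
    }]
    out = pipe(text=messages, max_new_tokens=128, return_full_text=False)
    return {"response": out[0]["generated_text"]}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```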
Getting this error · #21 opened 7 months ago by LostInTimee3